redcap-tools / PyCap

REDCap in Python
http://redcap-tools.github.io/PyCap/
MIT License
169 stars 80 forks source link

Unexpected Behavior: Passing df_kwargs to export_records() Overrides Untouched Default Values #268

Closed ugGit closed 1 year ago

ugGit commented 1 year ago

Hi

When passing in the df_kwargs to export_records() all default values are overwritten in PyCap 2.4. I believe, it would be more intuitive to only updated the changed values, and keep the other defaults as if df_kwargs is not passed in.

Example code

Project.export_records(format_type='df', df_kwargs={'low_memory': False})

Describe the behavior: Returns a dataframe with no index columns.

Expected behavior: A clear and concise description of what you expected to happen. Return a dataframe with the same index columns as the following method call:

Project.export_records(format_type='df')

Additional context: As far as I see, the if statement here in base.py would need to be modified:

# existing code
if not df_kwargs:
    if record_type == "eav": # semantic lost of particular record type in suggested change
        df_kwargs = {}
    elif content == "exportFieldNames":
        df_kwargs = {"index_col": "original_field_name"}
    elif content == "metadata":
        df_kwargs = {"index_col": "field_name"}
    elif content in ["report", "record"]:
        if self.is_longitudinal:
            df_kwargs = {"index_col": [self.def_field, "redcap_event_name"]}
        else:
            df_kwargs = {"index_col": self.def_field}
    # catchall for other endpoints
    else:
        df_kwargs = {}

As a change, I'd suggest something along the lines of:

# untested suggestion to solve the issue
if not df_kwargs:
    df_kwargs = {}

if 'index_col' not in df_kwargs.keys():
    if content == "exportFieldNames":
        df_kwargs["index_col"] = "original_field_name"
    elif content == "metadata":
        df_kwargs["index_col"] = "field_name"
    elif content in ["report", "record"]:
        if self.is_longitudinal:
            df_kwargs["index_col"] = [self.def_field, "redcap_event_name"]
        else:
            df_kwargs["index_col"] = self.def_field

If that sounds like a reasonable thing, I gladly make the changes and submit a PR.

pwildenhain commented 1 year ago

I like this suggestion, and I've definitely felt similarly frustrated before by having to restate the default arguments when needing df_kwargs.

Go for it ✅ and let's also think of some test cases for: