Closed: camilovelez closed this issue 1 year ago
Got it, thank you for reporting this 👍🏻
Can you confirm that the process does in fact work if you install version 1.1.3?
$ pip install PyCap==1.1.3
And then re-run your code (though you will need to replace format_type with format for the old version) and see if it works.
The reason to do this is to ensure that it's actually a PyCap version issue and not a REDCap version issue.
If it works with the old package version then the next step will be to try and create a reproducible example which demonstrates the unexpected behavior.
Lastly, you said this is only a problem for the df export, right? So the json export works with the current package version?
Hi @pwildenhain, thank you so much. I can confirm that when using PyCap==1.1.3, this works correctly and the issue does not appear.
Also, just to clarify the issue a bit more, export_records does work correctly in pycap 2.X when setting format_type='json', the issue only arises with format_type='df'. Thanks yet again
Ok thanks for double checking
Hi @pwildenhain, I am working with @camilovelez on this project. We are working with multiple forms from a REDCap database for a longitudinal study. We are testing the export_records function with a particular form from a database whose primary key is the field study_id. We are running our tests on a VM with PyCap 1.1.3 and a Docker container with PyCap 2.2.0 to compare the output.
When we export the JSON file of the particular form using PyCap 1.1.3, we get a JSON object containing the key study_id, which corresponds to the "record" of the event, and the key redcap_event_name, which corresponds to the REDCap event names. When we extract the same form as a JSON file using version 2.2.0, we get a JSON object that does not contain those two keys. Those keys are not inherently part of the form, but we believe PyCap added them for longitudinal studies to support transformations such as building a DataFrame.
The error message we get when exporting the data as a DataFrame using PyCap 2.2.0 is shown below. The issue is that when the data is exported as CSV and then converted to a DataFrame, the study_id index cannot be found, which, as I said before, corresponds to the database's primary key. Unfortunately, we cannot share the output JSON file with you since all the information is sensitive and confidential. However, please let us know if we can provide any other details that would make it easier to trace and fix the error. Thanks!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.11/site-packages/redcap/methods/records.py", line 264, in export_records
return self._return_data(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/redcap/methods/base.py", line 411, in _return_data
dataframe = self._read_csv(buf, **df_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/redcap/methods/base.py", line 139, in _read_csv
dataframe = pd.read_csv(buf, **df_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 611, in _read
return parser.read(nrows)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 321, in read
index, column_names = self._make_index(date_data, alldata, names)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py", line 379, in _make_index
simple_index = self._get_simple_index(alldata, columns)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py", line 411, in _get_simple_index
i = ix(idx)
^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py", line 406, in ix
raise ValueError(f"Index {col} invalid")
ValueError: Index study_id invalid
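The traceback boils down to pandas being asked to index on a column that the CSV does not contain. A minimal sketch that reproduces the same kind of failure outside PyCap, assuming made-up column names (age, weight) standing in for the form's fields and a hypothetical helper read_with_index:

```python
import io

import pandas as pd

# Hypothetical CSV payload: form fields only, no study_id or redcap_event_name.
csv_without_keys = "age,weight\n34,70\n41,82\n"


def read_with_index(csv_text, index_col):
    """Parse CSV text into a DataFrame, indexing on the given column names."""
    return pd.read_csv(io.StringIO(csv_text), index_col=index_col)


try:
    read_with_index(csv_without_keys, ["study_id", "redcap_event_name"])
    failed = False
except ValueError as err:
    # pandas cannot find the requested index columns in the header.
    failed = True
    print(f"read_csv failed: {err}")

# Passing index_col=None skips the index lookup entirely.
df = read_with_index(csv_without_keys, None)
```

This only mirrors the pandas step of the traceback; it says nothing about why the columns are absent from the export in the first place.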
Great, this is all really helpful information. Can you also run the following and share the output?
my_project = Project(url, token)
my_project.def_field
my_project.is_longitudinal
I would expect to see study_id and True.
My current suspicion is that you're facing something similar to what another user reported in #193. In a nutshell, there was a fairly recent change in the REDCap API: when you limit the records export to certain forms, it doesn't include the "primary key" fields such as study_id and redcap_event_name (since, as you pointed out, they don't live on that form).
One sure-fire way to confirm this is the case is to run the API request in the API playground (while still limiting the forms), and see whether study_id and/or redcap_event_name are included. If not, then PyCap wouldn't know to include them. Version 1.1.3 isn't doing anything special in this regard either.
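A small sketch of how the playground check could be automated, assuming the response is a JSON list of flat records. The helper name missing_key_fields and the sample payload are made up for illustration:

```python
import json


def missing_key_fields(records, expected=("study_id", "redcap_event_name")):
    """Return the expected key fields that appear in none of the exported rows."""
    seen = set()
    for row in records:
        seen.update(row.keys())
    return [name for name in expected if name not in seen]


# Hypothetical response body pasted from the API playground.
playground_response = '[{"age": "34", "weight": "70"}]'
missing = missing_key_fields(json.loads(playground_response))
print(missing)
```

An empty list would mean the API itself returned both fields; note that an empty export reports every expected field as missing.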
If this turns out to be an API issue, then I think your best recourse is one of the following:
Override the df_kwargs if you don't need the study_id or redcap_event_name (since the default behavior expects these fields)
my_records = my_project.export_records(format_type="df", forms=["my_form"], df_kwargs={"index_col": None})
If you do need these columns, then I suggest using the export_metadata method + the fields parameter of the export_records method to automatically generate a list of fields that you want returned
(Code not tested, but you get the idea)
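# export_metadata returns a list of dicts by default, so collect the field names from each row
form_fields = [field["field_name"] for field in my_project.export_metadata(forms=["my_form"])]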
export_fields = ["study_id", "redcap_event_name"] + form_fields
my_records = my_project.export_records(format_type="df", fields=export_fields)
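The field-list idea can also be tried offline. Below is a rough sketch that works on metadata rows shaped like a JSON data dictionary export (dicts with field_name and form_name keys); the helper fields_with_keys and the sample rows are hypothetical:

```python
def fields_with_keys(metadata, form, key_fields=("study_id", "redcap_event_name")):
    """List the key fields first, then every field on `form`, without duplicates."""
    form_fields = [row["field_name"] for row in metadata if row["form_name"] == form]
    ordered = []
    for name in list(key_fields) + form_fields:
        if name not in ordered:
            ordered.append(name)
    return ordered


# Hypothetical metadata rows standing in for a real data dictionary.
metadata = [
    {"field_name": "study_id", "form_name": "demographics"},
    {"field_name": "age", "form_name": "my_form"},
    {"field_name": "weight", "form_name": "my_form"},
]
export_fields = fields_with_keys(metadata, "my_form")
```

The resulting list could then be passed as the fields argument of export_records, as in the snippet above.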
Something that I could do to improve the user experience is warn if certain fields aren't found that we would expect to be there, and then only add something like redcap_event_name to the index if the column exists in the export.
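One possible shape for that check, sketched with the standard library only. This is not PyCap's code; usable_index_columns is a hypothetical helper that inspects the CSV header before any DataFrame is built:

```python
import csv
import io
import warnings


def usable_index_columns(csv_text, expected=("study_id", "redcap_event_name")):
    """Return the expected index columns found in the CSV header, warning about the rest."""
    # Only the first row (the header) is needed to decide which columns exist.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = [name for name in expected if name in header]
    missing = [name for name in expected if name not in header]
    if missing:
        warnings.warn(f"Expected field(s) not found in export: {', '.join(missing)}")
    return present


index_cols = usable_index_columns("study_id,redcap_event_name,age\n1,visit_1,34\n")
```

The returned list (or None when it is empty) could then be handed to pd.read_csv as index_col, so a missing column produces a warning instead of a ValueError.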
Hi, @pwildenhain. Thanks for your answer.
my_project.def_field and my_project.is_longitudinal return study_id and True.
We have run the API request in the playground, and, as you say, it does not include the fields study_id and redcap_event_name.
Maybe it would be good to include the information you are giving us here in the documentation of the export_records function, even more so considering that the DataFrame format export expects those fields by default.
We will try the solutions you suggest. Thank you!
Something that is still odd is that we are not getting that same error when using PyCap 1.1.3 with the same URL and token. So maybe there was some difference in how the DataFrames were generated with that version. And when we export the JSON files with both versions, we get the two fields with PyCap 1.1.3 and not with 2.2.0.
Ah ha! I found it! You were right, Version 1.1.3 did explicitly backfill the "primary key" fields. Part of my confusion comes from becoming the package maintainer long after this original code was written.
I remember deleting this code when I upgraded to 2.0.0 because I couldn't understand why we would need it 😅 and I guess now I know why.
Ok, I'll add this functionality and cut a new release. Thanks for reporting and for your thoroughness. This is one of the rare instances where I agree that we should "fix" the default API behavior.
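For readers wondering what "backfilling" could look like, here is a rough sketch. It is neither the 1.1.3 code nor the code in the new release; the function name, signature, and rules are assumptions made for illustration:

```python
def backfill_fields(fields, forms, def_field):
    """Make sure the primary key is requested whenever the export is narrowed.

    Hypothetical rule: if the caller limits the export by fields or by forms,
    append the project's primary key field unless it is already listed.
    """
    fields = list(fields or [])
    if (fields or forms) and def_field not in fields:
        fields.append(def_field)
    return fields


# Limiting by form alone still ends up requesting the primary key.
requested = backfill_fields(None, ["my_form"], "study_id")
```

When neither fields nor forms is given, the sketch returns an empty list, which here stands for "no restriction".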
Thanks, @pwildenhain! Glad we could help and it won't be an issue anymore :)
Thank you so much, @pwildenhain!
I am still encountering a similar issue with PyCap version 2.4.0 and the redcap_event_name field. Has this been fixed, or should I use one of the workarounds?
Can you open a new issue with the details? Happy to take a look
Hi, we are having an issue with the export_records function when using the parameter format_type="df" in either pycap 2.1 or 2.2. It cannot export the data as a DataFrame because there are missing fields. We explored the problem, and we believe it is caused by missing fields in the JSON file that is being exported with newer PyCap versions, compared with the JSON file exported with the version we previously used (version 1.1.3). This has been an issue for the backward compatibility of our development based on PyCap.