redcap-tools / PyCap

REDCap in Python
http://redcap-tools.github.io/PyCap/
MIT License

Unexpected Behavior: export_records broken for format='df' [ValueError: Index record_id invalid] #193

Closed camachop-dbhi closed 2 years ago

camachop-dbhi commented 2 years ago

Description of behavior: When I call the export_records method with the format option set to 'df', I get the error "ValueError: Index record_id invalid". The code that triggers the error and the resulting output are shown below.

>>> study_data=redcap_project.export_records(format='df',forms=['study_data'])
Traceback (most recent call last):
  File "./CEUS-RedCap-Update-ETL.py", line 102, in <module>
    study_data=redcap_project.export_records(format='df',forms=['study_data'])
  File "/home/PyCap/redcap/methods/records.py", line 253, in export_records
    dataframe = self._read_csv(buf, **df_kwargs)
  File "/home/PyCap/redcap/methods/base.py", line 145, in _read_csv
    dataframe = read_csv(buf, **df_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py", line 488, in _read
    return parser.read(nrows)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/c_parser_wrapper.py", line 310, in read
    index, names = self._make_index(data, alldata, names)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/base_parser.py", line 415, in _make_index
    index = self._get_simple_index(alldata, columns)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/base_parser.py", line 447, in _get_simple_index
    i = ix(idx)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/base_parser.py", line 442, in ix
    raise ValueError(f"Index {col} invalid")
ValueError: Index record_id invalid

This does not happen for the main instrument, where record_id is explicitly listed in the codebook. It does happen when I try to export any of the other linked instruments.

Expected behavior: Previously, this export method did not raise an error, and the record_id column was included in the output for each form.

Please feel free to reach out if you need any additional info, or help to troubleshoot this issue.

Additional Note: This issue is currently preventing many of the RedCap processes that I manage for CHOP Research from running

camachop-dbhi commented 2 years ago

Hey @pwildenhain 😄/ just wanted to tag you on this issue (forgot to do this in the above description).

Hope this is something you can help out with 🙏 thanks!!

pwildenhain commented 2 years ago

Hey hey 👋

I'm on paternity leave until mid January so my debugging help is limited right now 😬

Two ideas:

  1. Try using the development version on GitHub and see if it succeeds or you get a different error message. I'm in the process of releasing version 2.0, so lots of changes have been made recently

  2. Try modifying the df_kwargs parameter in Project.export_records(). Something like df_kwargs={'index_col': None}. See the documentation for that parameter here: http://redcap-tools.github.io/PyCap/api_reference/project/ under the export_records() method. At best I think this will help the operation succeed, but I don't think it will solve the actual problem of record_id not being included for the other instruments. That almost sounds like an API error 😱
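To illustrate idea 2, here is a minimal pandas-only sketch of why the error appears and what index_col=None changes. It assumes, based on the traceback above, that PyCap passes the record ID field to read_csv as index_col by default. The CSV text and its column names are made up for illustration and do not come from a real export.

```python
import io

import pandas as pd

# Stand-in for an export that came back without the record_id column.
# The column names here are hypothetical.
csv_text = "visit_date,study_data_complete\n2021-11-01,2\n"

try:
    # Asking for an index column that is not in the data raises ValueError.
    pd.read_csv(io.StringIO(csv_text), index_col="record_id")
    index_error = None
except ValueError as err:
    index_error = err

# With index_col=None, pandas falls back to a default integer index.
df = pd.read_csv(io.StringIO(csv_text), index_col=None)

print(index_error)
print(list(df.columns))
```

The second read succeeds, but the resulting frame still has no record_id column, which matches the caveat in idea 2: the option avoids the crash without restoring the missing field.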

If neither of the above works, I would be curious to see how it all comes through using a plain requests.post() call, to check whether record_id is actually there and PyCap is just missing it somehow
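A rough sketch of that check follows. The payload keys are the standard REDCap record-export parameters. The function names are hypothetical helpers written for this example, and the URL and token in the usage note are placeholders for your own instance.

```python
import csv
import io


def build_export_payload(token, forms):
    """Build form data for a flat CSV record export limited to some forms."""
    payload = {
        "token": token,
        "content": "record",
        "format": "csv",
        "type": "flat",
    }
    # REDCap expects array parameters as forms[0], forms[1], ...
    for i, form in enumerate(forms):
        payload[f"forms[{i}]"] = form
    return payload


def csv_header(csv_text):
    """Return the column names from the first row of a CSV export."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def export_header(api_url, token, forms):
    """POST the export request and return the column names REDCap sent back."""
    # Third-party dependency, imported here so the helpers above work without it.
    import requests

    response = requests.post(api_url, data=build_export_payload(token, forms))
    response.raise_for_status()
    return csv_header(response.text)
```

Calling something like `"record_id" in export_header("https://redcap.example.org/api/", "YOUR_TOKEN", ["study_data"])` would then show whether the API itself is dropping the column, independent of PyCap.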

camachop-dbhi commented 2 years ago

Hey @pwildenhain 😄/

Okay thanks for the tips, I'll give both of those a try while you're out on leave and let you know how it goes (and which approach ends up working for me).

Thanks for the quick response even while you're out!!

camachop-dbhi commented 2 years ago

Hey @pwildenhain ,

Just had a chance to do some digging into this and wanted to follow up.

Findings: I tested what happens when I use the base REDCap API directly to export the forms I was having issues with (using syntax outlined here), and record_id was missing from that export as well. This seems to confirm your suspicion that the error is caused by an API issue rather than by this package.

Not sure what caused this error to crop up now, as I have been running these scripts for a while without any issues, but I suspect it is related to the version of REDCap that we have implemented (I believe the appearance of these issues lines up with a recent version upgrade to our REDCap instance).

Resolution Note: A workaround that resolves the issue is to include, in the forms argument, the form that explicitly lists record_id in the codebook for the project, along with any other forms I want to pull data for (as shown below).

study_data=redcap_project.export_records(format='df',forms=['demographics','study_data'])
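If several scripts need the same workaround, it could be wrapped in a small helper. This is only a sketch: with_primary_form is a hypothetical name, and 'demographics' is assumed to be the form that carries record_id, as in the call above.

```python
def with_primary_form(forms, primary_form):
    """Return the forms list with primary_form first, without duplicating it."""
    return [primary_form] + [form for form in forms if form != primary_form]


# Intended use with an existing PyCap project object, mirroring the call above:
# study_data = redcap_project.export_records(
#     format='df', forms=with_primary_form(['study_data'], 'demographics'))
```

The export will then also contain the fields of the primary form, which can be dropped from the resulting DataFrame if they are not needed.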

Conclusion: All that said, given that this seems to be a REDCap API issue and that I have found a workaround, I think it's safe for this issue to be closed. But feel free to reach out if you have any follow-up questions for me related to any of this!

Thanks again!!

pwildenhain commented 2 years ago

Great synopsis -- last suggestion I have is to report this to your REDCap admins so they can let the Vanderbilt team know that this issue popped up. Thanks for the workaround as well -- it'll be really useful to point others to if they're facing similar issues with the new version