sameerd / ppmi

Analysis of Parkinson's Progressive Markers Initiative Data
http://www.ppmi-info.org/

Setup issue #5

Open jgockley62 opened 2 years ago

jgockley62 commented 2 years ago

Hi, this repo looks really useful! I'm receiving an error in the setup script and wondered if you had any advice as to what I may be doing wrong.

> python3 scripts/create_ppmi_database.py 
Processing Code_List.csv
Processing Data_Dictionary.csv
Processing Derived_Variable_Definitions_and_Score_Calculations.csv
Processing Page_Descriptions.csv
Processing Skin_Biopsy.csv
....
Processing ambulatory.csv
                                              basename  pat_id  pag_cnt table_name
2    Derived_Variable_Definitions_and_Score_Calcula...   False        0       None
7                            Gait_Data___Arm_swing.csv    True        0       None
10                          IUSM_ASSAY_DEV_CATALOG.csv   False        0       None
12                         FOUND_Enrollment_Status.csv    True        0       None
13                            FOUND_RFQ_Dictionary.csv   False        0       None
..                                                 ...     ...      ...        ...
214                                      pulserate.csv   False        0       None
215                                            prv.csv   False        0       None
216                                        onwrist.csv   False        0       None
217                                     inbedtimes.csv   False        0       None
218                                     ambulatory.csv   False        0       None

[152 rows x 4 columns]
Maximum Table name is larger than 8
Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: True

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "scripts/create_ppmi_database.py", line 158, in <module>
    print (pd_results[tmax > 8])
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: True

sameerd commented 2 years ago

It looks like one of the table names is resolving to something longer than 8 characters. If you know which one it is, you can add it to this dictionary in the create_ppmi_database.py script.

The PPMI project constantly adds new tables, so this problem keeps coming up. If you figure out which table name is causing the problem and submit a PR, I can add it to the repo. Thanks.
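
A note on the traceback itself: the `KeyError: True` comes from the diagnostic line `print (pd_results[tmax > 8])`. The expression `tmax > 8` evaluates to a single boolean rather than a per-row mask, so pandas looks for a column literally named `True` and fails before it can show the offending rows. A minimal sketch of how the long names could be listed instead, assuming `pd_results` has the `basename` and `table_name` columns shown in the output above (the sample rows here are made up for illustration and are not real PPMI table names):

```python
import pandas as pd

# Illustrative stand-in for the script's pd_results frame; values are made up.
pd_results = pd.DataFrame({
    "basename": ["Code_List.csv", "Some_New_Table.csv", "prv.csv"],
    "table_name": ["CODELIST", "SOMENEWTABLE", None],
})

# Per-row length of each table name; rows with no table name count as 0.
name_len = pd_results["table_name"].str.len().fillna(0)

# Scalar maximum, comparable to what the script reports as too large.
tmax = name_len.max()

# A per-row boolean mask selects the offending rows. Indexing with the
# scalar (tmax > 8) would raise KeyError: True instead.
too_long = pd_results[name_len > 8]
print(too_long)
```

The rows printed by this kind of filter are the ones whose names would need a shorter entry in the script's dictionary.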

sameerd commented 2 years ago

This is a similar problem to #3. You can see how the create_ppmi_database.py script was updated here
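
For anyone who cannot follow the link, the general shape of such an update can be sketched as below. This is only an illustration of the idea of an override dictionary: the names `TABLE_NAME_OVERRIDES`, `MAX_TABLE_NAME_LEN` and `resolve_table_name`, the direction of the mapping (CSV basename to short table name), and the example entry are all assumptions, not the actual contents of create_ppmi_database.py.

```python
# Hypothetical override mapping: CSV basename -> short table name.
# The entry below is invented for illustration only.
TABLE_NAME_OVERRIDES = {
    "Some_New_Table.csv": "SOMENEW",
}

# Length limit taken from the "larger than 8" message in the log above.
MAX_TABLE_NAME_LEN = 8


def resolve_table_name(basename, derived_name, overrides=TABLE_NAME_OVERRIDES):
    """Return the override for basename if one exists, else the derived name."""
    name = overrides.get(basename, derived_name)
    if name is not None and len(name) > MAX_TABLE_NAME_LEN:
        # Fail with the file name so the missing override is easy to spot.
        raise ValueError(
            f"{basename}: table name {name!r} is longer than "
            f"{MAX_TABLE_NAME_LEN} characters; add a shorter override"
        )
    return name


print(resolve_table_name("Some_New_Table.csv", "SOMENEWTABLE"))
```

Raising an error that names the file, rather than only reporting the maximum length, would make it easier to tell which newly added table needs an entry.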