guma44 / GEOparse

Python library to access Gene Expression Omnibus Database (GEO)
BSD 3-Clause "New" or "Revised" License

Parse data table #34

Closed: halioui closed this issue 6 years ago

halioui commented 6 years ago

I get an IndexError (string index out of range) for some GSM SOFT text files. For instance, for GSM32878, calling geo = GEOparse.get_GEO('GSM32878') gives:

Traceback (most recent call last):
  File "indexReportUpdate.py", line 825, in <module>
    createIndices("GSM", outputDoc[2], outputEdg[0], outputDoc[6], outputEdg[2])
  File "indexReportUpdate.py", line 171, in createIndices
    raise e
  File "indexReportUpdate.py", line 163, in createIndices
    geo = GEOparse.get_GEO(filepath=fpath, silent=True)
  File "/home/mimsadm/.local/lib/python3.5/site-packages/GEOparse/GEOparse.py", line 82, in get_GEO
    return parse_GSM(filepath)
  File "/home/mimsadm/.local/lib/python3.5/site-packages/GEOparse/GEOparse.py", line 374, in parse_GSM
    table_data = parse_table_data(soft)
  File "/home/mimsadm/.local/lib/python3.5/site-packages/GEOparse/GEOparse.py", line 329, in parse_table_data
    data = "\n".join([i.rstrip() for i in lines if i[0] not in ("^", "!", "#")])
  File "/home/mimsadm/.local/lib/python3.5/site-packages/GEOparse/GEOparse.py", line 329, in <listcomp>
    data = "\n".join([i.rstrip() for i in lines if i[0] not in ("^", "!", "#")])
IndexError: string index out of range

guma44 commented 6 years ago

Hi, thanks for the report. It seems that there are some empty lines in the file. I will take a look at it.
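For readers hitting the same traceback: the failing expression indexes i[0] on every line, which raises IndexError when a line is an empty string. Below is a minimal sketch of one possible guard. The helper name join_table_lines is hypothetical and is not part of GEOparse, and this is not necessarily the fix the library adopted. It relies on str.startswith, which accepts a tuple of prefixes and simply returns False for an empty string.

```python
def join_table_lines(lines):
    """Hypothetical helper (not GEOparse API): keep only data-table rows.

    Skips SOFT entity ("^"), attribute ("!") and column-description ("#")
    lines, and also skips blank lines instead of indexing into them.
    """
    return "\n".join(
        line.rstrip()
        for line in lines
        # line.strip() is falsy for empty or whitespace-only lines, and
        # startswith() never raises on an empty string, unlike line[0].
        if line.strip() and not line.startswith(("^", "!", "#"))
    )


# Illustrative input only; these are made-up rows, not GSM32878 contents.
sample = [
    "^SAMPLE = GSM32878",
    "!Sample_title = example",
    "#ID_REF = ",
    "ID_REF\tVALUE",
    "",  # an empty line like this is what makes i[0] fail
    "1\t0.5  ",
    "!sample_table_end",
]
table_text = join_table_lines(sample)
```

Whether blank lines should be dropped or preserved is a design choice; dropping them keeps the joined text free of empty rows before it is handed to a table reader.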