CSV import error + workaround

T3chbug commented 1 year ago

Problem

When I tried to import a measurements .csv file, I got the following error, raised from ~\mambaforge\envs\devbio-napari-env\lib\site-packages\pandas\_libs\parsers.pyx:

ParserError: Error tokenizing data. C error: Expected 20 fields in line 3, saw 23

Cause

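By default, pandas cannot read .csv files that use a semicolon as field separator. Importing the file directly with pd.read_csv() raises exactly the same error, and the file imports successfully when given the additional parameter sep=";" or sep=None (auto-detect). The field-count mismatch most likely comes from the default comma separator splitting each number at its decimal comma, so rows containing different numbers of decimal values end up with different numbers of fields. Note that semicolon-separated .csv files usually use the comma as decimal separator, which additionally requires decimal="," (there is no auto-detect for this); otherwise decimal numbers are read as strings.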
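A minimal sketch of this behaviour, assuming a recent pandas version. The three-line sample below is made up for illustration (semicolon-separated, decimal commas); it is not the original measurements file. It shows the default read failing, the fix with sep and decimal, and what happens when only sep is given.

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Hypothetical sample: ";" separates fields, "," is the decimal separator.
GERMAN_CSV = "label;area;mean_intensity\n1;120;0,5\n2;98,5;1,25\n"

# With the defaults (sep=",", decimal="."), the decimal commas are treated as
# field separators, so the data rows split into different numbers of fields.
try:
    pd.read_csv(io.StringIO(GERMAN_CSV))
    default_error = None
except ParserError as exc:
    default_error = exc
    print("default read failed:", exc)

# Naming both separators explicitly yields numeric columns.
df = pd.read_csv(io.StringIO(GERMAN_CSV), sep=";", decimal=",")
print(df.dtypes)

# sep=";" alone splits the columns correctly, but values such as "0,5"
# stay strings (non-numeric dtype) because decimal is still ".".
df_no_decimal = pd.read_csv(io.StringIO(GERMAN_CSV), sep=";")
print(df_no_decimal["mean_intensity"].dtype)
```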

→ So it's not a devbio-napari problem, but a mismatch between the pandas defaults and the different .csv and number formats.

Workaround

After replacing the separators to match the pandas defaults (comma as field separator, dot as decimal separator), importing the .csv file into napari worked.
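Instead of replacing the separators by hand, the same conversion can presumably be scripted with pandas. This is a sketch of an alternative, not the workaround that was actually used here; convert_to_pandas_defaults and the file names are hypothetical, and the input is assumed to be ";"-separated with decimal commas.

```python
import tempfile
from pathlib import Path

import pandas as pd


def convert_to_pandas_defaults(src: Path, dst: Path) -> pd.DataFrame:
    """Hypothetical helper: rewrite a ';' / ',' CSV using ',' and '.' instead."""
    # Parse the semicolon-separated, decimal-comma file explicitly.
    df = pd.read_csv(src, sep=";", decimal=",")
    # to_csv defaults to sep="," and decimal=".", i.e. what read_csv expects.
    # index=False avoids writing the row index as an extra unnamed column.
    df.to_csv(dst, index=False)
    return df


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "measurements_excel.csv"  # hypothetical file names
    dst = Path(tmp) / "measurements_fixed.csv"
    src.write_text("label;area;custom\n1;120;0,5\n2;98,5;1,25\n", encoding="utf-8")
    convert_to_pandas_defaults(src, dst)
    converted_text = dst.read_text(encoding="utf-8")
    reread = pd.read_csv(dst)  # plain default read_csv now works

print(converted_text)
```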

Also see

pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
All the countries using a decimal comma -> actually more than the number of countries using the decimal point (the pandas default)

haesleinhuepf commented 1 year ago

Hi @T3chbug,

Can you share an example CSV file where this happened? Also, how was it created?

Thanks!

Best, Robert

T3chbug commented 1 year ago

Hi @haesleinhuepf, the file originates from a regionprops export to .csv and was then edited in Excel (additional column "custom"). The original, unchanged .csv could be imported, while the Excel-edited file caused the error.

Excel uses the system's regional number format (German in my case), which led to a semicolon-separated .csv file with decimal-comma numbers that pandas cannot handle by default. After changing the system locale or the Excel setting to comma-separated .csv with decimal-point numbers, the import into napari worked.
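If files may arrive in either format, one option is to choose sep and decimal per file before calling read_csv. This sketch applies csv.Sniffer from the Python standard library to the header line only; read_csv_any_locale is a hypothetical helper, and pairing ";" with decimal="," is a heuristic (semicolon-separated files usually use decimal commas), not a guarantee.

```python
import csv
import io

import pandas as pd


def read_csv_any_locale(text: str) -> pd.DataFrame:
    """Hypothetical helper: choose sep/decimal from the header line, then parse."""
    header = text.partition("\n")[0]
    try:
        # Only consider ';' and ',' as candidate field separators.
        sep = csv.Sniffer().sniff(header, delimiters=";,").delimiter
    except csv.Error:
        # Neither candidate in the header (e.g. a single column): use defaults.
        sep = ","
    # Heuristic (assumption): ';'-separated files use decimal commas.
    decimal = "," if sep == ";" else "."
    return pd.read_csv(io.StringIO(text), sep=sep, decimal=decimal)


# Hypothetical samples: the same table in German-locale Excel style and in
# the pandas default style.
excel_de = "label;area;custom\n1;120;0,5\n2;98,5;1,25\n"
pandas_style = "label,area,custom\n1,120,0.5\n2,98.5,1.25\n"

df_de = read_csv_any_locale(excel_de)
df_default = read_csv_any_locale(pandas_style)
print(df_de.equals(df_default))
```

Passing sep=None to read_csv also detects the separator (via the Python engine), but decimal still has to be set explicitly, which is why the sketch decides both in one place. A ";"-separated file that nevertheless uses decimal points would not be parsed as numbers by this heuristic.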

Excel-edited file causing the import error
