Open timpol opened 1 year ago
Thanks for reporting the issue.
Would it be worth putting a warning about this in section 1.2 of the documentation just in case?
Would you mind sharing a CSV that fails? The best option may be to correct this behavior.
I should have mentioned that it depends on the ordering of the columns as to whether or not it fails...
If you take the rawdata.csv
example file and re-order the columns so that any column other than UID
or Session
is first, then save as a .csv in UTF-8 encoding from the spreadsheet program, it fails with a KeyError: 'Sample'
on line for s in sorted({r['Sample'] for r in self})
when self.refresh_samples()
is called (or another such key error).
I think this is because the spreadsheet program puts a UTF-8 byte order mark (BOM) at the start of the file, so D47crunch can't find the column headings it expects to see.
In that case, the best case scenario is that another field gets a spurious BOM in its name, which is likely to cause cryptic problems down the line. If you send me an example file I can try to debug.
The longer-term solution, of course, it to use a more robust csv parsing implementation. There are probably many out there.
On the other hand, you could argue that producing clean csv files is Excel's job, not D47crunch's. Implementing corrective measures in every library likely to encounter this format is expensive. Converting “quasi-CSV” to ”bare-CSV” between Excel and these libraries would probably be less expensive. YMMV.
I agree 100% that it is an Excel problem not a D47crunch problem... I just thought it might be discouraging for a relatively new python user trying to get their D47crunch / D47calib script working.
A simple error msg informing the user that they need to clean up their filthy csv file could do the job?
See file attached: rawdata_dirty.csv
Just a suggestion regarding the documentation...
It seems that
D47data.read()
will fail if the input csv file is in UTF-8 encoding. This is not a problem with D47crunch, but may trip up some new users if they were to produce their csv file using the default csv format of certain spreadsheet programs.Would it be worth putting a warning about this in section 1.2 of the documentation just in case?