statisticalbiotechnology / triqler

Source and example code for triqler (TRansparent Identification-Quantification-linked Error Rates)
Apache License 2.0

Windows creates empty lines with csv reader/writer #9

Closed. MatthewThe closed this issue 5 years ago.

MatthewThe commented 5 years ago

All of the parsing within triqler is done with Python's csv module. On Windows there are some caveats as to how files have to be opened for this module, and the current implementation ends up inserting a blank line after every row that is written. When such a file is read back, each blank line is parsed as an empty row, which can for example result in the error:

File "C:\Users\Matt\Anaconda3\envs\triqler\lib\site-packages\triqler\parsers.py", line 111, in parseTriqlerInputFile
    intensity = float(row[intensityCol])
IndexError: list index out of range
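
To illustrate the failure mode above, here is a minimal sketch of how a blank line turns into an empty row and how a reader could guard against it. The tab delimiter, the sample rows, and the `intensity_col` index are assumptions made for illustration, not taken from `parsers.py`, and skipping empty rows is only a defensive workaround on the reading side, not a fix for the writer.

```python
import csv
import io

# Simulated file content with a blank line after the first data row,
# similar to what the reader sees when every written row is followed
# by an extra line break.
raw = "featA\t1.5\n\nfeatB\t2.5\n"

rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))
# The blank line comes back as an empty list, so indexing into it
# (e.g. row[1]) would raise IndexError.

intensity_col = 1  # hypothetical column index
intensities = []
for row in rows:
    if not row:  # skip empty rows instead of indexing into them
        continue
    intensities.append(float(row[intensity_col]))
```

A guard like this avoids the crash while reading, but files written with the extra blank lines would still contain them.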

It also affects the prepare_pin.py script from the Quandenser repository. Here, the extra blank lines cause each feature to be considered its own feature group, and each of these groups then ends up being discarded because it has too many missing values. Eventually, this leads to the following error:

File "C:\Users\Matt\Anaconda3\envs\triqler\lib\site-packages\triqler\qvality.py", line 38, in getQvaluesFromScores
    medians, negatives, sizes = binData2(allScores, decoyScores)
ValueError: not enough values to unpack (expected 3, got 0)

This can apparently be fixed by passing the newline argument (newline='') when opening files, but unfortunately this argument of the built-in open function is specific to Python 3, so using it would cause the Python 2 package to error out. Another solution is to replace the built-in open function with io.open (which exists in both Python 2.6+ and Python 3 and accepts the newline argument), but apparently this can drastically slow down reading and writing.
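
As a rough sketch of a third option, a small helper could pick the open mode based on the interpreter version: `newline=""` on Python 3 and binary mode on Python 2, which is what the csv documentation for each version recommends. The helper name `open_csv`, the tab delimiter, and the sample rows are hypothetical, and this is not necessarily how triqler resolved the issue.

```python
import csv
import os
import sys
import tempfile


def open_csv(path, mode="r"):
    """Open a file for the csv module without newline translation (hypothetical helper)."""
    if sys.version_info[0] >= 3:
        # Python 3: newline="" leaves line endings to the csv module.
        return open(path, mode, newline="")
    # Python 2: the csv module expects files opened in binary mode.
    return open(path, mode + "b")


rows = [["featA", "1.5"], ["featB", "2.5"]]
path = os.path.join(tempfile.mkdtemp(), "example.tsv")

with open_csv(path, "w") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

with open_csv(path, "r") as f:
    read_back = list(csv.reader(f, delimiter="\t"))

# Inspect the raw bytes to check that no extra carriage return was added.
with open(path, "rb") as f:
    raw_bytes = f.read()
```

Because the version check happens once per file, this avoids going through io.open on Python 2, although whether that matters for performance would need to be measured.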