hkariti closed 10 months ago
Looks good, thank you for the fix! I am curious - where do you encounter such files though?
Heh. It was a regular sequencing output, nothing fancy. It was done by an external company and I think they're using some new machine. Maybe it decided to utilize the full range of QUAL values for this one :)
When running dedup on a pairsam file that includes double quotes in the QUAL field, the result is a corrupt file. By default, the to_csv method wraps any field containing a quote character in quotes, and also escapes each embedded quote by doubling it. The written QUAL string therefore ends up longer than its SEQ, so the two no longer match in length. To fix this, we ask DataFrame.to_csv to never quote the output.
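A minimal sketch of the behavior, assuming the fix amounts to passing `quoting=csv.QUOTE_NONE` to `DataFrame.to_csv` (the actual call in dedup may take other arguments). The two-column frame and the `to_tsv` helper are hypothetical and exist only for illustration:

```python
import csv
import io

import pandas as pd

# Hypothetical record: the QUAL string contains a double quote character.
df = pd.DataFrame({"SEQ": ["ACGT"], "QUAL": ['F"FF']})


def to_tsv(frame, **kwargs):
    """Write the frame as tab-separated text and return it as a string."""
    buf = io.StringIO()
    frame.to_csv(buf, sep="\t", header=False, index=False, **kwargs)
    return buf.getvalue()


# Default quoting: the QUAL field is wrapped in quotes and the embedded
# quote is doubled, so the written QUAL is longer than SEQ.
default_out = to_tsv(df)

# Assumed fix: never quote, so QUAL is written byte for byte.
unquoted_out = to_tsv(df, quoting=csv.QUOTE_NONE)

print(repr(default_out))
print(repr(unquoted_out))
```

With quoting disabled, a field that contains the separator itself cannot be written without an escape character. That should not affect tab-separated output where no field contains a tab.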