Closed by apcamargo 2 years ago
The original .xlsx file accidentally contains a newline character followed by a space in line 4375:
4375 Tasmania/Sarcophilus_harrisii/2017/frag_3871_SRR8
048111
4376 Tasmania/Sarcophilus_harrisii/2017/frag_4262_SRR8048117
4377 Tasmania/Sarcophilus_harrisii/2017/frag_4482_SRR8048117
Luckily, the CSV parser used by csvtk tolerates this, because it supports CSV values that span multiple lines.
So just remove the "\n " characters (a newline followed by a space).
csvtk xlsx2csv ictv.xlsx | csvtk replace -F -f "*" -p "\n " -r "" > ictv.csv
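For readers without csvtk at hand, the same idea can be sketched in Python. This is only an illustration of the technique (parse the CSV so that quoted multi-line values stay in one field, then delete the "\n " sequence from every field), not csvtk's implementation. The function name `strip_embedded_breaks`, the column names `id` and `name`, and the sample rows are assumptions made for the example.

```python
import csv
import io


def strip_embedded_breaks(csv_text: str, pattern: str = "\n ") -> str:
    """Remove `pattern` from every field while keeping the row structure."""
    # newline="" leaves line endings untouched so the csv module sees them as-is.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # A quoted value spanning two lines arrives here as a single field.
        writer.writerow([field.replace(pattern, "") for field in row])
    return out.getvalue()


# Hypothetical two-column sample; the second row mimics the broken value
# (a newline plus a space inside one quoted field).
sample = (
    "id,name\n"
    '1,"Tasmania/Sarcophilus_harrisii/2017/frag_3871_SRR8\n 048111"\n'
    "2,Tasmania/Sarcophilus_harrisii/2017/frag_4262_SRR8048117\n"
)

cleaned = strip_embedded_breaks(sample)
print(cleaned)
```

As with the csvtk command, the replacement is applied to all fields, so any other value containing the same sequence would be joined too.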
PS: I also found other unwanted characters, e.g., M-BM- characters, which are handled in https://github.com/shenwei356/ictv-taxdump#steps
Thanks, @shenwei356!
I'm not entirely sure what is causing this, but here are the steps to reproduce:
Then, in lines 4375 and 4376:
This should be a single line