Closed tolot27 closed 6 years ago
The default format string could be {a} for all or {F} for full lineage.
Hi Mathias, I've considered this before, but I think it's better to keep reformat as an independent command so the commands stay modular.
As for speed, the slowest part is parsing the names.dmp and nodes.dmp files, so there should be no big difference for large datasets (taxid list file?).
You are right about the speed. But at least one additional column is added to the output.
Anyway, it's your decision.
Ha ha, you can easily discard the original lineage column using csvtk (https://github.com/shenwei356/csvtk).
For example, csvtk cut -f -5 data.csv removes the 5th column, and csvtk cut -t -f -col5 data.tsv discards the column named col5.
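For tab-separated output, the same column drop can be sketched with plain cut. This is only an illustration: the helper name drop_raw_lineage, the three-column layout (taxid, raw lineage, reformatted lineage) and the sample row are assumptions made up for the example, not actual taxonkit output.

```shell
# Hypothetical helper (not part of taxonkit or csvtk): keep columns 1 and 3
# of tab-separated input, dropping the raw lineage assumed to be in column 2.
drop_raw_lineage() {
  cut -f 1,3
}

# Made-up sample row: taxid, raw lineage, reformatted lineage.
printf '9606\tcellular organisms;Eukaryota;Homo sapiens\tEukaryota;Homo sapiens\n' | drop_raw_lineage
```

cut splits on tabs by default, so no delimiter flag is needed here; for comma-separated data, csvtk cut as shown above is the safer choice because it handles quoted fields.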
I know how to use cut or csvtk. It is just an additional pipe/process and more stuff to maintain in the scripts.
It would be great if the lineage command supported the same format options as the reformat command does. That would avoid a second pipe and save processing time, especially for large datasets.