shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Export format problem with taxonkit and csvtk #100

Open AlenaYoung opened 1 month ago

AlenaYoung commented 1 month ago

Hi,

I am trying to get taxonomy information by running taxonkit and csvtk with -t, so that I can import the result into R. However, both R and Excel have trouble importing the output. Some entries, such as (et al.2015) and Sedi, cannot be imported correctly. When imported into R, the result either ends up entirely on a single line (read.csv) or has more rows and columns than the actual data (read.table).

My script is as shown below:

taxonkit lineage taxid.txt -j 120 \
    | taxonkit reformat -r NA -R 0 -j 120 \
    | csvtk -H -t cut -f 1,3 \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
    | csvtk pretty -t -o taxid_out.csv

My R script is as shown below:

test2 <- read.table("taxid_out.csv", header = TRUE)

The output file I get is as follows. taxid_out.csv

Any help will be much appreciated. Thank you in advance,

Alena

shenwei356 commented 1 month ago

csvtk pretty is for producing a human-readable layout in the terminal; its output is no longer a tab- or comma-delimited file. Drop it from the pipeline when writing a file to import elsewhere:

$ taxonkit lineage <(echo 9606)  \
    | taxonkit reformat -r NA -R 0  \
    | csvtk -H -t cut -f 1,3 \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
> taxid_out.csv

$ cat taxid_out.csv
taxid   kingdom phylum  class   order   family  genus   species
9606    Eukaryota       Chordata        Mammalia        Primates        Hominidae       Homo    Homo sapiens

$ csvtk pretty -t taxid_out.csv -S grid
+-------+-----------+----------+----------+----------+-----------+-------+--------------+
| taxid | kingdom   | phylum   | class    | order    | family    | genus | species      |
+=======+===========+==========+==========+==========+===========+=======+==============+
| 9606  | Eukaryota | Chordata | Mammalia | Primates | Hominidae | Homo  | Homo sapiens |
+-------+-----------+----------+----------+----------+-----------+-------+--------------+
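
A quick way to check that a file like taxid_out.csv really is tab-delimited before importing it is to count fields per line. The sketch below is only an illustration: it recreates the two-line example from above in a temporary file (the `tmpfile` name and the `awk` checks are not part of taxonkit or csvtk) and compares splitting on tabs with splitting on any whitespace. The second case is what R's `read.table` does with its default `sep = ""`, which is one likely reason the import produced extra columns.

```shell
# Recreate the two-line tab-delimited example in a temporary file
# (sample data copied from the output shown above).
tmpfile=$(mktemp)
printf 'taxid\tkingdom\tphylum\tclass\torder\tfamily\tgenus\tspecies\n' > "$tmpfile"
printf '9606\tEukaryota\tChordata\tMammalia\tPrimates\tHominidae\tHomo\tHomo sapiens\n' >> "$tmpfile"

# Split on tabs: every line should report the same number of fields.
tab_counts=$(awk -F'\t' '{ print NF }' "$tmpfile" | sort -u)
echo "fields per line (tab-separated): $tab_counts"        # 8

# Split on any whitespace: "Homo sapiens" becomes two fields,
# so the data line no longer matches the header.
header_ws=$(awk 'NR == 1 { print NF }' "$tmpfile")
data_ws=$(awk 'NR == 2 { print NF }' "$tmpfile")
echo "whitespace-separated: header=$header_ws data=$data_ws"  # header=8 data=9

rm -f "$tmpfile"
```

On the R side, reading the file with an explicit tab separator, for example `read.delim("taxid_out.csv")` or `read.table("taxid_out.csv", header = TRUE, sep = "\t")`, avoids the whitespace split. If some names contain quote characters or `#`, passing `quote = ""` and `comment.char = ""` to `read.table` may also be needed; that is an assumption about the data, not something confirmed in this thread.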

By the way, -j 120 does not help. From the help message:

  -j, --threads int       number of CPUs. 4 is enough (default 4)