shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

About the special taxonID search #48

Closed: hansir8 closed this issue 3 years ago

hansir8 commented 3 years ago

Dear Wei Shen, I have run into a special case while using taxonkit and have not been able to solve it. First, I have a large list of NCBI taxonomy IDs covering several species. Second, I need to extract a specific taxonomy ID (e.g., 10239, Viruses) and all of its descendant taxa from this list, and then print the taxonomy names (not the IDs). I have read the tutorial you provided, which is comprehensive and detailed. However, I could not find a procedure or command suited to my need. Could taxonkit provide the function I need? Thank you for your great work!

shenwei356 commented 3 years ago

I'm not sure which OS you are using, but the commands below also work on Windows once csvtk is installed.

If I understand correctly, you need the following steps:

  1. Suppose taxids.txt contains the taxids you have:

    csvtk cut -f 1 taxids.txt
    9606
    9913
    376619
    349741
    239935
    314101
    11932
    1327037
    123124124
    3
    92489
    1458427

  2. List all taxids under 10239 (Viruses):

    taxonkit list --ids 10239 --indent "" -o viruses.txt

  3. Filter taxids.txt, keeping only the taxids listed in viruses.txt:

    csvtk grep -Ht -P viruses.txt taxids.txt -o taxids.filter.txt

  4. Print the names:

    taxonkit lineage -n -L taxids.filter.txt -o result.txt
    
    csvtk pretty -H result.txt
    11932   Mouse Intracisternal A-particle
    1327037 Croceibacter phage P2559Y

  5. Print only the names, if you prefer:

    taxonkit lineage -n -L taxids.filter.txt | csvtk cut -Ht -f 2 -o result.txt

    csvtk pretty -H result.txt
    Mouse Intracisternal A-particle
    Croceibacter phage P2559Y
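
If csvtk is not available, the filtering in step 3 could probably also be done with standard awk. The following is only a sketch under stated assumptions: `filter_taxids` is a hypothetical helper name (not part of taxonkit or csvtk), and both files are assumed to hold one taxid per line in the first tab-separated column.

```shell
# Hypothetical helper, not part of taxonkit or csvtk.
# Prints the lines of LIST whose first tab-separated field also
# appears in REF (one taxid per line; blank lines in REF are ignored).
filter_taxids() {
    ref="$1"    # e.g. viruses.txt written by `taxonkit list`
    list="$2"   # e.g. taxids.txt
    awk -F'\t' '
        NR == FNR { if ($1 != "") keep[$1] = 1; next }  # first file: remember taxids
        ($1 in keep)                                    # second file: print matches
    ' "$ref" "$list"
}
```

It would be used as `filter_taxids viruses.txt taxids.txt > taxids.filter.txt`, after which steps 4 and 5 stay the same. Input order is preserved, and any extra columns in taxids.txt are passed through unchanged.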

hansir8 commented 3 years ago

Thank you for your reply! I have solved the problem using your suggestion. Thanks again!