shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License
361 stars, 29 forks

How to Extract GenBank 15-taxon format abbreviated lineage #46

Open yxj17173 opened 3 years ago

yxj17173 commented 3 years ago

Describe your issue

For example, the complete lineage for human is "cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Boreoeutheria; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo; Homo sapiens;". The abbreviated human lineage, in the 15-taxon format, is "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo; Homo sapiens;". The abbreviated lineage of Xenopus tropicalis is "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia; Batrachia; Anura; Pipoidea; Pipidae; Xenopodinae; Xenopus; Silurana; Xenopus tropicalis;".

The NCBI Taxonomy database allows taxa to be flagged as ones that should (or should not) appear in the abbreviated classification line of GenBank flat files. For convenience, GenBank/EMBL-Bank/DDBJ and UniProtKB entries all store an abbreviated lineage.

How can I extract the GenBank 15-taxon abbreviated lineage with taxonkit? Alternatively, could you add a reformat option for the 15-taxon format? Thanks for the state-of-the-art tools!
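For illustration, here is a minimal Python sketch of one way to derive such an abbreviated lineage directly from the NCBI taxdump files. It is not a taxonkit feature. It assumes that the flag mentioned above corresponds to the "GenBank hidden flag" column (the 11th field) of `nodes.dmp`, and that scientific names are taken from `names.dmp`. The function names (`split_dmp`, `load_nodes`, `load_names`, `abbreviated_lineage`) are hypothetical, and the output is not guaranteed to contain exactly 15 taxa or to match a given GenBank record.

```python
import sys


def split_dmp(line):
    # taxdump rows separate fields with "\t|\t" and end with "\t|".
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_nodes(lines):
    # Map tax_id -> (parent_tax_id, hidden).
    # Assumption: field index 10 is the "GenBank hidden flag" (1 = suppressed).
    nodes = {}
    for line in lines:
        fields = split_dmp(line)
        nodes[int(fields[0])] = (int(fields[1]), fields[10] == "1")
    return nodes


def load_names(lines):
    # Map tax_id -> scientific name, ignoring synonyms and common names.
    names = {}
    for line in lines:
        fields = split_dmp(line)
        if fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def abbreviated_lineage(tax_id, nodes, names):
    # Walk from the query taxon up to the root, keeping only non-hidden taxa.
    kept = []
    while True:
        parent, hidden = nodes[tax_id]
        if not hidden:
            kept.append(names[tax_id])
        if parent == tax_id:  # the root node is its own parent
            break
        tax_id = parent
    return "; ".join(reversed(kept))


if __name__ == "__main__":
    # Example invocation (file paths are placeholders):
    #   python abbrev_lineage.py nodes.dmp names.dmp 9606
    nodes_path, names_path, query = sys.argv[1:4]
    with open(nodes_path) as fh:
        nodes = load_nodes(fh)
    with open(names_path) as fh:
        names = load_names(fh)
    print(abbreviated_lineage(int(query), nodes, names))
```

Loading the full `nodes.dmp` and `names.dmp` into dictionaries keeps the sketch short but uses a fair amount of memory; a real tool would likely want a more compact representation.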