leylabmpi / Struo2

Scalable creating/updating of metagenome profiling databases
MIT License

missing species in GTDB 207 gte50comp-lt5cont.nwk tree #49

Open chloelulu opened 2 months ago

chloelulu commented 2 months ago

Hello Developer,

I am currently using the database available at http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release207/kraken2/ for my analysis with Kraken2, and I've found it quite straightforward to use. According to the classification results, my samples show a significant abundance of Acetatifactor sp003612485 and 1XD42-69 sp003612565. Consequently, I plan to use the phylogenetic tree from http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release207/phylogeny/gte50comp-lt5cont.nwk as a reference as well.

However, I noticed that these two species are not present in the tree tips. May I know why they are missing and how I can obtain a complete tree that includes all the species listed in names.dmp?

Here are the commands I used to search for the species:

grep -E 'Acetatifactor sp003612485|1XD42-69 sp003612565' taxonomy/names.dmp
406383408   |   1XD42-69 sp003612565    |       |   scientific name |
611307305   |   Acetatifactor sp003612485   |       |   scientific name |
grep -E 'Acetatifactor sp003612485|1XD42-69 sp003612565' gte50comp-lt5cont.nwk

Your suggestions are much appreciated!

nick-youngblut commented 2 months ago

I believe that the names Acetatifactor sp003612485 and 1XD42-69 sp003612565 will have been modified in the Newick file, since Newick generally does not allow special characters (such as spaces) in unquoted labels. You can use https://github.com/tjunier/newick_utils to help extract the names from the tree for better searching via grep.
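
If newick_utils is not installed, a rough fallback with standard tools can list candidate labels for grep. This is only a sketch: `newick_labels` is a hypothetical helper, it assumes unquoted labels with no embedded commas or parentheses, and unlike `nw_labels -I` it also prints internal node labels or support values.

```shell
# Hypothetical helper: read a Newick string on stdin, print one label per line.
# Assumes unquoted labels; internal node labels/support values are printed too.
newick_labels() {
    # Put every structural character on its own line break,
    # drop the ":branch_length" suffix, then discard empty lines.
    tr '(),;' '\n\n\n\n' \
        | sed -e 's/:.*$//' \
        | grep -v '^[[:space:]]*$'
}
```

For example, `newick_labels < gte50comp-lt5cont.nwk | grep 'Acetatifactor'` would show how the genus is actually spelled in the tree.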

chloelulu commented 2 months ago

Hi @nick-youngblut, thanks for the quick response and suggestions. I used newick_utils, and some species from the genera 1XD42-69 and Acetatifactor do show up. Unfortunately, I still cannot find Acetatifactor sp003612485 or 1XD42-69 sp003612565. I am eager to hear your advice!

Please see below code and search results.

nw_labels -I gte50comp-lt5cont.nwk | grep '1XD42-69'
s__1XD42-69_sp011959925
s__1XD42-69_sp910585825
s__1XD42-69_sp910586645
s__1XD42-69_sp910586725
s__1XD42-69_sp910586355
s__1XD42-69_sp910589105
s__1XD42-69_sp009911505
s__1XD42-69_sp910577065
s__1XD42-69_sp910588565
s__1XD42-69_sp014287635
s__1XD42-69_sp017625255
s__1XD42-69_sp017624495
nw_labels -I gte50comp-lt5cont.nwk | grep 'Acetatifactor'            
s__Acetatifactor_sp910577235
s__Acetatifactor_sp910584375
s__Acetatifactor_sp011959105
s__Acetatifactor_sp910583845
s__Acetatifactor_sp910585015
s__Acetatifactor_sp017467845
s__Acetatifactor_sp017461775
s__Acetatifactor_sp017522685
s__Acetatifactor_stercoripullorum
s__Acetatifactor_sp017480445
s__Acetatifactor_sp902796105
s__Acetatifactor_sp002368865
s__Acetatifactor_sp017476665
s__Acetatifactor_sp017478245
s__Acetatifactor_sp017621075
s__Acetatifactor_sp900760705
s__Acetatifactor_sp017559435
s__Acetatifactor_sp017465845
s__Acetatifactor_sp017620975
s__Acetatifactor_sp009177215
s__Acetatifactor_sp900066565
s__Acetatifactor_sp900772845
s__Acetatifactor_sp900771995
s__Acetatifactor_sp900766575
s__Acetatifactor_sp002431915
s__Acetatifactor_sp003447295
s__Acetatifactor_intestinalis
s__Acetatifactor_sp015056915
s__Acetatifactor_sp017624835
s__Acetatifactor_sp015057005
s__Acetatifactor_sp017513625
s__Acetatifactor_sp017473665
s__Acetatifactor_sp910578215
s__Acetatifactor_sp910589655
s__Acetatifactor_sp910577665
s__Acetatifactor_sp910586215
s__Acetatifactor_sp016293615
s__Acetatifactor_sp018385425
s__Acetatifactor_sp910577035
s__Acetatifactor_sp910578815
s__Acetatifactor_sp910586485
s__Acetatifactor_sp016303085
s__Acetatifactor_sp910586835
s__Acetatifactor_sp900554205
s__Acetatifactor_sp900755865
s__Acetatifactor_sp904501885
s__Acetatifactor_sp017397645
s__Acetatifactor_sp910579755
s__Acetatifactor_sp910587755
s__Acetatifactor_muris
s__Acetatifactor_sp910588225
s__Acetatifactor_sp910584865
s__Acetatifactor_sp910580225
s__Acetatifactor_sp910578995
s__Acetatifactor_sp910587555
s__Acetatifactor_sp002490995
s__Acetatifactor_sp910585805
s__Acetatifactor_sp910586775
s__Acetatifactor_sp910586515
s__Acetatifactor_sp910577185
s__Acetatifactor_sp910585665
s__Acetatifactor_sp910576125
s__Acetatifactor_sp910584235
s__Acetatifactor_sp910585615
s__Acetatifactor_sp910585425
s__Acetatifactor_sp910589035
s__Acetatifactor_sp910578185
s__Acetatifactor_sp910584435
s__Acetatifactor_sp910586435
s__Acetatifactor_sp002314715
s__Acetatifactor_sp902766425
s__Acetatifactor_sp900320485
s__Acetatifactor_sp016290775
s__Acetatifactor_sp017527205
s__Acetatifactor_sp015056795
s__Acetatifactor_sp017552985
s__Acetatifactor_sp017400965
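
Substring searches like the ones above return many near matches, so it can be easier to test specific species with an exact match against the tip labels. The sketch below assumes, based only on the labels listed above, that each tip is written as `s__` followed by the names.dmp name with spaces replaced by underscores. `to_tree_label` and `report_missing` are hypothetical helpers, not part of Struo2 or newick_utils.

```shell
# Hypothetical helper: convert a species name as written in names.dmp into the
# label style seen in the tree (assumed: "s__" prefix, spaces become "_").
to_tree_label() {
    printf 's__%s\n' "$1" | tr ' ' '_'
}

# Hypothetical helper: print every queried name whose label is NOT a tip.
#   $1      file with one tip label per line (e.g. saved `nw_labels -I` output)
#   $2...   species names as written in names.dmp
report_missing() {
    labels_file=$1
    shift
    for name in "$@"; do
        # -F fixed string, -x whole-line match, -q quiet (exit status only)
        if ! grep -Fxq -- "$(to_tree_label "$name")" "$labels_file"; then
            printf '%s\n' "$name"
        fi
    done
}
```

A possible usage would be to save the labels once with `nw_labels -I gte50comp-lt5cont.nwk > tips.txt` and then run `report_missing tips.txt 'Acetatifactor sp003612485' '1XD42-69 sp003612565'`; any name printed has no exactly matching tip under the assumed naming convention.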