jinnjy closed this issue 2 years ago.
My suggestion would be to post-process the GTDB-Tk output and fill in each empty rank with the last known name, e.g.
c__Cyanobacteriia;o__Elainellales;f__Elainellaceae;g__;s__
will become
c__Cyanobacteriia;o__Elainellales;f__Elainellaceae;g__Elainellaceae_unknown;s__Elainellaceae_unknown
and for this
c__Cyanobacteriia;o__;f__;g__;s__
it will become
c__Cyanobacteriia;o__Cyanobacteriia_unknown;f__Cyanobacteriia_unknown;g__Cyanobacteriia_unknown;s__Cyanobacteriia_unknown
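The suggested post-processing could be sketched in Python roughly as follows. This is a minimal sketch, assuming GTDB-style strings made of `;`-separated `x__name` ranks; the function name `fill_unknown_ranks` is hypothetical, and the `_unknown` suffix simply follows the examples above rather than any bigslice or GTDB-Tk convention. Each empty rank gets the name of the closest assigned rank above it plus the suffix.

```python
def fill_unknown_ranks(taxonomy: str, suffix: str = "_unknown") -> str:
    """Fill empty GTDB-style ranks (e.g. 'g__') with the last known name plus a suffix."""
    filled = []
    last_known = None
    for rank in taxonomy.strip().split(";"):
        prefix, _, name = rank.partition("__")
        if name:
            # Rank is assigned: keep it as-is and remember its name.
            last_known = name
            filled.append(rank)
        elif last_known is not None:
            # Rank is empty: reuse the closest assigned rank above it.
            filled.append(f"{prefix}__{last_known}{suffix}")
        else:
            # Nothing assigned yet (e.g. an empty domain): leave it unchanged.
            filled.append(rank)
    return ";".join(filled)


print(fill_unknown_ranks("c__Cyanobacteriia;o__Elainellales;f__Elainellaceae;g__;s__"))
# prints c__Cyanobacteriia;o__Elainellales;f__Elainellaceae;g__Elainellaceae_unknown;s__Elainellaceae_unknown
```

The same function can be applied to the full `d__...;s__` strings from GTDB-Tk, since assigned ranks are passed through unchanged.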
I am preparing the files according to the example input folder:
https://github.com/medema-group/bigslice/blob/master/misc/input_folder_template/taxonomy/dataset_1_taxonomy.tsv
I used GTDB-Tk 1.5 as my taxonomy assignment tool and encountered some cases where GTDB-Tk could not assign a genus and species (and, for some genomes, not even an order or family):
GCA_010156995.1_ASM1015699v1_genomic	d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__;f__;g__;s__
GCA_010672345.1_ASM1067234v1_genomic	d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Elainellales;f__Elainellaceae;g__;s__
GCA_010672835.1_ASM1067283v1_genomic	d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__;f__;g__;s__
I wonder whether you have any suggestions for preparing the files in these cases. Thank you.
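If the taxonomy table needs one column per rank (an assumption here; check the column layout of the template linked above), the GTDB-Tk classification strings can first be split by rank prefix. This is a minimal sketch, assuming input rows of the form `genome<TAB>classification` as in the examples above; the helpers `split_ranks` and `to_rank_columns` are hypothetical, and unassigned ranks come out as empty strings so they are easy to spot or fill.

```python
import csv
import io

# GTDB rank prefixes, from domain down to species.
RANK_PREFIXES = ["d", "p", "c", "o", "f", "g", "s"]


def split_ranks(classification: str) -> list:
    """Return the rank names in RANK_PREFIXES order ('' where unassigned)."""
    names = dict.fromkeys(RANK_PREFIXES, "")
    for field in classification.strip().split(";"):
        prefix, _, name = field.partition("__")
        if prefix in names:
            names[prefix] = name
    return [names[p] for p in RANK_PREFIXES]


def to_rank_columns(tsv_text: str) -> list:
    """Turn genome<TAB>classification rows into [genome, d, p, c, o, f, g, s] rows."""
    table = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        table.append([row[0]] + split_ranks(row[1]))
    return table


example = (
    "GCA_010672345.1_ASM1067234v1_genomic\t"
    "d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia;o__Elainellales;f__Elainellaceae;g__;s__\n"
)
for row in to_rank_columns(example):
    print(row)
```

Writing the resulting rows back out (for example with `csv.writer(..., delimiter="\t")`) and choosing the exact column order is left to whatever the template expects.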