hariszaf opened this issue 1 year ago (status: open).
Regarding COI, this is now covered under #56: the outputs are already in the required 7-level format.
Regarding 16S, we are still waiting for the SILVA update. However, we have been waiting for a while and are getting a bit fed up, so it would be useful to do this ourselves. For advice on how to do this (and whether it is feasible), consult @hariszaf and @cpavloud.
Regarding the ITS region and the UNITE database: one option is to download the General FASTA release file and extract the sequence IDs from its headers.
For example:
>Glomeraceae|AM076560|SH146432.05FU|refs|k__Fungi;p__Glomeromycota;c__Glomeromycetes;o__Glomerales;f__Glomeraceae;g__;s__uncultured_Glomus
Here, AM076560 is the sequence ID (its accession number).
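A minimal Python sketch of this first step, assuming every header follows the pipe-separated layout of the example (name, accession, SH code, `refs`, then the `k__` to `s__` taxonomy string). The names `parse_unite_header`, `iter_unite_records` and `UniteRecord` are hypothetical helpers, not part of pema or UNITE. Because UNITE already prefixes each rank, the same pass also yields a 7-level split of the taxonomy.

```python
import io
from typing import Dict, Iterator, NamedTuple, TextIO

# Rank prefixes as they appear in the taxonomy field of the example header,
# paired with the 7 rank names they map to (assumed layout).
RANKS = (
    ("k__", "kingdom"),
    ("p__", "phylum"),
    ("c__", "class"),
    ("o__", "order"),
    ("f__", "family"),
    ("g__", "genus"),
    ("s__", "species"),
)


class UniteRecord(NamedTuple):
    name: str
    accession: str
    sh_code: str
    taxonomy: Dict[str, str]


def parse_unite_header(header: str) -> UniteRecord:
    """Parse one header, assuming the layout of the example:
    name|accession|SH code|refs|k__...;p__...;...;s__...
    """
    fields = header.strip().lstrip(">").split("|")
    if len(fields) < 5:
        raise ValueError(f"unexpected header layout: {header!r}")
    taxonomy = {rank: "" for _, rank in RANKS}
    for level in fields[-1].split(";"):
        for prefix, rank in RANKS:
            if level.startswith(prefix):
                # An empty level such as "g__" stays "" (unassigned rank).
                taxonomy[rank] = level[len(prefix):]
                break
    return UniteRecord(fields[0], fields[1], fields[2], taxonomy)


def iter_unite_records(handle: TextIO) -> Iterator[UniteRecord]:
    """Yield a parsed record for every header line of a FASTA stream."""
    for line in handle:
        if line.startswith(">"):
            yield parse_unite_header(line)


if __name__ == "__main__":
    example = io.StringIO(
        ">Glomeraceae|AM076560|SH146432.05FU|refs|k__Fungi;p__Glomeromycota;"
        "c__Glomeromycetes;o__Glomerales;f__Glomeraceae;g__;s__uncultured_Glomus\n"
        "ACGT\n"  # placeholder sequence line, not real data
    )
    for record in iter_unite_records(example):
        print(record.accession, record.taxonomy["species"])
```

Run on the example header, this prints `AM076560 uncultured_Glomus`; the accession is the key for the NCBI lookup.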
Using that accession, you can look up the source organism in NCBI (e.g. https://www.ncbi.nlm.nih.gov/nuccore/AM076560) and, from it, the NCBI Taxonomy ID.
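The NCBI lookup can be scripted with the E-utilities `esummary` endpoint. This is a sketch only: it assumes the JSON document summary for a `nuccore` record carries `organism` and `taxid` fields, and the helper names (`build_esummary_url`, `parse_organism_and_taxid`, `fetch_organism_and_taxid`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Tuple

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(accession: str) -> str:
    """Build an E-utilities esummary query for one nuccore accession."""
    query = urllib.parse.urlencode(
        {"db": "nuccore", "id": accession, "retmode": "json"}
    )
    return f"{ESUMMARY_URL}?{query}"


def parse_organism_and_taxid(summary: Dict[str, Any]) -> Tuple[str, int]:
    """Extract (organism, taxid) from a parsed esummary JSON response.

    Assumed layout: {"result": {"uids": [uid], uid: {"organism": ..., "taxid": ...}}}
    """
    result = summary["result"]
    uids = result.get("uids", [])
    if not uids:
        raise KeyError("no document summary returned")
    doc = result[uids[0]]
    return doc["organism"], int(doc["taxid"])


def fetch_organism_and_taxid(accession: str, timeout: float = 30.0) -> Tuple[str, int]:
    """Query NCBI for a single accession (requires network access)."""
    with urllib.request.urlopen(build_esummary_url(accession), timeout=timeout) as response:
        summary = json.load(response)
    return parse_organism_and_taxid(summary)
```

For example, `fetch_organism_and_taxid("AM076560")` queries the record linked above. Without an API key NCBI allows roughly 3 requests per second, so for the thousands of accessions in a full UNITE release the bulk `nucl_gb.accession2taxid` mapping file from NCBI's taxonomy FTP area is probably a better fit than one request per sequence.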
There may be some interplay with https://github.com/hariszaf/pema/issues/29 here.
It would be super useful to return the main pema output (the OTU/ASV table) in a 7-level taxonomy format, meaning all taxonomy assignments are given as: