nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

bin/add_sh_to_taxonomy.py needs updating to match UNITE db files #543

Closed (jtangrot closed this issue 1 year ago)

jtangrot commented 1 year ago

Description of the bug

When checking out sbdi-export after #541, I noticed that taxonomies are off by one column in e.g. dada2/ASV_tax.tsv for the GTDB database and both UNITE databases; I am not sure about SILVA. For example, the columns "Kingdom" and "Phylum" can both contain "Bacteria", while "Class" contains "Nitrospirota". Also, bin/add_sh_to_taxonomy.py probably needs updating, as it adds a domain level, which it probably should no longer do. The sbdi-export part seems to work as intended.
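A quick way to look for the symptom described above is to flag rows where two neighbouring rank columns hold the same value. The following is only a minimal sketch: the function name `find_adjacent_duplicates` is hypothetical, the rank list is taken from the header quoted later in this thread, and the sample rows are made up for illustration. A repeated value is a heuristic hint of a shifted column, not proof of one.

```python
import csv
import io

# Rank columns as quoted in this thread; other databases may use other names (assumption).
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def find_adjacent_duplicates(tsv_text, ranks=RANKS):
    """Return (ASV_ID, rank_a, rank_b, value) for rows where two neighbouring ranks share a value."""
    hits = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Compare each rank column with the next one down the hierarchy.
        for rank_a, rank_b in zip(ranks, ranks[1:]):
            value_a = (row.get(rank_a) or "").strip()
            value_b = (row.get(rank_b) or "").strip()
            if value_a and value_a == value_b:
                hits.append((row.get("ASV_ID"), rank_a, rank_b, value_a))
    return hits


# Made-up sample data, not output from the pipeline.
sample = (
    "ASV_ID\tKingdom\tPhylum\tClass\n"
    "asv1\tBacteria\tBacteria\tNitrospirota\n"
    "asv2\tBacteria\tNitrospirota\tNitrospiria\n"
)
hits = find_adjacent_duplicates(sample)
print(hits)
```

To run it against a real result, the file contents could be read with `open("dada2/ASV_tax.tsv").read()` and passed in place of `sample`.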

d4straub commented 1 year ago

I attempted to reproduce the issue but failed. Here is what I have done:

nextflow pull nf-core/ampliseq
nextflow pull nf-core/ampliseq -r dev
nextflow run nf-core/ampliseq -r 2.4.1 -profile test,singularity --skip_qiime --dada_ref_taxonomy "gtdb" --outdir results_2-4-1_test_gtdb
nextflow run nf-core/ampliseq -r dev -profile test,singularity --skip_qiime --dada_ref_taxonomy "gtdb" --outdir results_2-5-0dev_test_gtdb -resume

(--dada_ref_taxonomy "gtdb" shouldn't be necessary, but the plan was to test more, which I haven't done yet.) I cannot see any confusion in the files results_2-4-1_test_gtdb/dada2/ASV_tax.tsv and results_2-5-0dev_test_gtdb/dada2/ASV_tax.tsv. The output is as expected: the column Domain is missing in results_2-5-0dev_test_gtdb/dada2/ASV_tax.tsv. Could you detail how you encountered the differences?
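The comparison of the two result files can also be done on the header lines alone. This is a rough sketch, assuming tab-separated files; `header_diff` is a hypothetical helper, and the two header strings are illustrative stand-ins rather than the real headers of either release.

```python
def header_diff(old_header, new_header, sep="\t"):
    """Return (removed, added): columns only in the old header and only in the new one."""
    old_cols = old_header.rstrip("\n").split(sep)
    new_cols = new_header.rstrip("\n").split(sep)
    removed = [col for col in old_cols if col not in new_cols]
    added = [col for col in new_cols if col not in old_cols]
    return removed, added


# Illustrative headers only (assumption), not copied from actual pipeline output.
old = "ASV_ID\tDomain\tKingdom\tPhylum\n"
new = "ASV_ID\tKingdom\tPhylum\n"
removed, added = header_diff(old, new)
print("removed:", removed, "added:", added)
```

For real files, the header lines could be read with `open(path).readline()` for each of the two ASV_tax.tsv paths named above.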

jtangrot commented 1 year ago

I did basically the same as you:

nextflow pull nf-core/ampliseq -r dev
nextflow run nf-core/ampliseq -r dev -profile singularity,test -resume --skip_qiime --outdir res_gtdb --dada_ref_taxonomy gtdb

... and ended up with results like:

head -n 2 res_gtdb/dada2/ASV_tax_species.tsv:
ASV_ID Kingdom Phylum Class Order Family Genus Species Species_exact confidence sequence
b39be459cc96689db8cd6b00a0d86e8e Bacteria Bacteria Proteobacteria Gammaproteobacteria Burkholderiales Gallionellaceae 0.87 TACGTAGG...

jtangrot commented 1 year ago

However, when trying the same today it seems to work as intended?? Maybe I had some old results/files still around...?? Let me try a bit more...

jtangrot commented 1 year ago

OK, the ASV_tax files are fine. I must unintentionally have reused some old result files when running before... However, bin/add_sh_to_taxonomy.py still needs updating to match UNITE db files. PR soon to come.

jtangrot commented 1 year ago

Fixed in #545