nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Unclassified #775

Closed kesava31 closed 4 weeks ago

kesava31 commented 2 months ago

Hi, I am running ampliseq for eukaryotic classification. Barrnap leaves most of the ASVs unclassified: "Barrnap classified 13 ( 0.42 %) ASVs as most similar to Bacteria, 16 ( 0.52 %) ASVs to Archea, 0 ( 0 %) ASVs to Mitochondria, 156 ( 5.09 %) ASVs to Eukaryotes, and 2878 ( 93.96 %) were below similarity threshold to any kingdom."

I am using the PR2 5.0 database with the following run command. Any suggestions to improve the classification?

nextflow run nf-core/ampliseq -r 2.10.0 -profile singularity --max_cpus 16 --input_folder (Project) --FW_primer CGGTAAYTCCAGCTCYV --RV_primer CCGTCAATTHCTTYAART --cutadapt_min_overlap 3 --trunc_qmin 20 --trunclenf 230 --trunclenr 180 --min_len 50 --max_len 550 --vsearch_cluster --vsearch_cluster_id 0.97 --filter_ssu euk --dada_ref_taxonomy pr2=5.0.0 --max_memory 16.GB

d4straub commented 2 months ago

It is slightly worrisome that barrnap cannot classify so many ASVs. Because you are using --filter_ssu euk, all of those ASVs are discarded before taxonomic classification with PR2. I recommend omitting --filter_ssu euk, appending -resume, and checking whether the final taxonomic classification is better aligned with your expectations.
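
For reference, the suggestion above could look roughly like the sketch below. It reuses the parameters from the original command, drops --filter_ssu euk, and adds -resume. The `input_folder` variable and the dry-run `echo` are illustrative additions that are not part of the thread, and "(Project)" is the placeholder from the original post. Note that -resume can only reuse cached results when the run is started from the same launch directory with the previous work directory still in place.

```shell
# Placeholder taken from the original post; replace with the real input folder.
input_folder="(Project)"

# Original parameters, minus "--filter_ssu euk", plus "-resume".
set -- run nf-core/ampliseq -r 2.10.0 -resume -profile singularity \
  --max_cpus 16 --max_memory 16.GB \
  --input_folder "$input_folder" \
  --FW_primer CGGTAAYTCCAGCTCYV --RV_primer CCGTCAATTHCTTYAART \
  --cutadapt_min_overlap 3 --trunc_qmin 20 \
  --trunclenf 230 --trunclenr 180 --min_len 50 --max_len 550 \
  --vsearch_cluster --vsearch_cluster_id 0.97 \
  --dada_ref_taxonomy pr2=5.0.0

# Dry run: print the command for review.
echo "nextflow $*"

# Uncomment to actually launch the pipeline.
# nextflow "$@"
```

Without the SSU filter, all ASVs are passed on to the PR2 classification, so the final taxonomy table is the place to judge how many ASVs remain unassigned.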

kesava31 commented 2 months ago

Thank you very much. I ran the pipeline again after removing --filter_ssu euk. I hope it works now.

d4straub commented 4 weeks ago

I'll close this now. Feel free to open another issue if needed.