nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Error executing process > 'RelativeAbundanceReducedTaxa (1)' #164

Closed · nbargues closed this issue 3 years ago

nbargues commented 3 years ago

Hi,

I am getting an error at this step: Error executing process > 'RelativeAbundanceReducedTaxa (1)'

Here is the post I sent to the QIIME 2 forum:

https://forum.qiime2.org/t/error-during-relative-abundance-step/16774

Thanks

d4straub commented 3 years ago

Hi there, the QIIME 2 forum is not the right place to discuss this; it is a specific issue with your data and ampliseq. To make it short: you seem to have no taxonomic classifications that go below level 2 (kingdom, I believe), therefore the ASV table cannot be collapsed to taxonomic level 3 or above. This is not a problem with the pipeline but, it seems, with your data. To make the pipeline finish gracefully, please add --skip_ancom --skip_abundance_tables -resume to your command; this should skip all steps where taxa are collapsed. Next, please make sure that everything is alright with the taxonomic classification; it looks odd that your classification is so shallow.
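As a minimal sketch (not the actual command used here), the re-run with those flags appended could look like this; everything except the three added options is a placeholder for whatever input/primer/profile options are already in use:

```bash
# Sketch only: append the suggested flags to the existing ampliseq invocation.
# "<existing options>" stands for the input, primer and profile options already in use.
nextflow run nf-core/ampliseq \
    <existing options> \
    --skip_ancom \
    --skip_abundance_tables \
    -resume
```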

nbargues commented 3 years ago

I know it is probably a classifier problem, but I need the abundance tables; that is the reason I'm using ampliseq. Do you notice anything wrong with my commands for creating my classifier?

d4straub commented 3 years ago

No, these classifier commands look fine to me.

But your amplified region is quite long, more than 500 bp. This can be tricky with MiSeq data. Do you have a sufficient number/fraction of reads passing DADA2? Check this in results/abundance_table/unfiltered/dada_stats.tsv: the numbers in the last column should be 50-90% of the numbers in the second column.

Also, please check results/taxonomy/taxonomy.tsv, where you will find your taxonomic classifications. Make sure these look fine. I doubt that this is the case.
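As a rough sketch of those two checks (the column layout of dada_stats.tsv is an assumption and may differ between ampliseq versions):

```bash
# Fraction of reads retained per sample: assumes a tab-separated file with a
# header row, sample IDs in column 1, input read counts in column 2 and the
# final retained counts in the last column.
awk -F'\t' 'NR > 1 { printf "%s\t%.1f%%\n", $1, 100 * $NF / $2 }' \
    results/abundance_table/unfiltered/dada_stats.tsv

# Quick look at the taxonomic assignments; most should resolve below kingdom level.
head -n 20 results/taxonomy/taxonomy.tsv
```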

nbargues commented 3 years ago

I performed a 2 x 300 bp run on V1/V3 with a 560 bp insert size and 30 bp overlap.

Yes, you are correct: I have less than 5% in dada_stats.tsv, and my taxonomy only has 0_Bacteria ....

Do you have any idea how to change that? Thanks

d4straub commented 3 years ago

To increase those numbers you can basically use two strategies for DADA2: (1) truncate reads more aggressively, i.e. use lower values for --trunclenf & --trunclenr (see the sketch below). The problem is, you do not have a lot of room here. Still, look at results/demux/index.html and choose the lowest --trunclenf & --trunclenr that still leave a 20 bp overlap. (2) Allow more reads to pass the DADA2 quality filter (increase -ee). However, this option is not exposed in ampliseq. Another choice would be to forgo DADA2 and use Deblur, but that is also not supported by this pipeline. My plan is to expose DADA2's -ee so that there is one more possibility to tackle that problem, but I do not have a timeline here.
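A minimal sketch of strategy (1); <F> and <R> are placeholders to be read off results/demux/index.html, and the rest of the command stands in for the options already in use:

```bash
# Strategy (1) sketched: re-run with more aggressive truncation. Pick the lowest
# <F> and <R> whose sum still exceeds the expected (primer-free) amplicon length
# by about 20 bp, so that read pairs can still be merged.
nextflow run nf-core/ampliseq \
    <existing options> \
    --trunclenf <F> \
    --trunclenr <R> \
    -resume
```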

nbargues commented 3 years ago

OK, I am running with what you recommend in (1) to see if that changes anything. I have planned a new sequencing run on Monday, but if I get the same result as this run, it seems pointless. What are your thoughts on what is wrong with my data (i.e. quality, length, PCR duplicates)? I have attached the MultiQC report.

multiqc_report.zip

d4straub commented 3 years ago

Quality scores seem too low. Once you see that your current choice improves results, test some more truncation values, i.e. reduce both by 5 and check whether the overlap is still sufficient. If that is not sufficient, you will need to use other methods.

nbargues commented 3 years ago

My read length is 300 bp. After trimming with ampliseq, the mean read length is 280 bp. So I ran ampliseq with --trunclenf 150 and --trunclenr 150. With that I still get an error at this step, but at level 7:

Command error: Plugin error from taxa:

Requested level of 7 is larger than the maximum level available in taxonomy data (6).

But I think 150 is the limit for 280 bp reads while keeping an overlap of 20. Do you think I should truncate at 100, for example?

d4straub commented 3 years ago

Indeed, after trimming with ampliseq the mean read length is reduced by the primer length (which is removed because it is non-biological sequence), so 300 bp raw reads should be ~280 bp.

Your previous statement (1)

I performed a 2 x 300 bp run on V1/V3 with a 560 bp insert size and 30 bp overlap.

and this later sentence (2)

So I ran ampliseq with --trunclenf 150 and --trunclenr 150

do not fit together.

If (1) is true and you really have an expected insert size of 560 bp (including primer sequences! -> 560 - 20 - 20 = 520 bp excluding primer sequences), then --trunclenf 150 and --trunclenr 150 will only allow 280 bp amplicons to be merged (150 + 150 - 20), which is < 520 bp. Therefore close to no sequences should pass the DADA2 step, because merging fails. The solution here would be to use something like --trunclenf 275 --trunclenr 265 = 540 bp (520 bp amplicon + 20 bp overlap).
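As a sketch of that re-run (only the two truncation values are the actual suggestion; the rest is a placeholder for the options already in use):

```bash
# Sketch of the suggested re-run: 275 + 265 = 540 bp of retained sequence,
# i.e. the ~520 bp primer-free amplicon plus ~20 bp of overlap for merging.
nextflow run nf-core/ampliseq \
    <existing options> \
    --trunclenf 275 \
    --trunclenr 265 \
    -resume
```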

edit: If this is not what you mean, please share (a) the exact and complete command you used to start ampliseq and (b) the file results/abundance_table/unfiltered/dada_stats.tsv

nbargues commented 3 years ago

The new values solved the issue. Thanks!