nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

how did the pipeline filter the "filtered-table.qza" #555

Closed bark9299 closed 1 year ago

bark9299 commented 1 year ago

Sorry if this is a silly question, but I wanted to know which parameters the pipeline used to filter "filtered-table.qza". When I look at the output files on the Nextflow ampliseq GitHub page under /qiime2/abundance_tables, filtered-table.qza is listed with the short description "QIIME2 fragment". When I view this filtered table as a .qzv in QIIME 2 View, the only difference I can see from the original "table.qza" file is that the number of features and the total frequency have decreased. However, it does not get rid of any samples.

d4straub commented 1 year ago

Hi there, all questions are welcome!

> When I look at the output files on the Nextflow ampliseq GitHub page under /qiime2/abundance_tables, you see the filtered-table.qza with a short description of "QIIME2 fragment". When I look at this filtered-table file as a qzv on QIIME 2 View, the only difference I can see is that the number of features and total frequency decreased from the original "table.qza" file.

I assume you are speaking about the filtered-table.qza file in the AWS results tab on the nf-core website here? Furthermore, I assume the table.qza file is from here? The difference comes from the filters explained directly below https://nf-co.re/ampliseq/2.5.0/output#qiime2, namely --exclude_taxa (by default removes mitochondria and chloroplast), --min_frequency (off by default), and --min_samples (off by default).
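To illustrate why these filters change the feature count and total frequency but not the sample list, here is a minimal Python sketch of the three ASV-level filters. It is not the pipeline's actual code: the plain-dict table layout, the hypothetical function name `filter_asvs`, and the case-insensitive substring matching of taxonomy strings are assumptions made for illustration only.

```python
from typing import Dict, List

# Assumed layout (illustration only): ASV ID -> {sample ID: read count}
Table = Dict[str, Dict[str, int]]


def filter_asvs(
    table: Table,
    taxonomy: Dict[str, str],
    exclude_taxa: List[str],
    min_frequency: int = 1,
    min_samples: int = 1,
) -> Table:
    """Hypothetical sketch: keep only ASVs (rows) that pass all three filters.

    Counts of the kept ASVs are copied unchanged; no sample is removed explicitly.
    """
    excluded = [term.lower() for term in exclude_taxa]
    kept: Table = {}
    for asv, counts in table.items():
        lineage = taxonomy.get(asv, "").lower()
        # Like --exclude_taxa: drop ASVs whose taxonomy contains an excluded term
        # (case-insensitive substring match is an assumption of this sketch).
        if any(term in lineage for term in excluded):
            continue
        # Like --min_frequency: drop ASVs with too few reads across all samples.
        if sum(counts.values()) < min_frequency:
            continue
        # Like --min_samples: drop ASVs observed (count > 0) in too few samples.
        if sum(1 for c in counts.values() if c > 0) < min_samples:
            continue
        kept[asv] = dict(counts)
    return kept


# Usage with made-up example data
table = {
    "asv1": {"S1": 10, "S2": 5},
    "asv2": {"S1": 3, "S2": 0},
    "asv3": {"S1": 0, "S2": 7},
    "asv4": {"S1": 1, "S2": 0},
}
taxonomy = {
    "asv1": "d__Bacteria; p__Firmicutes",
    "asv2": "d__Bacteria; p__Cyanobacteria; o__Chloroplast",
    "asv3": "d__Bacteria; p__Proteobacteria; f__Mitochondria",
    "asv4": "d__Bacteria; p__Bacteroidota",
}
# Only the taxa filter is effectively active here (frequency/sample thresholds of 1).
filtered = filter_asvs(table, taxonomy, ["mitochondria", "chloroplast"])
# Enabling the other two filters removes the rare asv4 as well.
strict = filter_asvs(table, taxonomy, ["mitochondria", "chloroplast"],
                     min_frequency=5, min_samples=2)
print(sorted(filtered), sorted(strict))
```

In this toy example the chloroplast and mitochondria ASVs disappear, so the number of features and the total frequency drop, while both samples S1 and S2 are still present in the remaining rows.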

> However, it does not get rid of any samples.

Correct, the filters remove ASVs, not samples. Currently, the only way to remove samples was introduced in version 2.5.0 with https://github.com/nf-core/ampliseq/pull/538, namely --diversity_rarefaction_depth and --ancom_sample_min_count.

Is there a specific reason you want to get rid of samples? And based on which criteria (e.g. total sample counts, sample names, ...)? A sample filtering step elsewhere in the pipeline, for example based on total sample counts, would be relatively easy to add, but I am not sure why it would be needed.
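For contrast with the ASV filters, a sample filter based on total sample counts could look roughly like the Python sketch below. This is not an existing ampliseq feature or parameter: the function name `filter_samples_by_total_count`, the dict-based table layout, and the choice to also drop ASVs left without reads are all hypothetical assumptions for illustration.

```python
from typing import Dict

# Assumed layout (illustration only): ASV ID -> {sample ID: read count}
Table = Dict[str, Dict[str, int]]


def filter_samples_by_total_count(table: Table, min_count: int) -> Table:
    """Hypothetical sketch: drop samples whose total read count is below min_count."""
    # Sum reads per sample across all ASVs.
    totals: Dict[str, int] = {}
    for counts in table.values():
        for sample, count in counts.items():
            totals[sample] = totals.get(sample, 0) + count
    keep = {sample for sample, total in totals.items() if total >= min_count}

    filtered: Table = {}
    for asv, counts in table.items():
        row = {s: c for s, c in counts.items() if s in keep}
        # Assumption of this sketch: ASVs with no reads in retained samples are dropped.
        if any(c > 0 for c in row.values()):
            filtered[asv] = row
    return filtered


# Usage with made-up example data: S2 has only 15 reads in total.
table = {
    "asv1": {"S1": 100, "S2": 5},
    "asv2": {"S1": 50, "S2": 2},
    "asv3": {"S1": 0, "S2": 8},
}
result = filter_samples_by_total_count(table, min_count=20)
print(result)
```

Here sample S2 (15 reads) falls below the threshold and is removed, and asv3, which only had reads in S2, goes with it. This is the opposite of the ASV filters: whole samples disappear rather than individual features.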

d4straub commented 1 year ago

Closing this because it seems clarified and there has been no further response. Please open another issue and/or join the nf-core Slack channel #ampliseq.