maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

How does snakePipes find and report contamination? #917

Closed: sunta3iouxos closed this issue 1 year ago

sunta3iouxos commented 1 year ago

Hi all, to find contamination I use the SortMeRNA tool. Is something similar implemented in snakePipes? Is this done by the mRNA-seq pipeline with the --dnaContam flag? What kind of output does it produce?

Thank you in advance,
Theodoros

P.S. Is it possible to add labels in the new issue form, such as bug, question, or suggestion? Or do you assign these labels yourselves?


adRn-s commented 1 year ago

Usually, we monitor contamination using Kraken. But that isn't done in snakePipes, which is mostly focused on secondary analyses. For us, contamination checking mostly belongs with MultiQC and primary analysis.

katsikora commented 1 year ago

Hi Sunta3iouxos,

snakePipes doesn't currently implement removing reads from FASTQ files based on a filter.
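If you want to do that kind of removal yourself outside snakePipes, here is a minimal Python sketch. It assumes uncompressed FASTQ with four lines per record, and a set of read IDs that some other tool has already flagged as contaminants. The function names (`read_fastq`, `drop_reads`) are hypothetical and not part of snakePipes or SortMeRNA.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, plus, qual


def read_id(header):
    # The read ID is the header up to the first whitespace, without the leading "@".
    return header[1:].split()[0]


def drop_reads(in_handle, out_handle, unwanted_ids):
    """Copy every record whose ID is not in unwanted_ids; return how many were kept."""
    kept = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        if read_id(header) in unwanted_ids:
            continue  # flagged as contamination, skip it
        out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
        kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory example; with real data you would open (or gzip.open) files instead.
    src = io.StringIO("@r1 desc\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    dst = io.StringIO()
    print(drop_reads(src, dst, {"r2"}))
    print(dst.getvalue(), end="")
```

Streaming record by record keeps memory use flat regardless of file size; only the set of unwanted IDs is held in memory.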

As for the --dnaContam flag in the snakePipes mRNA-seq workflow:

dnaContam: Enable this to test for possible DNA contamination in your mRNA-seq samples. DNA contamination is quantified as the fraction of reads falling into intronic and intergenic regions, compared to those falling into exons. Enabling this option would produce a directory called GenomicContamination with .tsv files containing this information.

This runs a few flavours of counting with featureCounts, so that you end up with the fractions of reads falling into exons, introns and intergenic regions. You can obtain similar output with deepTools plotEnrichment.
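To make the arithmetic concrete, here is a small Python sketch of one plausible reading of that metric: per-region fractions of assigned reads, plus the ratio of intronic + intergenic reads to exonic reads. The region labels, function names and counts are assumptions for illustration; this is not the exact computation or the .tsv layout that snakePipes writes to GenomicContamination.

```python
def region_fractions(counts):
    """Return the fraction of assigned reads in each region class.

    `counts` maps a region label (e.g. "exon", "intron", "intergenic")
    to a read count, such as totals summed from featureCounts output.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any region")
    return {region: n / total for region, n in counts.items()}


def non_exonic_to_exonic_ratio(counts):
    """Ratio of intronic + intergenic reads to exonic reads.

    A high value hints at genomic DNA contamination in an mRNA-seq library.
    """
    exonic = counts.get("exon", 0)
    if exonic == 0:
        raise ValueError("no exonic reads; ratio undefined")
    return (counts.get("intron", 0) + counts.get("intergenic", 0)) / exonic


if __name__ == "__main__":
    # Hypothetical counts for one sample; not real snakePipes output.
    sample = {"exon": 800_000, "intron": 150_000, "intergenic": 50_000}
    for region, frac in region_fractions(sample).items():
        print(f"{region}\t{frac:.3f}")
    print(f"non-exonic/exonic\t{non_exonic_to_exonic_ratio(sample):.3f}")
```

For this made-up sample, 80% of reads are exonic and the non-exonic to exonic ratio is 0.25; what counts as "too high" depends on the library prep and annotation, so compare samples within a project rather than against a fixed cutoff.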

Hope this helps,

Best,

Katarzyna

sunta3iouxos commented 1 year ago

Thank you for the information. @adRn-s I checked Kraken and it appears to be rather resource-demanding. @katsikora, have you used any of the other available tools? Do you have any recommendations? Could you perhaps add a module in the future that we could use to assess contamination from FASTQ files?

adRn-s commented 1 year ago

Out of curiosity, are you working with mouse RNA-seq?

If you want to check for prokaryotic ribosomal RNA, you could, for example, target the conserved regions of the 16S rRNA gene with BLAST.

katsikora commented 1 year ago

For assessing contamination from off-target species, we used FastQ Screen before moving to Kraken. Both tools produce contamination reports; they don't remove anything from your original FASTQ files. We could consider adding a module to the preprocessing workflow. In any case, it would require building either a large number of indexes for all the organisms you want to test against (for FastQ Screen) or a contaminome database (for Kraken).
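As an example of consuming such a report, the Python sketch below parses a Kraken2 report (the tab-separated file written with `--report`: percentage, clade reads, taxon reads, rank code, NCBI taxid, indented name) and flags species above a threshold that you did not expect. The threshold, the expected-species set and the abbreviated example report are assumptions for illustration, and none of this is an existing snakePipes feature.

```python
from dataclasses import dataclass


@dataclass
class ReportRow:
    percent: float    # % of reads in the clade rooted at this taxon
    clade_reads: int  # reads assigned to the whole clade
    taxon_reads: int  # reads assigned directly to this taxon
    rank: str         # rank code, e.g. "U", "R", "D", "S"
    taxid: int        # NCBI taxonomy ID
    name: str         # scientific name, indentation stripped


def parse_kraken2_report(text):
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # The last three columns are always rank, taxid and name, even when
        # --report-minimizer-data inserts two extra columns before them.
        rank, taxid, name = fields[-3], fields[-2], fields[-1]
        rows.append(ReportRow(float(fields[0]), int(fields[1]), int(fields[2]),
                              rank.strip(), int(taxid), name.strip()))
    return rows


def flag_contaminants(rows, expected, min_percent=1.0, rank="S"):
    """Return taxa at `rank` with at least `min_percent` of reads that are not expected."""
    return [r for r in rows
            if r.rank == rank and r.percent >= min_percent and r.name not in expected]


# Abbreviated, made-up report: intermediate ranks are omitted for brevity.
EXAMPLE_REPORT = "\n".join([
    " 10.00\t100\t100\tU\t0\tunclassified",
    " 90.00\t900\t0\tR\t1\troot",
    " 85.00\t850\t850\tS\t10090\t          Mus musculus",
    "  5.00\t50\t50\tS\t562\t          Escherichia coli",
])

if __name__ == "__main__":
    rows = parse_kraken2_report(EXAMPLE_REPORT)
    for hit in flag_contaminants(rows, expected={"Mus musculus"}):
        print(f"{hit.name} (taxid {hit.taxid}): {hit.percent:.2f}% of reads")
```

Like the reports themselves, this only summarizes; it does not touch the FASTQ files.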

In theory, we could add SortMeRNA, as it's available on Conda, but I'm not sure the other snakePipes workflows would be well suited to metagenomics data. Is that what you are working with?

Hope this helps,

Best,

Katarzyna

katsikora commented 1 year ago

About tagging issues with labels: can you access the controls? I would imagine this is restricted to organization members and repository owners.