smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe

MAPQ filtering #203

Closed vagan21 closed 1 year ago

vagan21 commented 2 years ago

Hello, is there a step in abismal where reads that don't meet a certain MAPQ threshold are filtered out? I am wondering whether I need to run samtools view -q 30 (for example; I've used 30 with RNA-seq) on my SAM files before running samtools sort and then format_reads. Hope that makes sense. Verda

guilhermesena1 commented 2 years ago

Hello,

Abismal doesn't report a MAPQ value for reads, so no filtering is necessary. The definition of MAPQ is somewhat arbitrary between mappers, so 30 in one mapper doesn't necessarily mean the same as 30 in another.

For abismal, reads that map uniquely are already, by definition, more likely to map to their reported location than anywhere else in the genome. Ambiguous reads, meaning those that map to two different locations in the genome with equal alignment score, are discarded by default (unless you set the -a option).
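To make the unique-versus-ambiguous rule concrete, here is a minimal Python sketch of the idea. It is not abismal's actual code: the Hit tuple, the report_hit function and the keep_ambiguous flag are hypothetical names made up for illustration, and keep_ambiguous only loosely stands in for the -a option. Returning the first tied hit when it is set is an arbitrary choice of this sketch, not a statement about what abismal writes out.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate alignment: (chromosome, position, alignment score).
# Assumption for this sketch: a higher score means a better alignment.
Hit = Tuple[str, int, int]


def report_hit(hits: Sequence[Hit], keep_ambiguous: bool = False) -> Optional[Hit]:
    """Return the hit to report for one read, or None if the read is dropped."""
    if not hits:
        return None  # read did not map anywhere

    best_score = max(score for _, _, score in hits)
    best = [hit for hit in hits if hit[2] == best_score]

    if len(best) == 1:
        # Unique: one location scores strictly better than every other.
        return best[0]

    # Ambiguous: two or more locations tie for the best score.
    # Dropped by default; kept only when the (hypothetical) flag is set.
    return best[0] if keep_ambiguous else None


unique = report_hit([("chr1", 100, 50), ("chr2", 200, 45)])
ambiguous = report_hit([("chr1", 100, 50), ("chr2", 200, 50)])
print(unique)     # the chr1 hit is reported
print(ambiguous)  # None, because the two best hits tie
```

Under this rule a reported read has already beaten every alternative location, which is the same kind of question a MAPQ cutoff is meant to answer.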

We generally recommend following the pipeline steps as they are and not worrying about quality-based filtering of mapped reads. The tools needed to "clean" the output are already part of the pipeline (e.g. duplicate-remover).