Closed bazyliszek closed 3 years ago
Yes, I've seen this before too. I'm pretty sure that the data is fine and it's an artefact from Qualimap. I have no idea why it happens - any ideas?
Where does MultiQC store the data from Qualimap? These numbers are normalized in MultiQC, so I would like to see how the normalization is calculated. Alternatively, the peak is a single pair from PE (around 130 bp) that shows up clearly once the data are normalized, so it ends up in one narrow bin.
The relevant code is here: https://github.com/ewels/MultiQC/blob/39d83c2e5991f8911e7f522987a1c8f5851c9c5b/multiqc/modules/qualimap/QM_BamQC.py#L158-L175
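To make the normalization question concrete, here is a minimal sketch of a fraction-of-reads normalization. It is not the MultiQC code linked above. The function names, the `factor` threshold, and the toy counts are all hypothetical, and the histogram is assumed to be a plain mapping from insert size to read count.

```python
def normalize_histogram(counts):
    """Convert raw read counts per insert size into fractions of all reads.

    Assumption: `counts` maps insert size (bp) to number of reads.
    """
    total = sum(counts.values())
    if total == 0:
        return {size: 0.0 for size in counts}
    return {size: n / total for size, n in counts.items()}


def find_spikes(hist, factor=5.0):
    """Return insert sizes whose value exceeds `factor` times the mean of
    their two neighbouring bins (hypothetical heuristic, for illustration)."""
    sizes = sorted(hist)
    spikes = []
    for prev, cur, nxt in zip(sizes, sizes[1:], sizes[2:]):
        neighbour_mean = (hist[prev] + hist[nxt]) / 2
        if neighbour_mean > 0 and hist[cur] > factor * neighbour_mean:
            spikes.append(cur)
    return spikes


# Made-up counts with one inflated bin at 130 bp.
raw_counts = {128: 100, 129: 110, 130: 900, 131: 105, 132: 95}
fractions = normalize_histogram(raw_counts)

print(find_spikes(raw_counts))
print(find_spikes(fractions))
```

Under this assumption, dividing every bin by the same total only rescales the y-axis, so a spike in the normalized plot would also have to be present in the raw counts.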
From memory (it was a long time ago), I think the Qualimap plot also shows the same spike, so I didn't think it was down to the MultiQC parsing.
See a similar repeat of this conversation on the nf-core Slack: https://nfcore.slack.com/archives/CP3RJSMF0/p1620159835062100
Our conclusion this evening is again that it's something weird in the Qualimap plotting, as it doesn't seem to be replicated by other tools. I don't think that it's MultiQC that is introducing the peak - the raw data from Qualimap seems to contain it too (see the Multi BamQC report plot). As such I'm not sure that there is much we can do except for ignoring the bump / pushing this upstream to the Qualimap developers.
Hello again,
Some years later, this issue is still there. I can't replicate the peak with other software (like fastp), so it does look like a bug in the methylseq pipeline. Did you ever report this to the Qualimap developers in the end? I don't see an issue for that on their GitHub. Are you planning any solutions/updates (like maybe using alternative software to Qualimap)?
We had a similar issue; adding a samtools fixmate step to the BAM processing fixed the insert sizes.
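For anyone wanting to try this, a rough sketch of what such a step might look like when driven from Python. The file names, the `tmp_prefix` parameter, and both helper functions are hypothetical. It assumes samtools is on the PATH and relies on `samtools fixmate` expecting name-sorted input, hence the sort before and the coordinate sort after. This is not the exact command we ran.

```python
import subprocess


def fixmate_commands(in_bam, out_bam, tmp_prefix="tmp"):
    """Build the samtools command lines for a fixmate step.

    Assumption: intermediate files are written next to the working
    directory using `tmp_prefix` (hypothetical naming).
    """
    name_sorted = f"{tmp_prefix}.namesorted.bam"
    fixed = f"{tmp_prefix}.fixmate.bam"
    return [
        # fixmate needs reads grouped by name, so sort by read name first
        ["samtools", "sort", "-n", "-o", name_sorted, in_bam],
        # fill in mate coordinates and insert size fields
        ["samtools", "fixmate", name_sorted, fixed],
        # back to coordinate order for downstream QC tools
        ["samtools", "sort", "-o", out_bam, fixed],
    ]


def run_fixmate(in_bam, out_bam, tmp_prefix="tmp"):
    """Run the commands in order, stopping at the first failure."""
    for cmd in fixmate_commands(in_bam, out_bam, tmp_prefix):
        subprocess.run(cmd, check=True)


# Show the commands without executing them.
for cmd in fixmate_commands("sample.bam", "sample.fixed.bam"):
    print(" ".join(cmd))
```

Calling `run_fixmate("sample.bam", "sample.fixed.bam")` would execute the three steps; the resulting BAM can then be passed to Qualimap again to check whether the peak is gone.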
Thank you I can try that!
The distribution of estimated insert sizes of mapped reads looks wrong in MultiQC but OK in the Qualimap report: BAM QC (Insert Size Histogram). After normalization by MultiQC (fraction of reads), there is an artificial huge peak in one region.