nf-core / methylseq

Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
https://nf-co.re/methylseq
MIT License

Insert size histogram in multiqc #95

Closed bazyliszek closed 3 years ago

bazyliszek commented 5 years ago

The distribution of estimated insert sizes of mapped reads looks wrong in MultiQC but fine in the Qualimap BAM QC report (Insert Size Histogram). After normalization by MultiQC (fraction of reads), there is an artificial, huge peak in one region.

ewels commented 5 years ago

Yes, I've seen this before too. I'm pretty sure that the data is fine and it's an artefact from Qualimap. I have no idea why it happens - any ideas?

bazyliszek commented 5 years ago

Where does MultiQC store the data from Qualimap? These numbers are normalized in MultiQC, so I would like to see how the normalization is calculated. Alternatively, the peak may come from a single read pair in the paired-end (PE) data (around 130 bp) that shows up clearly once the data are normalized, so it ends up in one narrow bin.
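For reference, a minimal sketch of what "fraction of reads" normalization would look like, assuming it simply divides each bin count by the total number of reads. This is an assumption for illustration, not MultiQC's actual code; the function name and the example counts are hypothetical.

```python
def normalize_histogram(counts):
    """Convert raw per-insert-size read counts into fractions of all reads.

    Assumption: normalization is count / total. Not taken from MultiQC.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty histogram.
        return {size: 0.0 for size in counts}
    return {size: n / total for size, n in counts.items()}


# Hypothetical raw counts keyed by insert size in bp.
raw = {100: 10, 130: 70, 160: 20}
fractions = normalize_histogram(raw)

# The tallest bin is the same before and after normalization.
raw_peak = max(raw, key=raw.get)
norm_peak = max(fractions, key=fractions.get)
```

Under this assumption, dividing by a constant only rescales the y-axis, so a peak that is visible after normalization would already have to be present in the raw counts.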

ewels commented 5 years ago

The relevant code is here: https://github.com/ewels/MultiQC/blob/39d83c2e5991f8911e7f522987a1c8f5851c9c5b/multiqc/modules/qualimap/QM_BamQC.py#L158-L175

From memory (it was a long time ago), I think the Qualimap plot also shows the same spike, so I didn't think it was down to the MultiQC parsing.

ewels commented 3 years ago

See a similar repeat of this conversation on the nf-core Slack: https://nfcore.slack.com/archives/CP3RJSMF0/p1620159835062100

Our conclusion this evening is again that it's something weird in the Qualimap plotting, as it doesn't seem to be replicated by other tools. I don't think that it's MultiQC that is introducing the peak - the raw data from Qualimap seems to contain it too (see the Multi BamQC report plot). As such I'm not sure that there is much we can do except for ignoring the bump / pushing this upstream to the Qualimap developers.

PanosProv commented 4 months ago

Hello again,

Some years later and this issue is still there. I can't replicate the peak with other software (like fastp), so it does look like a bug in the methylseq pipeline. Did you ever report this to the Qualimap developers in the end? I don't see an issue for that on their GitHub. Are you planning any solutions/updates (like maybe using alternative software to Qualimap)?

bounlu commented 4 months ago

We had a similar issue; adding a samtools fixmate step to the BAM processing fixed the insert sizes.
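The exact commands are not given above, so the following is only a sketch of how such a step might be wired in. It assumes the usual samtools order (name-sort, fixmate, then coordinate-sort), since fixmate expects name-grouped input. The helper name, file names, and thread count are hypothetical.

```python
import shlex


def fixmate_commands(in_bam, out_bam, threads=4):
    """Build samtools commands that run fixmate and re-sort by coordinate.

    Hypothetical helper: intermediate file names are derived from out_bam.
    """
    name_sorted = out_bam + ".namesorted.bam"
    fixed = out_bam + ".fixmate.bam"
    return [
        # 1. Sort by read name so that mates are adjacent.
        ["samtools", "sort", "-n", "-@", str(threads), "-o", name_sorted, in_bam],
        # 2. Fill in mate coordinates and insert size fields (-m adds mate score tags).
        ["samtools", "fixmate", "-m", name_sorted, fixed],
        # 3. Sort back to coordinate order for downstream tools.
        ["samtools", "sort", "-@", str(threads), "-o", out_bam, fixed],
    ]


cmds = fixmate_commands("sample.bam", "sample.fixed.bam")
for cmd in cmds:
    # Print each command as a shell-safe string; pass cmd to subprocess.run to execute.
    print(shlex.join(cmd))
```

The commands are only printed here; running them requires samtools on the PATH, for example with `subprocess.run(cmd, check=True)` for each entry.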

PanosProv commented 3 months ago

> We had a similar issue, adding a step of samtools fixmate to BAM processing fixed the insert sizes.

Thank you, I can try that!