colindaven closed this 11 months ago
I've tried doing this, and the code is incompatible with the way methylation calls are encoded in nanopore and PacBio BAM data, as compared to Bismark-processed whole-genome bisulfite sequencing BAM data.
From what I've seen, long-read data uses the BAM tags MM and ML, while bisulfite sequencing data processed with Bismark encodes methylation in the BAM tag XM.
So the code seems to need some changes to support the MM and ML tags in addition to the XM tag (these are optional BAM tags, not CIGAR strings). This doesn't seem hard to do; someone just needs to do it. Please let me know when this is done, as I'd like to use it too!
From the Bismark User Guide (https://www.bioinformatics.babraham.ac.uk/projects/bismark/Bismark_User_Guide.pdf):
(12) NM-tag (edit distance to the reference)
(13) MD-tag (base-by-base mismatches to the reference)
(14) XM-tag (methylation call string)
(15) XR-tag (read conversion state for the alignment)
(16) XG-tag (genome conversion state for the alignment)
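To make the XM side concrete, here is a minimal sketch of how a Bismark XM methylation call string could be summarized. It assumes the usual Bismark letter codes (z/Z for CpG, x/X for CHG, h/H for CHH, u/U for unknown context, uppercase meaning methylated, and `.` for positions without a cytosine call). The function name `summarize_xm` is made up for illustration and is not part of Bismark or wgbstools.

```python
from collections import Counter

# Bismark XM call letters (assumed from the Bismark conventions):
# uppercase = methylated, lowercase = unmethylated.
XM_CONTEXT = {"z": "CpG", "x": "CHG", "h": "CHH", "u": "unknown"}


def summarize_xm(xm):
    """Count methylation calls per (context, state) in an XM string.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for ch in xm:
        context = XM_CONTEXT.get(ch.lower())
        if context is None:
            # '.' (or any other character) is not a cytosine call
            continue
        state = "methylated" if ch.isupper() else "unmethylated"
        counts[(context, state)] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(summarize_xm("..Z..z.h.X").items()):
        print(key, n)
```

The point is that XM carries one character per read base, so a tool written around XM cannot read MM/ML without a separate decoding step.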
https://medium.com/@shlokanegi30/mm-and-ml-tags-in-mod-basecalled-bams-using-remora-9fc68a3cb72
MM and ML tags are specific to BAM files generated by any DNA mod-basecalling algorithm (e.g. Guppy, Megalodon, Bonito, etc.) using nanopore sequencing long reads. This blog specifically focuses on understanding MM and ML tags in BAM files generated by Remora for detecting both 5mC and 5hmC mods (DUAL mode).
The SAM tags encoding 5mC positions and scores (MM, ML) are added to all HiFi reads.
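For comparison, here is a rough sketch of decoding MM/ML into per-base calls, based on my reading of the SAM optional-tags specification. It is not how wgbstools or any basecaller does it. Assumptions: `seq` is the read in its original sequenced orientation (for reverse-strand alignments the BAM SEQ would need reverse-complementing first), only `+` strand entries with single-letter modification codes such as `m` and `h` are handled, and the `N` base wildcard is not supported. `parse_mm_ml` is a hypothetical name.

```python
def parse_mm_ml(seq, mm, ml):
    """Return (read_position, mod_code, probability) tuples.

    seq: read bases in the original sequenced orientation
    mm:  MM tag value, e.g. "C+m,0,1;"
    ml:  ML tag values as a list of ints in 0..255
    Hypothetical helper; simplified relative to the full spec.
    """
    calls = []
    ml_index = 0
    for entry in filter(None, mm.split(";")):
        header, *deltas = entry.split(",")
        base, strand = header[0], header[1]
        codes = header[2:].rstrip(".?")  # drop the optional '.' / '?' flag
        if strand != "+":
            raise ValueError("only '+' strand entries are handled here")
        # Read positions of every occurrence of the canonical base.
        positions = [i for i, b in enumerate(seq) if b == base]
        cursor = 0
        for delta in deltas:
            # Each delta is the number of unmodified occurrences to skip.
            cursor += int(delta)
            pos = positions[cursor]
            # With several codes (e.g. "C+mh") ML values are interleaved.
            for code in codes:
                # An ML value x covers [x/256, (x+1)/256); use the midpoint.
                calls.append((pos, code, (ml[ml_index] + 0.5) / 256))
                ml_index += 1
            cursor += 1
    return calls


if __name__ == "__main__":
    for call in parse_mm_ml("ACGTCCGA", "C+m,0,1;", [255, 0]):
        print(call)
```

In practice a library such as pysam is probably the safer route for real BAM files; the sketch is only meant to show why MM/ML (skip counts plus a separate probability array) needs different handling from the per-base XM string.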
wgbstools now supports nanopore files
Hi,
Looks interesting. Is there any reason it could not handle nanopore data, e.g. output from modbam2bed? https://github.com/epi2me-labs/modbam2bed
We typically convert the output into bedGraph / bigWig for visualization, so I guess this would be compatible?
Thanks.