nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/
Other
144 stars 8 forks source link

How does modkit extract handle indels on reads? #249

Open Fu-Yilei opened 2 months ago

Fu-Yilei commented 2 months ago

Hey I recently am running modkit on/around some structural variant calls. I was interested in using modkit extract to extract the methylation status within each reads on and around SVs. However, when I was trying to run modkit extract with given BED file generated from my SV calls, I found the modkit extract seems only output the location within my provided reference genome. My command is: modkit extract -t 20 -q 100000 --reference GCA_000001405.15_GRCh38_no_alt_analysis_set.fa --read-calls-path ./modkit_extract/sv_related_read_calls.csv --cpg --include-bed sv_related.bed --allow-non-primary --log-filepath log.log ont.bam sv_related.csv In here, sv_related.bed is several hundreds - thousands bps around the SV breakpoint. Does removing the --reference option helps rescuing the locations on reads that are not aligned to the reference genome? Have you guys have any practices with profiling the methylation along with SVs?

ArtRand commented 2 months ago

Hello @Fu-Yilei,

I apologize for the delayed response. When you use the --include-bed option modkit extract will only output mapped modified base positions. I believe there is a log line indicating this behavior. If you want to capture unmapped positions you can't use --include-bed, unfortunately. What you could do is grab reads that overlap with samtools first, then pipe it through extract. You may also want to be sure to use the -Y flag when aligning so that supplementary alignments can be used by extract (see the docs). Adding the reference shouldn't make a difference. Unmapped positions will have reference position -1. Leveraging SVs is something I'm interested in making easier. Maybe the modVCF format will help, open to hearing ideas.

Fu-Yilei commented 2 months ago

Thank you Arthur! Sorry for the late reply, I was moving over the weekend.. While I think modVCF is great for reporting methylation signals with the capability of handling haplotype etc., the real-level analysis is always needed at least for SV-related analysis with methylation. Here I am not talking about stuff like mQTL, but more about reporting methylation changes around a SV. In this case, it is inevitable to look into SV-supporting-reads from the SV callers, which makes the modVCF and current bedmethyl file both cannot handle this task. However, I found visualization sometimes can provide a more intuitive angle for checking methylation changes around SVs. I think Marcus talked about modkit localize over the 1000g meeting last week, which I found quite interesting. Have you guys checked that function around a SV breakpoint?

ArtRand commented 2 months ago

Hello @Fu-Yilei,

Using localize is an interesting idea. The prototype that I have uses the bedMethyl and is a pretty simple "bedtools-like" operation of aggregating methylation frequencies around genomic features. Sounds like that wouldn't work for SVs though since what you care about are reads reporting methylation that isn't in the bedMethyl (i.e. reads partially aligned to the SV). What you could do is essentially another "pileup" where you anchor at the SV breakpoint then use the read-coordinates to look aggregate. What do you think? I'm not an SV expert. If I get you a prototype, would you be willing to test it?

Fu-Yilei commented 2 months ago

Sure I'd love to do that! I have several insertion breakpoint to test in a mixed cell line sample.