nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/
Other
150 stars 8 forks source link

No methylation information was calculated for some reads #299

Open Wshengquan opened 2 weeks ago

Wshengquan commented 2 weeks ago

Hi,I used this command to extract methylation information: modkit pileup ${mod_bam} ${bedmethyl_cpg} --cpg --ref ${ref} --no-filtering However, I found that the CpG methylation information in the result file did not appear in the form of positive and negative chain pairs. I then went back to the bam file to see if I didn't have enough data and didn't have enough reads to map to this location. Take this location for example: 5442cfec84055d16a98ee6c5931f228 The resulting file revealed only one reads with methylation information and it was on the positive chain. But when I looked at the bam file, there were 26 reads in this location with both positive and negative chains, and each one carried labels with methylation information: MM and ML. fe100edd2a538a6afada0b2e4a6fe45 Why did this happen Thanks a lot for your help

ArtRand commented 1 week ago

Hello @Wshengquan,

Do you mean that you never see the (-)-strand C calls or just not always? I usually see a pattern like this:

chr20   66405   66406   h       9       +       66405   66406   255,0,0 9       0.00    0       9       0       0       0       0       6
chr20   66405   66406   m       9       +       66405   66406   255,0,0 9       0.00    0       9       0       0       0       0       6
chr20   66406   66407   h       7       -       66406   66407   255,0,0 7       0.00    0       7       0       4       0       4       1
chr20   66406   66407   m       7       -       66406   66407   255,0,0 7       0.00    0       7       0       4       0       4       1
chr20   66594   66595   h       8       +       66594   66595   255,0,0 8       0.00    0       8       0       0       0       0       9
chr20   66594   66595   m       8       +       66594   66595   255,0,0 8       0.00    0       8       0       0       0       0       9
chr20   66595   66596   h       7       -       66595   66596   255,0,0 7       0.00    0       5       2       0       0       9       0
chr20   66595   66596   m       7       -       66595   66596   255,0,0 7       28.57   2       5       0       0       0       9       0
chr20   66830   66831   h       7       +       66830   66831   255,0,0 7       0.00    0       3       4       0       0       0       7
chr20   66830   66831   m       7       +       66830   66831   255,0,0 7       57.14   4       3       0       0       0       0       7
chr20   66831   66832   h       8       -       66831   66832   255,0,0 8       0.00    0       2       6       0       0       3       1
chr20   66831   66832   m       8       -       66831   66832   255,0,0 8       75.00   6       2       0       0       0       3       1
chr20   67134   67135   h       7       +       67134   67135   255,0,0 7       0.00    0       2       5       0       0       0       5
chr20   67134   67135   m       7       +       67134   67135   255,0,0 7       71.43   5       2       0       0       0       0       5
chr20   67135   67136   h       5       -       67135   67136   255,0,0 5       20.00   1       2       2       2       0       3       1
chr20   67135   67136   m       5       -       67135   67136   255,0,0 5       40.00   2       2       1       2       0       3       1
chr20   67856   67857   h       6       +       67856   67857   255,0,0 6       0.00    0       1       5       0       0       0       1
chr20   67856   67857   m       6       +       67856   67857   255,0,0 6       83.33   5       1       0       0       0       0       1
chr20   67857   67858   h       7       -       67857   67858   255,0,0 7       28.57   2       0       5       0       0       1       0
chr20   67857   67858   m       7       -       67857   67858   255,0,0 7       71.43   5       0       2       0       0       1       0
chr20   67858   67859   h       6       +       67858   67859   255,0,0 6       0.00    0       1       5       0       0       0       1
chr20   67858   67859   m       6       +       67858   67859   255,0,0 6       83.33   5       1       0       0       0       0       1
chr20   67859   67860   h       6       -       67859   67860   255,0,0 6       0.00    0       1       5       1       0       0       1
chr20   67859   67860   m       6       -       67859   67860   255,0,0 6       83.33   5       1       0       1       0       0       1

with a command like yours. To debug that particular position, could you produce a plot from IGV (or a similar browser) at that position with the input modBAM and modification coloring turned on? You may also run modkit extract calls ${bam} --include-bed ${bed} where the bed file has contents like this

1 15189 15190 . 0 -

which will extract only the reads aligning to the negative strand at that position. I bet the genome browser shot will help a lot.

Wshengquan commented 1 week ago

I mean for cpg methylation sites, there are always pairs of positive and negative chains. For example, in your results, here is a pair of cpg methylation sites: chr20 66405 66406 h 9 + 66405 66406 255,0,0 9 0.00 0 9 0 0 0 0 6 chr20 66405 66406 m 9 + 66405 66406 255,0,0 9 0.00 0 9 0 0 0 0 6 chr20 66406 66407 h 7 - 66406 66407 255,0,0 7 0.00 0 7 0 4 0 4 1 chr20 66406 66407 m 7 - 66406 66407 255,0,0 7 0.00 0 7 0 4 0 4 1 But in my results, sometimes there are cpg sites with only positive or only negative chains,, such as the sites in my diagram: image I thought that due to my insufficient data, there were no redundant reads on the negative chain, but from my bam file, there were reads on the negative chain at this site, but modkit does not seem to recognize these reads. I also checked to see if any of these reads were labeled MM and found that they were. image So my question is why does modkit not recognize reads with methylation labels

ArtRand commented 6 days ago

Hello @Wshengquan,

It is possible that you have reads covering the position on the negative strand but for one reason or another they don't have modification calls at that position. My suggestion is to look at one of these positions where you don't see a pair of (+)-strand and (-)-strand modification calls on IGV (or modkit extract full). Sometimes you'll have a C>D mismatch preventing the modification calling model to make a prediction.