igvteam / igv

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations
https://igv.org
MIT License

Visualizing 5mC/5hmC at only CG or CH sequence. #1479

Open miwi610 opened 9 months ago

miwi610 commented 9 months ago

The Dorado basecaller for ONT can now call 5mC/5hmC not only at CpG but also at CH sequences. I think many researchers would like to visualize the modification at CG and CH separately, as IGV already allows in bisulfite mode. If this is not yet possible, please implement a way to visualize 5mC/5hmC modifications in ONT BAM files at CG and CH sequences separately, similar to bisulfite mode. I checked the newest version, 2.17.1, and could not find such an option. I am not sure whether the same is needed for 6mA, e.g. showing it only at AT versus A[C|G|A].

jrobinso commented 9 months ago

Base modification visualization does not depend on context (CG or CH). I'm not really sure what you are asking for; IGV displays the chosen base modifications wherever they appear.

miwi610 commented 9 months ago

IGV currently has options under "Color alignments by" -> "bisulfite mode" -> "CG" for whole-genome bisulfite sequencing data. I was asking whether it is possible to do the same for ONT data, e.g. visualizing 5mC at CG sequences only. Is that possible?

jrobinso commented 9 months ago

Yes, if base modifications are recorded in the BAM file (MM / ML tags). That has been available for over a year. See the screenshot below, and look at the user guide; perhaps I'm not understanding what you are asking for.

[Screenshot: 2024-01-29 at 10:36:37 PM]
jrobinso commented 9 months ago

https://igv.org/doc/desktop/#UserGuide/tracks/alignments/base_modifications/

miwi610 commented 9 months ago

Sorry for my language problem. Previous ONT data contained 5mC/5hmC calls only at CG sequences (the "5mCG_5hmCG" option in dorado/megalodon), which IGV visualized nicely, as shown in your screenshot and the User Guide. The new dorado has a "5mC_5hmC" option that calls 5mC/5hmC not only at CG but also at CH, meaning every C (and every G, the complement of C on the other strand) has a value and is visualized in IGV. In most mammalian cells, most CpGs are methylated and most CpHs are unmethylated, so in IGV most Cs and Gs appear unmethylated and only a small number of Cs (the CpGs) appear methylated (screenshot). For instance, the IGV view zoomed into a non-CGI region shows methylation at CG and none at the other Cs. If I zoom out, the region does not look very methylated, even though every CpG in it is fully methylated. To reproduce what I used to see, visualizing only CG sequences, I could re-basecall all my data with the "5mCG_5hmCG" option. But if IGV could extract and visualize only the CG modifications, that would be very useful and computer-resource friendly. I hope you understand what I mean...

[Screenshot 2024-01-30 16 51 24] [Screenshot 2024-01-30 16 52 33]
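The zoom-out effect described above is simple arithmetic: averaging over every C/G dilutes fully methylated CpGs with the far more numerous unmethylated CpHs. The sketch below is a toy illustration, assuming per-position modification probabilities have already been aggregated onto the reference; `cytosine_context` and `mean_modification` are hypothetical helpers, not IGV code. A `C` is classified by the base after it (forward strand), and a `G` is treated as a reverse-strand C and classified by the base before it.

```python
from typing import Dict, Optional


def cytosine_context(ref: str, pos: int) -> Optional[str]:
    """Classify the cytosine at a reference position as 'CG' or 'CH'.

    A 'C' is a forward-strand cytosine (the next base decides); a 'G' is a
    reverse-strand cytosine (the previous base decides). Returns None for
    other bases, ambiguous neighbours ('N'), or positions at the ends.
    """
    ref = ref.upper()
    if pos < 0 or pos >= len(ref):
        return None
    if ref[pos] == "C" and pos + 1 < len(ref):
        partner = ref[pos + 1]      # forward strand: is the next base a G?
        is_cg = partner == "G"
    elif ref[pos] == "G" and pos > 0:
        partner = ref[pos - 1]      # reverse strand: is the previous base a C?
        is_cg = partner == "C"
    else:
        return None
    if partner == "N":
        return None
    return "CG" if is_cg else "CH"


def mean_modification(calls: Dict[int, float], ref: str,
                      context: Optional[str] = None) -> float:
    """Mean modification probability, optionally restricted to one context."""
    selected = [prob for pos, prob in calls.items()
                if context is None or cytosine_context(ref, pos) == context]
    return sum(selected) / len(selected) if selected else float("nan")


# Toy data: every CpG fully methylated, every CpH unmethylated.
ref = "ACGTCCACCTCA"
calls = {1: 1.0, 2: 1.0, 4: 0.0, 5: 0.0, 7: 0.0, 8: 0.0, 10: 0.0}
print(round(mean_modification(calls, ref), 3))        # → 0.286 (all C/G)
print(round(mean_modification(calls, ref, "CG"), 3))  # → 1.0 (CpG only)
print(round(mean_modification(calls, ref, "CH"), 3))  # → 0.0 (CpH only)
```

In a real mammalian genome CpH sites outnumber CpG sites by far more than in this toy, so the all-context average is pulled even closer to zero.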

jrobinso commented 9 months ago

OK, I think I understand. Your language is fine; I think the title of the ticket summarizes it well.

To generalize this: if I understand correctly, you are requesting that base modifications be shown only in specific sequence contexts, by which I mean only at genomic locations that contain a specific sequence. I can see how that might be useful, but it would be complex to implement, and I cannot see this being prioritized anytime soon.

miwi610 commented 9 months ago

Yes! That is exactly what I meant. I can imagine this is a complex step, especially for ONT BAM data. It certainly troubles me to decode the MM and ML values against the reference sequence and long CIGAR strings to find the values I need. Thanks a lot for considering it for a future version of IGV.
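The decoding step mentioned above can be sketched in read coordinates. In the MM tag, each entry (e.g. `C+m?,0,1,0`) lists how many occurrences of the canonical base to skip before each called base, counted on the read as originally sequenced; ML holds one 0-255 byte per call, interleaved when an entry lists several codes (e.g. `C+mh`). This is a simplified sketch, not IGV's implementation: it assumes `seq` is already in original orientation (reverse-complement SEQ first for reverse-strand alignments), decodes only `+` entries with a concrete canonical base, ignores the `.`/`?` flags, and checks CpG context on the read itself. Projecting read positions onto the reference needs a CIGAR walk, which is not shown; read-level and reference-level CpG can differ at sequencing errors or SNPs. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

MM_HEADER = re.compile(r"([ACGTN])([+-])([a-z]+|\d+)([.?]?)")
Calls = Dict[str, List[Tuple[int, float]]]


def parse_mm_ml(seq: str, mm: str, ml: List[int]) -> Calls:
    """Decode MM/ML values into {mod_code: [(read_pos, probability), ...]}."""
    calls: Calls = {}
    ml_idx = 0
    for entry in filter(None, mm.split(";")):
        header, *deltas = entry.split(",")
        match = MM_HEADER.fullmatch(header)
        if match is None:
            raise ValueError(f"unsupported MM entry: {header!r}")
        base, strand, codes, _flag = match.groups()
        # Single-letter codes may be stacked ("mh"); numeric ChEBI codes may not.
        mods = list(codes) if codes.isalpha() else [codes]
        if strand != "+" or base == "N":
            ml_idx += len(deltas) * len(mods)  # keep ML aligned with MM
            continue
        # Read positions of every occurrence of the canonical base.
        candidates = [i for i, b in enumerate(seq.upper()) if b == base]
        idx = -1
        for delta in deltas:
            idx += int(delta) + 1  # skip `delta` occurrences, land on the next
            read_pos = candidates[idx]
            for mod in mods:
                # ML byte N covers probabilities [N/256, (N+1)/256); use the midpoint.
                calls.setdefault(mod, []).append((read_pos, (ml[ml_idx] + 0.5) / 256))
                ml_idx += 1
    return calls


def cg_calls_only(seq: str, calls: Calls) -> Calls:
    """Keep only calls whose C is followed by G in the read (read-level CpG)."""
    s = seq.upper()
    return {mod: [(p, q) for p, q in hits if s[p:p + 2] == "CG"]
            for mod, hits in calls.items()}


seq = "ACGTCCGACA"
calls = parse_mm_ml(seq, "C+m?,0,1,0;", [230, 200, 10])
print(calls)                      # {'m': [(1, 0.900390625), (5, 0.783203125), (8, 0.041015625)]}
print(cg_calls_only(seq, calls))  # {'m': [(1, 0.900390625), (5, 0.783203125)]}
```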

jrobinso commented 9 months ago

Let's leave it open for now, I'm just trying to be realistic about the effort and time. @marcus1487 @ctsa do you have any thoughts on this?

jrobinso commented 9 months ago

@miwi610 Have you tried the single-color option, "base modification (all)"? If you are just looking for methylated bases, this might work better than the two-color option in this situation.

miwi610 commented 9 months ago

Yes. I noticed that the screenshot above was taken with IGV 2.16.2 using "Color alignment by -> base modification (5mC)". I made four screenshots, attached. In all of them, the upper panel shows the BAM from the sup,5mC_5hmC,6mA option in dorado 0.5.2, and the lower panel shows the BAM from the sup,5mCG_5hmCG option (MM and ML only at CG targets) for the same ONT pod5 data. What I am asking for is a way to make the upper panel look like the lower panel. The blue/red pictures were made in IGV 2.17.1 with Color alignment by -> base modification (5mC); the cyan pictures with Color alignment by -> base modification (5hmC). The 1st and 3rd pictures mark regions of interest (ROIs) at a CGI near the gene promoter and at a non-CGI region; the 2nd and 4th pictures are further zoomed in on those ROIs. I hope you understand what I mean... Because the modification levels (both 5mC and 5hmC) at CH are low, the zoomed views look similar to what they are supposed to look like (low 5mC at the CGI and high at the non-CGI; higher 5hmC at the CGI and lower at the non-CGI). So at this scale it is not too bad; I just need to mark separately where the CG sequences are. But if I zoom out to see a wider area, the difference is masked and the whole area looks highly modified.

[Screenshot 2024-02-01 14 33 56] [Screenshot 2024-02-01 14 32 49] [Screenshot 2024-02-01 14 35 53] [Screenshot 2024-02-01 14 32 05]

While preparing these screenshots, I noticed two points.

  1. When zooming out, IGV seems to keep the higher values when summarizing (summing/averaging) a local area. That is good in most cases, but may not be for DNA methylation (at CG sites).
  2. I still need to distinguish 5mC/5hmC levels at CG sites from those at other sites, and this is true for most researchers studying DNA methylation. I hope this is a useful comment for IGV.
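Both points can be illustrated by how per-site calls might be summarized into fixed-width bins when zoomed out. This is a hypothetical sketch, not IGV's rendering code (which summary IGV actually applies is not established in this thread); it assumes each call already carries its reference position and a precomputed 'CG'/'CH' context label, and `summarize_bins` is an illustrative name.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

# (reference position, context 'CG' or 'CH', modification probability)
Call = Tuple[int, str, float]


def summarize_bins(calls: Iterable[Call], bin_size: int,
                   context: Optional[str] = None,
                   stat: Callable[[Sequence[float]], float] = mean) -> Dict[int, float]:
    """Summarize calls per fixed-width bin, keyed by bin start position."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for pos, ctx, prob in calls:
        if context is None or ctx == context:  # optional context filter
            bins[pos // bin_size * bin_size].append(prob)
    return {start: stat(values) for start, values in sorted(bins.items())}


# Toy region: methylated CpGs surrounded by unmethylated CpHs.
calls = [(10, "CG", 1.0), (12, "CH", 0.0), (20, "CH", 0.0), (35, "CH", 0.0),
         (50, "CG", 1.0), (60, "CH", 0.0), (70, "CH", 0.0), (85, "CH", 0.0),
         (110, "CG", 0.5), (130, "CH", 0.0), (150, "CH", 0.0), (170, "CH", 0.0)]
print(summarize_bins(calls, 100))                # {0: 0.25, 100: 0.125}
print(summarize_bins(calls, 100, stat=max))      # {0: 1.0, 100: 0.5}
print(summarize_bins(calls, 100, context="CG"))  # {0: 1.0, 100: 0.5}
```

An all-context mean dilutes fully methylated CpGs, while a max makes any bin containing one highly modified site look saturated regardless of its neighbours; restricting to CG before summarizing reports the CpG level directly and keeps CG and CH separable.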