nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/
Other
151 stars 8 forks source link

IGV visualization of bedMethyl file #104

Open MaestSi opened 11 months ago

MaestSi commented 11 months ago

Dear all, I would like to know if there is a way to properly visualize methylation information obtained from modkit in IGV genome browser (testing IGV 2.17 snapshot). In particular, I ran Dorado with --modified-bases 5mC_5hmC 6mA parameter, I converted the unmapped bam file to fastq with samtools fastq -T '*’ $UNMAPPED_BAM > $FASTQ and I mapped the reads to the reference with minimap2 retaining all information in the reads' headers (-y parameter). Finally, I ran modkit pileup $BAM modkit.bedMethyl. I have been able to upload to IGV the bam file, and to color bases by modification. However, I would like to make the bedMethyl file more informative in a visual way. For example, I would like to color the track by %Showing methylation, is it possible? Any advice for better visualization would be greatly apprecciated.

mods_detection_example

Thanks, SM

ArtRand commented 11 months ago

Hello @MaestSi,

Sorry for the delay and thanks for the question. Probably the easiest thing is to convert the bedMethyl into a bedGraph or bigWig file. You would have to decide if you want to combine the top and bottom strands into one track or not, as well as whether or not to combine the modification calls. modkit has a --bedgraph option, or you can convert the bedMethyl in bash pretty easily. I'd recommend converting the bedMethyl into a bedGraph. Allowing additional formats is on the roadmap as well, mostly bigWig and bigBed. Coloring based on the %-methylation is an interesting idea, I'll have to play around with IGV to see.

Raya-Faigenbaum-Romm commented 2 months ago

Hi @MaestSi,

I am facing a similar need as you described. I have multiple unmapped bam files for each sample that contain methylation information from nanopore sequencing. I wish to map them to the genome, find the location of methylations in the genome and compare between samples. I tried a similar approach to what you described (from unmapped bam to fastq using samtools, mapped the fastq with minimap2, back to bam from the sam output from minimap2 and then modkit on the new bam) but I get an error from modkit:

Error! zero reads found in bam index.

Could you please share your run lines? Specifically, I have several questions:

  1. When I run samtools fastq -T '*' FILE_NAME.bam > FILE_NAME.fastq I get an error: Parsing extra tags - '*' is not two characters. So I run it with '**'. Does this matter?

  2. It looks like the sam file that was created after the mapping with minimap2 does not contain the methylation information from the original bam file. I used the -y flag as you suggested. In which field do you see the methylation information in your sam file? I run: minimap2 -y FASTA_FILE.fna FILE_NAME.fastq > FILE_NAME.sam

  3. I used this code to switch from sam generated by the minimap2 to bam: samtools view -bS FILE_NAME.sam | samtools sort -o FILE_NAME.bam And got these errors:

    [E::sam_parse1] missing SAM header [W::sam_read1] Parse error at line 1 [main_samview] truncated file.

I would highly appreciate any help.

Thanks a lot, Raya

ArtRand commented 2 months ago

Hello @Raya-Faigenbaum-Romm,

I strongly recommend using dorado aligner (docs) for mapping an BAM with base modification information (modBAM), you can pass the -Y flag if you want to capture non-primary alignments (modkit docs). E.g.

dorado aligner ${ref} ${bam} -t 48 -Y > mapped.bam
Raya-Faigenbaum-Romm commented 2 months ago

Hello @ArtRand,

Thank you very much for the fast response. I had no previous experience with nanopore sequencing and with methylation analysis and I was trying to solve it for several days. You really helped me. dorado aligner worked great!

I hope I can ask two more questions, how do you suggest to summarize the methylation analysis of modkit run on all bam files for one sample. I saw the option of modkit summary that gives information about one bam file. Is there a possibility to summarize methylation results on all bam files of one sample?

How do you suggest to compare between 2 sample to see coordinates and differences in methylations?

Thanks a lot, Raya