nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/
Other
140 stars 8 forks source link

m6A comparison with Modkit dmr. assertion `left == right` failed? #265

Closed HJosephLee closed 1 month ago

HJosephLee commented 1 month ago

Hi,

I am trying to compare m6A methylation between WT vs KO samples. They are direct RNA sequencing samples. My collaborator used "sup" model in running MinKNOW, and sent me the POD5 files. I aligned POD5 files onto the mouse genome using Dorado, then merged all the output bam files before start.

My running of modkit pileup seems to be quite okay.

chr1  3701043     3701044     a     1     -     3701043     3701044     255,0,0     1     0.00  0     1     0     0     0     0     0 chr1  3701056     3701057     a     1     -     3701056     3701057     255,0,0     1     0.00  0     1     0     0     0     0     0 chr1  4329074     4329075     a     2     -     4329074     4329075     255,0,0     2     0.00  0     2     0     0     1     0     0

After running pileup for both KO and WT, I made a bed file (modkit motif search) to get the DRACH sequence regions. It looks okay, too.

chr1  3050009     3050010     .     .     + chr1  3050024     3050025     .     .     + chr1  3050088     3050089     .     .     -

Then I started having errors with dmr. The first one was about the Tabix index. I BGZipped the bedMethyl files and made indices, which solved the first problem. (This should be added to the vignette by the way!)

The second error message is something that I cannot solve, and is my question about. The command that I used is, modkit dmr pair -t 8 -a wt_modkit_out.bed.gz -b ko_modkit_out.bed.gz --force --base A --header --regions-bed DRACH.bed --ref GRCm39.primary_assembly.genome.fa

The error message that I need your help is,

reading reference FASTA at "GRCm39.primary_assembly.genome.fa" loaded 111951734 regions loading 12 regions at a time [00:00:00] ---------------------------------------- 0/111951734 regions processed 0 regions failed to process thread 'thread 'thread '' panicked at ' panicked at thread '' panicked at src/dmr/bedmethyl.rs' panicked at src/dmr/bedmethyl.rs:src/dmr/bedmethyl.rssrc/dmr/bedmethyl.rs:140::140:140140:5::5: assertion left == right failed left: 0 right: 1 thread '5: 5note: run with RUST_BACKTRACE=1 environment variable to display a backtrace : assertion left == right failed left: 0 right: 1: assertion left == right failed left: 0 right: 1

assertion `left == right` failed left: 0 right: 1' panicked at

src/dmr/bedmethyl.rs:140:5: assertion left == right failed left: 0 right: 1 thread '' panicked at src/dmr/bedmethyl.rs:140:5: assertion left == right failed left: 0 right: 1 thread 'thread '' panicked at ' panicked at src/dmr/bedmethyl.rs:140:5: assertion left == right failed left: 0 right: 1 src/dmr/bedmethyl.rs:140:5: assertion left == right failed left: 0 right: 1 thread 'thread 'thread '' panicked at thread 'src/dmr/bedmethyl.rs' panicked at ' panicked at :src/dmr/bedmethyl.rs' panicked at src/dmr/bedmethyl.rs140:src/dmr/bedmethyl.rs::140:1405:140:: 5:5assertion left == right failed left: 0 right: 1: 5:

assertion left == right failed left: 0 right: 1: assertion left == right failed left: 0 right: 1 assertion left == right failed left: 0 right: 1

Rayon: detected unexpected panic; aborting Aborted (core dumped)

Could you give me any hint on how I can fix this? For your information, I am using Ubuntu 24.04, and used the pre-compiled binary file of Modkit. Thank you.

Joseph

ArtRand commented 1 month ago

Hello @hangnoh,

Sorry about this. Could you try v0.3.3? I have a feeling it might be a bug introduced in 0.4.0.

HJosephLee commented 1 month ago

There's no such an error with v0.3.3!

If the issue is related to the version 0.4.0, in addition to what I have reported, neither motif search nor motif evaluate worked on my hand, so maybe they are also problematic.

Thanks for the help. Joseph

ArtRand commented 1 month ago

@HJosephLee,

I'm going to re-open this issue to track this work. I will have a fix to this regression very soon.

neither motif search nor motif evaluate worked on my hand, so maybe they are also problematic.

Could you elaborate on what isn't working?

HJosephLee commented 1 month ago

Oh, I have deleted the output file, but basically neither of them could capture the DRACH motif for m6A methylation. The output files had a header, but no line for the motif search.

ArtRand commented 1 month ago

@HJosephLee,

If you run:

modkit motif evaluate \
  -i ${bedmethyl} \
  --known-motif DRACH 2 a \
  -o evaluate_table.tsv \
  --log-filepath modkit_evaluate_drach.txt

What do you get in the output table?

Also as an aside on modkit dmr, while you can use a --regions-bed with the DRACH sites, you may consider using single site analysis after preparing the pileup with --motif DRACH 2. This will perform differential methylation detection on each individual A-base in the third position in the DRACH motif. Granted, if you care to find DMRs in these small regions looking at all A-mods - keep with your current approach. I'll make a TODO to improve the performance on this kind of command.

HJosephLee commented 1 month ago

@ArtRand Oh, I have used "motif bed" (= motif-bed in 0.3.3) to get a bed file for the DRACH sites.
modkit motif bed GRCm39.primary_assembly.genome.fa DRACH 2 > DRACH.bed As for what you wanted to check, it runs without the -o, or --out, but with --ref. Well, it runs forever, so I cannot see the stdout yet.

The log info below.

[src/logging.rs::60][2024-09-24 09:09:44][DEBUG] command line: /home/modkit_v0.4.4_u16_x86_64/dist_modkit_v0.4.0_f882a43/modkit motif evaluate -i ./wt_modkit_out.bed.gz --known-motif DRACH 2 a --log-filepath modkit_evaluate_drach.txt --ref Mouse_GRCm39_M33/GRCm39.primary_assembly.genome.fa [src/find_motifs/mod.rs::972][2024-09-24 09:09:44][INFO] loading references from "Mouse_GRCm39_M33/GRCm39.primary_assembly.genome.fa" [src/find_motifs/mod.rs::1003][2024-09-24 09:09:52][INFO] loaded 61 sequence(s) [src/find_motifs/mod.rs::1150][2024-09-24 09:09:52][INFO] using tabix/bgzip reader

And I am not sure I fully understood your comment. Do you mean that if we use motif evaluation, we can determine the differential methylation of DR"A"CH only instead of including all possible "A" positions in D, R, and C??

HJosephLee commented 1 month ago

@ArtRand

Ouch, another bad news. I stopped running what you posted because it took an unexpectedly long time.

Then I tried it again with 0.3.3, which worked fine last time, but didn't work this time (below).

The command I used was,

modkit motif bed GRCm39.primary_assembly.genome.fa DRACH 2 > DRACH.bed  
modkit dmr pair -t 8 -a wt_modkit_out.bed.gz -b ko_modkit_out.bed.gz --force --base A --header --regions-bed DRACH.bed --ref GRCm39.primary_assembly.genome.fa

The above worked fine yesterday. But, today, what I encountered is,

reading reference FASTA at "/media/hangnoh/CBDR/Mouse_GRCm39_M33/GRCm39.primary_assembly.genome.fa" Error! failed to parse supplied regions at "./DRACH.bed" caused by failed to parse stranded (bed4+) line: chr1 3050011 3050012 . . +, Parsing Error: Error { input: ".\t+", code: Float }

Do you have any idea what I am doing wrong? This is pretty embarrassing!

ArtRand commented 1 month ago

Hello @HJosephLee,

We've got two threads going here so let me address each one, one at a time.

tl;dr please try the attached build - these fixes will be released on Friday.

Starting with the modkit dmr pair issue(s). There is a bug causing that assert to fire. Long story short, during the tabix-reading refactor I missed a check - should be fixed now.

One note, if you're principally concerned with DMR at DRACH motifs I would produce a bedmethyl for only these motifs (either by passing a --motif DRACH 2 argument to pileup or with a bedtools intersect), then omit the --regions option to modkit dmr pair. When you don't pass the --regions option, modkit performs "single site analysis" which will find differentially methylated m6As. If you want to group the m6A residues together within a DRACH motif (such as AAACA), then you'll want to keep the command you have. Both should work.

Next the modkit motif evaluate, do you see the progress bar reporting "parsing bedMethyl records", it should be a continually increasing number. Could you try running the command with the uncompressed bedMethyl as input?

modkit motif evaluate \
  -i ./wt_modkit_out.bed \ # <-- note that .gz is removed here
  --known-motif DRACH 2 a \
  --log-filepath modkit_evaluate_drach.txt \
  --ref Mouse_GRCm39_M33/GRCm39.primary_assembly.genome.fa

The tabix reader is unfortunately quite a bit slower than the normal reader. There is a way to speed this up, but I don't have it in this build. Aside, the tabix-reader is really meant to enable using the --contig argument to subset the bedMethyl to a specific contig.

The error you're getting is a result of v0.3.3 not allowing . in the score column. This has been fixed in 0.4.0. I'm not sure how this worked before. Perhaps you changed the DRACH motif-bed? Either way, doesn't matter - use the attached build which doesn't have this problem.

Branch is here if you want to build it, the attached pre-compiled build is the same.

modkit_u16_x86_64.tar.gz

HJosephLee commented 1 month ago

@ArtRand OMG, thank you for the detailed response to my many questions/problems. I am tied up until tomorrow, but I will check with them on Friday. It sounds like everything will be fine, but I will keep you posted (especially on motif evaluate).

HJosephLee commented 1 month ago

@ArtRand

Everything runs smoothly! And, modkit motif evaluate works also fine with the command that you posted. Thank you for the help, and let me know if you want me to test something else!

Joseph

ArtRand commented 1 month ago

Hello @HJosephLee,

Thats great. I fixed the tabix/bgzip reader in modkit motif. I've attached a build with the fix. Try it if you like, otherwise it will be out in tomorrow's release. Thanks, modkit_u16_x86_64.tar.gz

ArtRand commented 1 month ago

Hello @HJosephLee these fixes have been released in version v0.4.1 thanks for bringing this bug to my attention.