nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Can dorado mistakenly identify m6A as a result of other methylation on A? #951

Open Salvobioinfo opened 1 month ago

Salvobioinfo commented 1 month ago

Hello, I have knockout (KO) samples for an enzyme responsible for m6A modification, yet I have identified many m6A sites in these KO samples. I am currently analysing the entire sample library to quantify m6A sites in the KO samples.

Setting aside problems with the KO itself and other enzymes that could also deposit m6A, is there any possibility that dorado mistakenly calls m6A because of other methylation on A?

Thanks in advance

Theo-Nelson commented 1 month ago

Hi Salvobioinfo,

Not a developer of Dorado, but happy to add my two cents.

The parameters of the experiment are key when quantifying m6a sites. To achieve a robust decrease in m6a, the conserved catalytic domain (DPPF from https://www.nature.com/articles/s41589-018-0184-3) of most m6a writers needs to be removed. The most common writer is METTL3. In the knockout design, you need to specifically target the exonic region corresponding to this catalytic domain to get the intended effect. You can verify via mass spec whether the total quantity of m6a in your sample has decreased.

For our data, with the newest m6a model we observe more high-confidence signals that are present in WT and absent from knockdown samples. All-context m6a calling is challenging: you should examine what percentage of your hits fall within homopolymer regions, rRNA, or mitochondrial regions. You can then apply the same filtering criteria to your KO and WT samples, compare the resulting distributions, and see whether there are any differences.
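To make the "same filter on both samples" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the record layout (site id, region label, coverage, modified fraction), the cut-off values, and the toy numbers are made up and are not dorado or modkit output.

```python
from collections import Counter

# Hypothetical record layout: (site_id, region_label, coverage, modified_fraction).
# These names and values are illustrative only, not a dorado/modkit format.
EXCLUDED_REGIONS = {"homopolymer", "rRNA", "mitochondrial"}


def region_breakdown(sites):
    """Fraction of sites per region label, to see how many hits sit in problem regions."""
    counts = Counter(site[1] for site in sites)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()} if total else {}


def filter_sites(sites, min_coverage=20, min_fraction=0.1):
    """Apply one set of cut-offs (assumed values) and drop sites in excluded regions."""
    return [
        site
        for site in sites
        if site[1] not in EXCLUDED_REGIONS
        and site[2] >= min_coverage
        and site[3] >= min_fraction
    ]


# Made-up toy values, not real results.
wt_sites = [
    ("s1", "mRNA", 50, 0.60),
    ("s2", "mRNA", 40, 0.30),
    ("s3", "rRNA", 500, 0.20),
    ("s4", "homopolymer", 30, 0.15),
]
ko_sites = [
    ("s1", "mRNA", 45, 0.05),
    ("s2", "mRNA", 35, 0.02),
    ("s3", "rRNA", 480, 0.20),
    ("s4", "homopolymer", 28, 0.14),
]

for name, sites in (("WT", wt_sites), ("KO", ko_sites)):
    kept = filter_sites(sites)
    print(name, "regions:", region_breakdown(sites), "sites kept:", len(kept))
```

The point of the sketch is only that the region breakdown and the cut-offs are computed identically for both samples, so any remaining difference is not an artifact of filtering one sample more strictly than the other.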

Best of luck!

Sincerely, Theo

Salvobioinfo commented 1 month ago

Hi Theo-Nelson,

I greatly appreciate your response. The KO samples have been confirmed by both MS analysis and NGS. I value your advice on examining the m6a distribution across RNA species, and I also want to break m6a down by mRNA region. We will perform this analysis as soon as possible, but overall the m6a situation remains really strange. The decrease in the knockouts seems minimal compared with what MS shows. It appears to be a common issue, based on what we are reading in the modkit issues section.

Best, Salvo

Theo-Nelson commented 1 month ago

Dear Salvo,

A thread that really helped me over in the modkit world is this one: https://github.com/nanoporetech/modkit/issues/198

As ArtRand suggests, running modkit sample-probs ${modbam} --hist ${histogram_dir} --percentiles 0.1,0.8,0.85,0.9 will give you histograms (drawn sideways) that you can compare between samples to see at which probability threshold they differ.

Taking the example in the thread, the IVT (KO) vs. WT histograms look like this side-by-side for code a (the m6a code). You will get two histograms: one for m6a and one for regular A (if you just run the m6a basecalling model).

[Image: Screenshot 2024-08-01 at 9 16 17 AM]

As you can see, for m6a it is the calls above 99% probability that are enriched in the mRNA vs. IVT comparison. This is how they arrive at the recommended filters: --filter-threshold A:0.8 --mod-thresholds a:0.99
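For intuition, here is a small Python sketch of how I read those two thresholds in the simple two-class case (A vs. m6a). It is an assumption about the logic, not modkit's actual implementation, and the function names are hypothetical: a call is assigned to whichever class is more probable, canonical A calls must reach 0.8 to pass, m6a calls must reach 0.99, and anything else is treated as filtered and left out of the modified fraction.

```python
CANONICAL_THRESHOLD = 0.8  # mirrors --filter-threshold A:0.8
M6A_THRESHOLD = 0.99       # mirrors --mod-thresholds a:0.99


def classify_call(p_m6a):
    """Classify one A position from its m6a probability (two-class case only)."""
    p_canonical = 1.0 - p_m6a
    if p_m6a >= p_canonical:
        # The call is m6a; it passes only above the stricter m6a threshold.
        return "modified" if p_m6a >= M6A_THRESHOLD else "filtered"
    # The call is canonical A; it passes above the canonical threshold.
    return "canonical" if p_canonical >= CANONICAL_THRESHOLD else "filtered"


def modified_fraction(probabilities):
    """Modified / (modified + canonical) over passing calls; None if nothing passes."""
    calls = [classify_call(p) for p in probabilities]
    n_modified = calls.count("modified")
    n_canonical = calls.count("canonical")
    passing = n_modified + n_canonical
    return n_modified / passing if passing else None
```

Under this reading, an m6a call at 0.95 probability counts as neither modified nor canonical, which is why raising the m6a threshold to 0.99 removes the lower-confidence calls that appear in both the IVT and mRNA histograms.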

If you wish to post your histogram results, I can also advise directly here or via email (tmn2126@columbia.edu).

Sincerely, Theo