nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Methylation patterns #668

Closed Angelluigiguarnizo closed 4 months ago

Angelluigiguarnizo commented 7 months ago

Dear authors,

I have a question regarding the detection of methylations. I am not quite sure how it detects gold methylations: is it based on already existing methylation patterns (motifs), so that I can only detect those motifs, or can I detect new motifs based on the 4 existing methylation types? And if I know a motif that is not in the training model, could I still use it?

Thank you very much

ArtRand commented 7 months ago

Hello @Angelluigiguarnizo,

Could you explain in detail what you mean by "gold methylations"? For DNA we have modified base models for 5mC/5hmC at any cytosine and 6mA at adenine, meaning that all Cs or As will get a base modification prediction. For RNA we currently have an m6A modification model that works in DRACH motifs (for the center 'A'). Additionally, there is a 4mC research model. Maybe if you tell me a little more about your use case I can help you decide which model to use.
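To make the DRACH restriction concrete, here is a minimal sketch that lists the positions of the central 'A' in every DRACH match of a sequence. It assumes the standard IUPAC expansion (D = A/G/U, R = A/G, H = A/C/U); the function name `drach_center_positions` and the example sequence are made up for illustration and are not part of Dorado.

```python
import re

# IUPAC codes used in DRACH: D = A/G/U, R = A/G, H = A/C/U.
# The pattern is wrapped in a lookahead so overlapping matches are all reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")

def drach_center_positions(rna_seq):
    """Return 0-based indices of the central 'A' of every DRACH match."""
    # Accept DNA-alphabet input as well by mapping T to U.
    seq = rna_seq.upper().replace("T", "U")
    return [m.start() + 2 for m in DRACH.finditer(seq)]

# Hypothetical example sequence, invented for illustration only.
example = "CCGGACUAAACAGG"
print(drach_center_positions(example))
```

Only the adenines at the returned positions would be candidates for an m6A call from a DRACH-context model; adenines elsewhere in the read fall outside that context.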

Angelluigiguarnizo commented 7 months ago

Hello again,

Sorry that I didn't explain myself well in my previous message. I did not mean "gold", but "all". I'll try to explain it better.

In our group, we aim to detect methylations in various bacterial genomes.

I've learned that there are several programs like mCaller, Megalodon, and Nanopolish that, using trained models, can detect different methylations (by sequencing native DNA), but they require precise knowledge of the motif in which methylation occurs.

Others like nanodisco can detect new methylation motifs by sequencing both native DNA and WGA. However, this method requires sequencing twice, increasing the cost.

My question is whether Dorado can detect new methylation motifs without also sequencing a WGA sample. I'm specifically interested in adenines. Additionally, through PacBio, I already know the motif that is methylated, so I'm unsure whether I can also tell Dorado which motif I'm looking for.

I hope this clarification helps. I'm new to the nanopore world, so thank you for your understanding.

Thanks

ArtRand commented 7 months ago

Hello @Angelluigiguarnizo,

The adenine (6mA) model and cytosine (5hmC/5mC, 4mC/5mC) models are all "all-context" models. This means that you don't need to know the motif a priori: all As or Cs will have a base modification probability. N.b. the 4mC model is a "research model" provided through rerio.
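As a rough illustration of how all-context calls can point towards a motif, the sketch below tallies the sequence context around adenines whose modification probability exceeds a threshold. Everything here is an assumption for the sake of the example: the function `tally_modified_contexts`, the position-to-probability dictionary, the threshold of 0.8, and the toy sequence are invented, and nothing in it reads Dorado output directly.

```python
from collections import Counter

def tally_modified_contexts(seq, mod_prob, base="A", flank=2, threshold=0.8):
    """Count sequence contexts around confidently modified bases.

    seq      : sequence the probabilities refer to (str)
    mod_prob : dict of 0-based position -> modification probability
               (hypothetical input format, one entry per called base)
    """
    counts = Counter()
    for pos, prob in mod_prob.items():
        # Keep only the requested base with a confident modification call.
        if seq[pos] != base or prob < threshold:
            continue
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(seq):
            continue  # skip positions that lack a full flank
        counts[seq[start:end]] += 1
    return counts

# Toy data, invented purely for illustration: every A has a probability.
seq = "TTGATCAAGATCCTAGC"
probs = {3: 0.95, 6: 0.05, 7: 0.10, 9: 0.91, 14: 0.02}
print(tally_modified_contexts(seq, probs).most_common())
```

In this toy input both high-probability adenines sit inside a GATC context, which is the kind of enrichment one would look for. A real analysis would aggregate over many reads and positions and use a proper statistical test or a dedicated motif-finding tool rather than a simple count.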

angelluigi commented 6 months ago

Hello ArtRand,

Thanks for your response.

I have a new question. I did the sequencing using an R9 flow cell. Could I get 6mA calls using the dna_R10 models? I have not seen any dna_R9 model that says it can detect 6mA.

Thanks

ArtRand commented 6 months ago

Hello @angelluigi,

You should not try to use the dna_R10 models with R9 data; the sensor is fundamentally different. There is an R9 flip-flop all-context model on rerio that will call 6mA (and 5mC). The caveat is that these models are somewhat old at this point, so the performance will not be as high as the current offerings.