nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

dorado separates basecalling and modified base detection into two processes? #507

Closed xieyy46 closed 4 months ago

xieyy46 commented 9 months ago

Hi dorado team! Cheers for your excellent work! I am confused about the differences between Guppy and dorado, specifically the following questions:

1, Are the modification detection models used in dorado trained by Remora?

2, We are still accustomed to using Guppy (I am using Version 6.5.7) for basecalling. We know that Guppy allows specifying modification detection models, for example, using the parameter '-c dna_r9.4.1_450bps_modbases_5mc_cg_fast.cfg.' I would like to know if these models are trained by Taiyaki?

3, Can we specify models trained with Remora when using Guppy?

4, We know that the modification detection models trained by Taiyaki detect modifications concurrently with basecalling. Are Remora models designed to perform basecalling first and then conduct modification detection?

Thank you for your time! I will appreciate any possible help!

vellamike commented 9 months ago

1, Are the modification detection models used in dorado trained by remora?

Yes, they are.

2, We are still accustomed to using Guppy (I am using Version 6.5.7) for basecalling. We know that Guppy allows specifying modification detection models, for example, using the parameter '-c dna_r9.4.1_450bps_modbases_5mc_cg_fast.cfg.' I would like to know if these models are trained by Taiyaki?

These models are also trained by Remora

3, Can we specify models trained with Remora when using Guppy?

While old models will run, we no longer support running newer modbase models with Dorado. Please also note that modbase calling with Dorado is much faster than Remora.

4, We know that the modification detection models trained by Taiyaki detect modifications concurrently with basecalling. Are Remora models designed to perform basecalling first and then conduct modification detection?

Modbase calling in Dorado runs concurrently with basecalling. Please refer to the Dorado documentation for information on how to run it.

malton-ont commented 9 months ago

Modbase calling in Dorado runs concurrently with basecalling, please refer to the Dorado documentation on this for information on how to run it.

Just to clarify this - modbase calling with dorado is performed in the same invocation as basecalling, but for a given read the basecalling step is performed first and then the modbase step. The basecall results are therefore independent of the modbase model. I believe this is different to the old Taiyaki-trained models, where basecalling and modbase calling occurred within the same model.
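To make the ordering concrete, here is a minimal Python sketch of a per-read, two-step pipeline. It is not Dorado's code: `toy_basecall`, `toy_modbase_cg`, `process_read`, and `ReadResult` are hypothetical stand-ins invented for illustration, and the "models" are trivial functions. The only point it shows is that the sequence is produced before, and without any input from, the modbase step.

```python
from dataclasses import dataclass, field


@dataclass
class ReadResult:
    read_id: str
    sequence: str
    # Maps a position in `sequence` to a modification probability.
    mod_probs: dict = field(default_factory=dict)


def toy_basecall(signal):
    """Hypothetical stand-in for the canonical basecalling model."""
    bases = "ACGT"
    return "".join(bases[int(sample) % 4] for sample in signal)


def toy_modbase_cg(signal, sequence):
    """Hypothetical stand-in for a modbase model.

    Assigns a placeholder probability to every C that is followed by G.
    """
    return {
        i: 0.5
        for i in range(len(sequence) - 1)
        if sequence[i:i + 2] == "CG"
    }


def process_read(read_id, signal, modbase_model=None):
    # Step 1: basecalling. The modbase model is not consulted here,
    # so the sequence cannot depend on it.
    sequence = toy_basecall(signal)
    result = ReadResult(read_id, sequence)

    # Step 2: optional modbase calling on the finished basecalls.
    if modbase_model is not None:
        result.mod_probs = modbase_model(signal, sequence)
    return result


if __name__ == "__main__":
    signal = [1, 2, 0, 1, 2, 3]
    plain = process_read("read_1", signal)
    modded = process_read("read_1", signal, modbase_model=toy_modbase_cg)
    print(plain.sequence == modded.sequence, modded.mod_probs)
```

In a combined model of the kind described for the older Taiyaki-trained models, the two steps would instead share one network, so changing the modification model could change the called sequence.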

xieyy46 commented 9 months ago

Hi vellamike and malton-ont, thank you for your help! By the way, two more questions. I noticed that we can specify chunksize when using dorado for basecalling.

1, In my understanding, a well-trained model typically has a fixed input size. Why is it possible to adjust the chunk size? I'm unsure if my interpretation is accurate: Dorado's basecalling model has no fully connected layers, which allows variable input sizes. Is that correct?

2, Another question is about the output of the basecalling model. I noticed that the stride of the DNA basecalling model is 5. Does that mean the length of the layer just before the CTC decoder is approximately 1/5 of the initial input?

HalfPhoton commented 4 months ago

@xieyy46, apologies for this thread being lost.

  1. I'm not sure I understand your first question - the input (chunksize) and output (scores) tensor sizes scale together depending on the specific model. The beam search decoder can handle scores of any size to give your final basecalls.
  2. Yes that's right.
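
As a rough illustration of both points, the Python sketch below applies one strided 1D convolution to inputs of different lengths and checks the result against the standard output-length formula. The stride of 5 comes from the question above; the kernel size, padding, and the helper names `conv_output_length` and `strided_conv1d` are assumptions made for this example and do not describe Dorado's actual layers.

```python
def conv_output_length(n, kernel, stride, padding):
    """Standard output length of a 1D convolution (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def strided_conv1d(signal, kernel_weights, stride, padding):
    """Plain-Python 1D convolution with zero padding.

    The same weights slide over the input whatever its length,
    which is why no fixed input size is required.
    """
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    k = len(kernel_weights)
    return [
        sum(w * x for w, x in zip(kernel_weights, padded[start:start + k]))
        for start in range(0, len(padded) - k + 1, stride)
    ]


if __name__ == "__main__":
    weights = [0.2] * 5        # assumed kernel of width 5
    stride, padding = 5, 2     # stride 5 as discussed; padding is assumed

    for chunksize in (4000, 10000):
        scores = strided_conv1d([1.0] * chunksize, weights, stride, padding)
        expected = conv_output_length(chunksize, len(weights), stride, padding)
        # Output length scales with the input: roughly chunksize / 5.
        print(chunksize, len(scores), expected)
```

With these assumed settings the output has about one time step per five input samples, so a longer chunk simply yields a proportionally longer score tensor for the decoder.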