nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads
https://nanoporetech.com/

Wiki on experimental strategies for training models with extended alphabet #242

Open Kirk3gaard opened 2 years ago

Kirk3gaard commented 2 years ago

Hi

It would be great if there were a wiki section covering the entire approach to training new models, including the wet-lab part. For example, what would be the best experimental design for training models that predict the incorporation of alternative nucleotides?

  1. Sequence PCR products with one nucleotide fully substituted
  2. Sequence PCR products with "normal" nucleotides
  3. Sequence PCR products with a mix of normal and substituted nucleotides?
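One computational difference between these designs is how each read would be labelled for training. The sketch below makes assumptions that go beyond the thread. It uses "X" as an arbitrary placeholder for the alternative nucleotide, which is not a Bonito symbol, and `read_labels` is a hypothetical helper. It also assumes reads have already been assigned to a strand of a known reference. Under full substitution (design 1), both strands of a PCR product are newly synthesised, so every occurrence of the replaced base on the strand that was read carries the alternative nucleotide. Under design 2 the labels are just the canonical sequence.

```python
# Hypothetical sketch: per-read label sequences under design 1 vs design 2.
# "X" is an arbitrary placeholder for the alternative nucleotide (not a Bonito symbol).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a canonical DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_labels(ref: str, strand: str, base: str, symbol: str,
                fully_substituted: bool) -> str:
    """Label sequence for a read from a PCR product of `ref`.

    Both strands of a PCR product are newly synthesised, so under full
    substitution every `base` on the strand that was read becomes `symbol`.
    """
    seq = ref if strand == "+" else reverse_complement(ref)
    if fully_substituted:      # design 1: every `base` was incorporated as `symbol`
        return seq.replace(base, symbol)
    return seq                 # design 2: canonical nucleotides only


print(read_labels("ACGTTAC", "+", "T", "X", True))   # ACGXXAC
print(read_labels("ACGTTAC", "-", "T", "X", True))   # GXAACGX
print(read_labels("ACGTTAC", "+", "T", "X", False))  # ACGTTAC
```

For design 3 (a mix), the per-position label presumably cannot be read off the reference this way. Each position could carry either nucleotide, which is one reason the fully substituted and canonical designs give cleaner training labels.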

Best regards,
Rasmus

mauriciolp commented 2 years ago

Hey Rasmus,

I am also currently facing these questions, and I think it would be useful if ONT could share some tips on this. For the wet-lab part, though, you might need to refer to what has been published, for example:

In my case, I have sequencing data from PCR products of a DNA sample with an extended alphabet, and I am tweaking Bonito to train on this data. I found that some adjustments to the code were necessary to make it work. Hopefully I can share more once this work progresses.
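The thread does not say which code adjustments were needed. One likely ingredient is encoding label sequences over a larger alphabet. The following is a minimal sketch under stated assumptions, not Bonito's actual code. It assumes an alphabet of the four canonical bases plus a hypothetical "X". It also assumes that, as in CTC-style training data, index 0 is reserved for the blank/padding and bases are numbered from 1. The model's output layer would presumably also need one extra class per new symbol.

```python
import numpy as np

# Hypothetical extended alphabet: canonical bases plus "X" for the alternative
# nucleotide. Index 0 is left free for the CTC blank / padding (an assumption).
ALPHABET = "ACGTX"
BASE_TO_INT = {base: i + 1 for i, base in enumerate(ALPHABET)}


def encode_references(seqs):
    """Encode label strings as a zero-padded integer array plus a lengths array."""
    lengths = np.array([len(s) for s in seqs], dtype=np.uint16)
    refs = np.zeros((len(seqs), int(lengths.max())), dtype=np.uint8)
    for row, seq in enumerate(seqs):
        # Positions past the end of each sequence stay 0 (padding).
        refs[row, :len(seq)] = [BASE_TO_INT[b] for b in seq]
    return refs, lengths


refs, lengths = encode_references(["ACGXXAC", "ACGT"])
print(refs)
# [[1 2 3 5 5 1 2]
#  [1 2 3 4 0 0 0]]
print(lengths)  # [7 4]
```

Appending the new symbol after the canonical bases keeps the existing A/C/G/T indices unchanged, which may make it easier to start from a model trained on the canonical alphabet.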