Wiki on experimental strategies for training models with extended alphabet

Hey Rasmus,

I am currently facing the same questions, and it would be useful if ONT could give some tips on this. For the wet-lab part, though, you may need to refer to what has been published, for example:

Kimoto, Michiko, Si Hui Gabriella Soh, and Ichiro Hirao. 2020. “Sanger Gap Sequencing for Genetic Alphabet Expansion of DNA.” Chembiochem: A European Journal of Chemical Biology 21 (16): 2287–96.
Yamashige, Rie, Michiko Kimoto, Yusuke Takezawa, Akira Sato, Tsuneo Mitsui, Shigeyuki Yokoyama, and Ichiro Hirao. 2012. “Highly Specific Unnatural Base Pair Systems as a Third Base Pair for PCR Amplification.” Nucleic Acids Research 40 (6): 2793–2806.

In my case, I have sequenced PCR data from a DNA sample with an extended alphabet, and I am tweaking Bonito to train on this data. I have found that some adjustments to the code were necessary to make it work. Hopefully I can share more once this work progresses.
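
For illustration only (these are not the actual changes, which are not described here): one adjustment an extended alphabet usually needs is a larger label set, so that the reference sequences can be encoded as integer targets. Below is a minimal sketch in plain Python. It assumes CTC-style training with index 0 reserved for the blank, and it uses "X" and "Y" as hypothetical placeholder symbols for the two unnatural bases. The names `LABELS`, `encode`, and `decode` are made up for this example and are not Bonito functions.

```python
# Hypothetical label set (an assumption for this sketch):
# index 0 is reserved for the CTC blank, written "N" here;
# "X" and "Y" are placeholder symbols for the two unnatural bases.
LABELS = ["N", "A", "C", "G", "T", "X", "Y"]

# Map every real base to its integer label; the blank is never a target.
BASE_TO_INT = {base: i for i, base in enumerate(LABELS) if i > 0}


def encode(sequence):
    """Turn a reference sequence into integer training targets."""
    try:
        return [BASE_TO_INT[base] for base in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"symbol {err.args[0]!r} is not in the alphabet") from None


def decode(targets):
    """Turn integer labels back into a sequence, dropping blanks."""
    return "".join(LABELS[i] for i in targets if i != 0)


if __name__ == "__main__":
    targets = encode("ACXGTY")
    print(targets)
    print(decode(targets))
```

With a label set like this, the output layer of the network would need `len(LABELS)` classes instead of the usual five, and any reference containing a symbol outside the alphabet fails loudly instead of being mislabeled.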

nanoporetech / bonito

Wiki on experimental strategies for training models with extended alphabet #242