novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

RRACH requirements #131

Closed tjprins closed 1 year ago

tjprins commented 1 year ago

Hello,

I am currently having an oligo designed with m6A modifications to serve as a control strand for epinano SVM. The oligo will have an RRACA motif, where the first adenine is m6A and the second adenine is not methylated. The oligo can, at max, be 150 nucleotides, and so I would like to add in a few other RRACH motifs within the same sequence. Do you know how far apart RRACH motifs need to be in a sequence in order for the model at /EpiNano/models/rrach.q3.mis3.del3.linear.dump to be able to accurately detect them? Can they be back-to-back and be okay (i.e., RRACHRRACH)? Thanks!

Also, I'm curious if there are any plans to update the model rrach.q3.mis3.del3.linear.dump to a newer version of guppy, or if we should train our own models on newer versions or switch to pure pairwise comparisons using epinano Error.

tjprins commented 1 year ago

Update on the second point: It looks like .fast5 files generated using the latest MinKNOW software are no longer backwards compatible with older versions of Guppy (see this post on the Nanopore website). Therefore, you cannot basecall newly acquired data with Guppy 3.1.5 to use the rrach.q3.mis3.del3.linear.dump model with Epinano-SVM. You must compare each sample to a suitable control using Epinano-Error, or get an up-to-date pre-trained model using a more recent basecaller version.

enovoa commented 1 year ago

Hi @tjprins, sorry for the slow reply. Your message arrived during the Xmas holidays and got lost in the inbox; I only saw it now while going over pending open issues. Regarding updating the model to a newer version of Guppy: the Guppy RNA basecalling models are pretty much identical across Guppy 3.x and later (the DNA basecalling models have changed a lot across Guppy versions, but this is not the case for the RNA basecalling models). So this model should work equally well with more recent versions of Guppy. Hope this helps!

enovoa commented 1 year ago

Regarding your first question, i.e. the design of the k-mer: I would not recommend placing motifs so close to each other, as some studies have suggested that RNA modifications affect the dwell time even 10-12 nucleotides upstream.
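For anyone designing a similar control oligo, here is a rough sketch of how candidate sequences could be screened for RRACH motifs that sit too close together. It is not part of EpiNano: the function names `find_rrach` and `too_close` are hypothetical, and the default `min_gap=13` is only an assumption derived from the 10-12 nucleotide figure above, not a validated threshold.

```python
import re

# IUPAC codes: R = A or G, H = A, C, or U (T is accepted for DNA-style input).
# The lookahead makes the search report overlapping matches as well.
RRACH = re.compile(r"(?=([AG][AG]AC[ACUT]))")


def find_rrach(seq):
    """Return 0-based positions of the central A of every RRACH match."""
    seq = seq.upper()
    return [m.start() + 2 for m in RRACH.finditer(seq)]


def too_close(positions, min_gap=13):
    """Return adjacent site pairs whose central A's are < min_gap nt apart.

    min_gap=13 is an assumed threshold (more than 12 nt of separation).
    """
    return [(a, b) for a, b in zip(positions, positions[1:]) if b - a < min_gap]


if __name__ == "__main__":
    back_to_back = "GGACAGGACU"                  # RRACHRRACH, as in the question
    spaced = "GGACA" + "U" * 15 + "AAACU"        # two motifs with a 15-nt spacer
    for name, seq in [("back_to_back", back_to_back), ("spaced", spaced)]:
        sites = find_rrach(seq)
        print(name, sites, too_close(sites))
```

In this sketch the back-to-back design is flagged because its two central A's are only 5 nt apart, while the spaced design passes. Whether a given spacing is actually sufficient for the SVM model would still need to be checked experimentally.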

tjprins commented 1 year ago

Good to know. Thank you so much!