novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Is the method suitable for other species? #10

Closed: tengfeixiaozhu closed this issue 5 years ago

tengfeixiaozhu commented 5 years ago

Hi Huanle, I am very interested in the software developed by your group. I would like to use it to predict the methylation status of my own data, but I do not have a training dataset. Could I use your model on my data, which was produced by direct RNA sequencing without PCR or reverse transcription?

Nanopolish, which was developed to predict DNA methylation status, is suitable for other species as well, so I am hoping the same is true here.

Looking forward to your reply.

tengfeixiaozhu commented 5 years ago

@Huanle Hi Huanle, could you help answer my question?

Huanle commented 5 years ago

Hi @tengfeixiaozhu, sorry for the late response. Since EpiNano relies on the decreased sequencing quality associated with modified RNA bases, in theory it should be possible to extend it to different species. However, we are still investigating this at the moment. Thanks, and I look forward to any further questions you might have.

enovoa commented 5 years ago

Hi @tengfeixiaozhu, the current model included in the EpiNano release has been trained with 100% modified and 0% modified sequences. If you would like to detect m6A-modified sites with high stoichiometry, this model should work fine. However, if you believe that the sites you are interested in are modified at low stoichiometry (regardless of species), I would recommend retraining the SVM with a different proportion of modified and unmodified sequences. We are still working on improving the accuracy of the algorithm for in vivo scenarios, as variability in stoichiometry across sites does affect its performance.

enovoa commented 5 years ago

Correction: the EpiNano release now includes an SVM trained with 100% modified sequences as well as a second one trained with 20% modified sequences.
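
For anyone who wants to try the retraining suggested above, here is a minimal sketch of how a training example at a chosen stoichiometry (for instance 20% modified) might be assembled. This is not EpiNano's own code: the function names `mix_reads` and `site_features`, the per-read tuple layout `(base quality, mismatch flag)`, and the toy values are all assumptions made for illustration. The idea is simply to pool reads from the fully modified and unmodified datasets in the desired proportion and then summarise them into per-site features.

```python
import random
from statistics import mean


def mix_reads(modified, unmodified, fraction_modified, n_reads, seed=0):
    """Draw n_reads per-read observations, with round(n_reads * fraction_modified)
    taken from the modified pool and the rest from the unmodified pool."""
    if not 0.0 <= fraction_modified <= 1.0:
        raise ValueError("fraction_modified must be between 0 and 1")
    rng = random.Random(seed)
    n_mod = round(n_reads * fraction_modified)
    n_unmod = n_reads - n_mod
    # Sample with replacement so that small pools still work.
    sample = rng.choices(modified, k=n_mod) if n_mod else []
    sample += rng.choices(unmodified, k=n_unmod) if n_unmod else []
    rng.shuffle(sample)
    return sample


def site_features(observations):
    """Collapse per-read (quality, is_mismatch) pairs into per-site features."""
    return {
        "mean_quality": mean(q for q, _ in observations),
        "mismatch_freq": sum(1 for _, m in observations if m) / len(observations),
    }


# Hypothetical per-read observations at one site: (base quality, mismatch flag).
modified_pool = [(7, True), (9, True), (8, False), (6, True)]
unmodified_pool = [(20, False), (22, False), (19, False), (21, True)]

# Simulate a site where 20% of the reads carry the modification.
mixed = mix_reads(modified_pool, unmodified_pool, fraction_modified=0.2, n_reads=50)
features = site_features(mixed)
```

Feature vectors built this way for many sites, labelled as modified or unmodified, could then be passed to whichever SVM training step you use; which features and labels the released EpiNano models actually use should be checked against the repository.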