novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

about RRACH sites #117

Closed: xieyy46 closed this issue 1 year ago

xieyy46 commented 2 years ago

Hi developers! As your paper and GitHub page show, EpiNano trained the SVM model using 5-mers that contain only one A. Should I discard the k-mers in the results file that contain more than one A? I saw on the main page of the EpiNano GitHub that you recommend retaining only RRACH sites in the results file, but some RRACH sites, such as GGACA, contain more than one A. I am confused. Thank you!

enovoa commented 2 years ago

Hi @xieyy46, the model was trained with sites that do not contain two As in the same k-mer, because otherwise both As would be modified (which, as far as we know, is not the case in biological situations). For this reason, we recommend excluding k-mers with two As, as those have not been explicitly trained or tested in the SVM model. That being said, we did test the performance of the model on all RRACH sites, which, as you mention, include k-mers such as GGACA. Since the final model only considers information from the middle position of the k-mer (q3.mis3.del3), we included GGACA k-mers in our analysis of the algorithm's performance in the paper.
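A minimal Python sketch of the k-mer filtering discussed in this thread, assuming the k-mers are available as plain 5-character strings centred on the candidate A. The helper names `is_rrach`, `a_count`, and `filter_kmers` are hypothetical and not part of EpiNano, and no EpiNano output column names are assumed. The strict default drops any RRACH k-mer with more than one A, following the recommendation above; `allow_extra_a=True` keeps them, mirroring the inclusion of k-mers such as GGACA in the performance analysis.

```python
import re

# RRACH motif in IUPAC codes: R = A or G, H = A, C or T.
# The candidate modified A sits at the middle (third) position of the 5-mer.
RRACH = re.compile(r"[AG][AG]AC[ACT]")


def is_rrach(kmer: str) -> bool:
    """Return True if the whole 5-mer matches the RRACH motif."""
    return RRACH.fullmatch(kmer.upper()) is not None


def a_count(kmer: str) -> int:
    """Count the As in a k-mer (the central A included)."""
    return kmer.upper().count("A")


def filter_kmers(kmers, allow_extra_a=False):
    """Keep RRACH 5-mers; unless allow_extra_a is True, also drop
    k-mers with more than one A (hypothetical helper, not EpiNano API)."""
    kept = []
    for kmer in kmers:
        if not is_rrach(kmer):
            continue  # not an RRACH site
        if not allow_extra_a and a_count(kmer) > 1:
            continue  # extra A outside the centre: not covered by SVM training
        kept.append(kmer)
    return kept


if __name__ == "__main__":
    candidates = ["GGACT", "GGACA", "AGACT", "TTACT", "GGACC"]
    print(filter_kmers(candidates))                      # strict filter
    print(filter_kmers(candidates, allow_extra_a=True))  # all RRACH sites
```

With the strict default, only GGACC and GGACT survive out of all possible RRACH 5-mers, since an A at either R position or at the H position adds a second A to the k-mer.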

Hope that clarified your doubts.

Finally, I'd like to note that running the SVM on a single sample is not recommended, and that EpiNano-Error outperforms EpiNano-SVM in our hands.

Thanks, Eva

xieyy46 commented 2 years ago

Hi Eva! Thank you for your help! I would also like to know your recommended thresholds for the SVM single-sample and SVM delta models.

Huanle commented 2 years ago

Hi @xieyy46,

I'd recommend referring to https://www.nature.com/articles/s41467-019-11713-9 for setting up thresholds.

Hope this helps.