novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Fail to reproduce your results #134

Closed rania-o closed 1 year ago

rania-o commented 1 year ago

Hello,

I am looking to reproduce the results of your article. So I ran Epinano SVM on your published data (mod and unmod Rep1). Here are the commands I used (for both mod and unmod samples):

python3 ~/softs/EpiNano/Epinano_Variants.py -R ../ref/curlcake.fasta -b ../mapping/mod_rep1_extract_nanopolish_sorted.bam -s ~/softs/EpiNano/misc/sam2tsv.jar

python3 ~/softs/EpiNano/misc/Slide_Variants.py mod_rep1_extract_nanopolish_sorted.plus_strand.per.site.csv  5

python3 ~/softs/EpiNano/Epinano_Predict.py --model ~/softs/EpiNano/models/rrach.q3.mis3.del3.linear.dump --predict mod_rep1_extract_nanopolish_sorted.plus_strand.per.site.5mer.csv --columns 8,13,23 --out_prefix mod_rep1_prediction_RRACH_b3

The problem is that Epinano SVM reports about 9000 modified positions for both samples (mod and unmod). Is this normal?

Thank you very much. Rania

enovoa commented 1 year ago

Hi @rania-o, from the details above, it seems you are using EpiNano version 1.2 on curlcake data, whereas the Nat Comm paper (which I assume is the article you are referring to) used EpiNano 1.0.

Also, please note that the predictions of EpiNano 1.2 can be filtered in many different ways. We leave that choice to the user, precisely because no single option fits all purposes. EpiNano 1.2 was benchmarked and tested on rRNA sequences, not on curlcake datasets (https://pubmed.ncbi.nlm.nih.gov/34085237/).

Thanks, Eva

rania-o commented 1 year ago

Hi @enovoa

Thanks, and sorry for the late reply. But I don't understand why Epinano SVM 1.2 doesn't work with the curlcake data. Since it's a model trained to detect m6A modifications, shouldn't it normally work on any data?

Rania

enovoa commented 1 year ago

Hi @rania-o, I am not saying that Epinano SVM 1.2 doesn't work with the curlcake data. In your first message you said that you are trying to reproduce what we did in the paper, and what you ran is not what we did in the paper, because we used Epinano 1.0 there. In the second publication, we benchmarked Epinano 1.2 on rRNAs, not on curlcakes. I would also like to note that, as explained in the Nat Comm paper (see the methods section), a very important step is to filter the results rather than use them straight away. Please see the pseudocode included in the paper. Thanks, Eva
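To make the idea of filtering more concrete, here is a minimal sketch of one possible filter. It is not the pseudocode from the paper and not part of EpiNano. It keeps only windows called modified in the sample with a high probability and drops any that are also called modified in the unmodified control. The column names `window`, `prediction` and `prob_mod`, the label `mod`, and the 0.8 cutoff are all hypothetical and would need to be adapted to the actual columns in the Epinano_Predict.py output.

```python
import csv
import io


def load_calls(handle, min_prob):
    """Return the set of windows called modified with probability >= min_prob.

    Column names and the "mod" label are hypothetical placeholders.
    """
    calls = set()
    for row in csv.DictReader(handle):
        if row["prediction"] == "mod" and float(row["prob_mod"]) >= min_prob:
            calls.add(row["window"])
    return calls


def filter_against_control(sample_handle, control_handle, min_prob=0.8):
    """Keep confident calls in the sample that are absent from the control."""
    sample_calls = load_calls(sample_handle, min_prob)
    control_calls = load_calls(control_handle, min_prob)
    return sample_calls - control_calls


# Toy data standing in for two prediction tables (values are made up).
MOD_CSV = """window,prediction,prob_mod
ref1:10-14,mod,0.95
ref1:20-24,mod,0.90
ref1:30-34,mod,0.55
ref1:40-44,unm,0.10
"""

UNMOD_CSV = """window,prediction,prob_mod
ref1:10-14,unm,0.05
ref1:20-24,mod,0.85
ref1:30-34,unm,0.20
"""

kept = filter_against_control(io.StringIO(MOD_CSV), io.StringIO(UNMOD_CSV))
print(sorted(kept))
```

In this toy example only `ref1:10-14` survives: `ref1:20-24` is also called in the control, and `ref1:30-34` falls below the cutoff. With real files, the two `io.StringIO` objects would be replaced by open file handles.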

rania-o commented 1 year ago

Hi @enovoa

Thank you for your reply. I do understand your point. But I am still confused by the fact that Epinano SVM 1.2 produced many false positives when I used it on unmodified curlcakes. On the other hand, filtering the results is quite difficult in my case, because I am doing de novo modification detection. I have already run many tests with known modifications, but so far the tools produce a large proportion of false positives, which makes detection more difficult. Thanks,

Rania
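For readers facing the same false-positive problem, one generic way to use an unmodified control is to measure how many control positions are called modified at each probability cutoff, then pick the lowest cutoff that keeps that fraction under a chosen target. The sketch below assumes a per-position modification probability is available for every control position. The function names, the probabilities and the 0.25 target are hypothetical, and this is not a procedure described by the EpiNano authors.

```python
def false_positive_rate(control_probs, cutoff):
    """Fraction of unmodified-control positions called modified at this cutoff."""
    if not control_probs:
        raise ValueError("control_probs must not be empty")
    called = sum(1 for p in control_probs if p >= cutoff)
    return called / len(control_probs)


def lowest_cutoff_for_fpr(control_probs, target_fpr, cutoffs):
    """Return the smallest cutoff whose control FPR is <= target_fpr, or None."""
    for cutoff in sorted(cutoffs):
        if false_positive_rate(control_probs, cutoff) <= target_fpr:
            return cutoff
    return None


# Made-up probabilities for ten positions in an unmodified control.
control_probs = [0.05, 0.10, 0.20, 0.40, 0.55, 0.60, 0.70, 0.85, 0.90, 0.95]

chosen = lowest_cutoff_for_fpr(
    control_probs, target_fpr=0.25, cutoffs=[0.5, 0.6, 0.7, 0.8, 0.9]
)
print(chosen)
```

A cutoff chosen this way only reflects the control it was estimated on, so it may not transfer to samples with a different sequence context or coverage.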