pabloacera closed this issue 3 years ago.
Hi @pabloacera,
From what I can see, you were making predictions on all sites, but the model was trained only on m6A sites in RRACH motifs. The coverage also seems to be quite low. Can you filter your data/results on coverage (e.g., >=30) and on motif (i.e., keep only the central A of RRACH k-mers) and see whether the results look more reasonable? It would also be good to have replicates and WT-KO contrasts in order to remove false positives.
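The two filters suggested above (motif and coverage) can be sketched in Python as follows. This is a minimal sketch, not part of EpiNano: it assumes each row gives the 5-mer sequence and a single coverage value, and the function names `is_rrach` and `keep_site` are hypothetical. RRACH uses IUPAC codes (R = A/G, H = A/C/T), so the candidate m6A is the third (central) base of the 5-mer.

```python
import re

# RRACH in IUPAC codes: R = A/G, H = A/C/T; the modified A is the centre base.
RRACH = re.compile(r"^[AG][AG]AC[ACT]$")


def is_rrach(kmer: str) -> bool:
    """Return True if a 5-mer matches RRACH (U is treated as T)."""
    return bool(RRACH.match(kmer.upper().replace("U", "T")))


def keep_site(kmer: str, coverage: int, min_cov: int = 30) -> bool:
    """Hypothetical helper: keep only RRACH 5-mers with coverage >= min_cov."""
    return is_rrach(kmer) and coverage >= min_cov
```

For example, `keep_site("GGACT", 37)` is kept, while `keep_site("GGACT", 12)` (low coverage) and `keep_site("TTTTT", 100)` (not RRACH) are dropped.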
Hi @Huanle,
I'm having a similar problem, except that all of the RRACH motifs are predicted as unmodified. However, I did not filter the 5mer.csv file generated with Slide_Variants.py on coverage (e.g., >=30), since the coverage is listed for each nucleotide (e.g. 37:37:37:37:37). How can I filter it if the coverage is not given as a single number?
Thank you, Rytis
Hi @pabloacera, as @Huanle mentioned, we do not recommend running EpiNano on single samples: although it is possible, the results will be full of false positives. Additionally, when running the SVM model, only RRACH k-mers should be considered, as those are the ones the model was trained on, so you need to subset your data to those. In short, when using EpiNano-SVM you should keep only the predictions that are not also present in the KO/condition 2 sample, and you should only consider RRACH sites in your analysis. Thanks!
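The WT-vs-KO contrast described above amounts to a set difference over sites called "mod". A minimal sketch, assuming the prediction output has been read into rows of strings (e.g. with Python's `csv` module), that the first four columns (assumed here to be k-mer, window, reference and strand) identify a site, and that the index of the column holding the "mod"/"unm" label is passed in as `label_col`. The column layout and the helper names are assumptions; check them against your own output file.

```python
from typing import Iterable, Sequence, Set, Tuple

# Assumed layout: columns 0-3 identify a site; label_col holds "mod"/"unm".


def modified_sites(rows: Iterable[Sequence[str]], label_col: int,
                   key_cols: Sequence[int] = (0, 1, 2, 3),
                   mod_label: str = "mod") -> Set[Tuple[str, ...]]:
    """Return the site keys of all rows whose predicted label is mod_label.

    A header row is skipped implicitly, since its label column is not "mod".
    """
    return {tuple(row[i] for i in key_cols)
            for row in rows
            if row and row[label_col] == mod_label}


def wt_specific(wt_rows: Iterable[Sequence[str]],
                ko_rows: Iterable[Sequence[str]],
                label_col: int) -> Set[Tuple[str, ...]]:
    """Sites called "mod" in WT but not in KO/condition 2 (candidate m6A sites)."""
    return modified_sites(wt_rows, label_col) - modified_sites(ko_rows, label_col)
```

Sites that are also called "mod" in the KO are treated as false positives and removed; the RRACH and coverage filters should still be applied on top of this.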
Hi @Stakaitis, can you give me more details about what you did? In particular, are you running with pre-trained models? If so, which model did you use, and what is the command?
Regarding filtering on coverage/depth: if there is no ":" in your reference IDs, you can try this command:
awk -F':' '$3>29' your_prediction.csv > your_prediction.filt.csv
You can also require all five consecutive positions of the k-mer to have a coverage >=30 (lines containing "Ref", i.e. the header, are kept as well):
perl -ne '@a = split /,/,$_; @b = split /:/, $a[4]; print $_ if ($b[0]>29 && $b[1]>29 && $b[2]>29 && $b[3]>29 && $b[4]>29 || /Ref/);' your_prediction.csv > your_prediction.filt.csv
Hope this helps. Let me know if you need further help.
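For those who prefer Python over the perl one-liner, here is a sketch of the same filter: keep the header (assumed here to be the first line) and every row whose per-position coverages, taken from the fifth comma-separated column as in the perl command, are all >= 30. The function names are hypothetical and not part of EpiNano.

```python
import csv
from typing import Iterable, Iterator, List, TextIO


def coverage_ok(cov_field: str, min_cov: float = 30) -> bool:
    """True if every per-position coverage in e.g. '37:37:37:37:37' is >= min_cov."""
    return all(float(c) >= min_cov for c in cov_field.split(":"))


def filter_rows(rows: Iterable[List[str]], cov_col: int = 4,
                min_cov: float = 30) -> Iterator[List[str]]:
    """Yield the header (first row) plus rows passing the coverage filter."""
    for i, row in enumerate(rows):
        if not row:  # skip blank lines
            continue
        if i == 0 or coverage_ok(row[cov_col], min_cov):
            yield row


def filter_file(src: TextIO, dst: TextIO, cov_col: int = 4,
                min_cov: float = 30) -> None:
    """Stream a CSV from src to dst, keeping only well-covered k-mers."""
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(filter_rows(csv.reader(src), cov_col, min_cov))
```

Call it with two open file handles, e.g. `with open("your_prediction.csv", newline="") as src, open("your_prediction.filt.csv", "w", newline="") as dst: filter_file(src, dst)`. Unlike the perl version, the header is identified by position rather than by matching "Ref".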
@Huanle,
Thank you for the help. I'm now getting both "unm" and "mod" predictions from Epinano_Predict.py. Yes, I'm running with the pre-trained model "rrach.q3.mis3.del3.linear.dump".
My goal is to identify m6A modifications in two different cell lines (both WT). I want to see both the differences and the similarities in where modifications occur. At this stage, I'm interested in whether I can detect m6A modifications at all. If I understand correctly, EpiNano is the right tool for that, even though the predictions could have a high false-positive rate, as @enovoa pointed out.
Here is the file with the commands I've used (maybe it will be useful to others who also don't have much experience with the command line): Epinano_modification_prediction_from_model.txt
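Once each cell line has its own set of filtered m6A calls (RRACH only, sufficient coverage), the similarities and differences reduce to set operations. A minimal sketch, assuming sites are represented as hashable keys such as (reference, position, strand) tuples; the function name `compare_cell_lines` is hypothetical. Without a KO/unmodified contrast, both input sets will still contain false positives, so the "only_a"/"only_b" subsets should be interpreted with care.

```python
from typing import Dict, Hashable, Set


def compare_cell_lines(line_a: Set[Hashable],
                       line_b: Set[Hashable]) -> Dict[str, Set[Hashable]]:
    """Split predicted m6A sites into shared and line-specific subsets."""
    return {
        "shared": line_a & line_b,  # called in both cell lines
        "only_a": line_a - line_b,  # called only in cell line A
        "only_b": line_b - line_a,  # called only in cell line B
    }
```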
Hi @Stakaitis,
Thanks for sharing your commands; they look correct to me. In the test data folder there are two run.sh files, which you can refer to if in the future you have more samples, especially unmodified ones to contrast against.
Hi, I am running EpiNano on a dataset with only 1 sample, and the results show that all bases are modified, which is strange (almost 1M bases). I followed the instructions and generated the file human_guppy3.1.5.MD.plus_strand.per.site.csv with the header:
Then I ran Epinano_Predict.py, as I only have 1 replicate (without the KO):
$ python Epinano_Predict.py --model ./models/rrach.q3.mis3.del3.linear.dump --predict data/human_guppy3.1.5.MD.plus_strand.per.site.csv --columns 6,9,11 --out_prefix data/human_mod_prediction
The header of the output is:
Every site is still predicted as modified. Is there anything I am missing? Any help would be appreciated. Cheers.