novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

How to use Epinano_Predict.py #127

Closed liuqianhn closed 1 year ago

liuqianhn commented 1 year ago

Hi, I have run Epinano_Variants to produce m6a.plus_strand.per.site.csv and Epinano_Current to produce Intensity.collapsed.tsv.5mer.csv. Since I now have two CSV files, I am wondering what I need to do to run Epinano_Predict.py. Thanks.

enovoa commented 1 year ago

Hello, apologies for the slow reply.

Epinano_Predict.py requires a model. Some models are trained on 5mers, whereas others are trained on per_site data. You can use a pre-trained model with the --model option, or train your own model.

Epinano_Predict.py can predict RNA modifications on a given dataset using previously trained EpiNano models (specified with ‘--model’). In the example below, we employ a previously trained model, ‘q3.mis3.del3.MODEL.linear.model.dump’, to predict m6A modifications in RRACH k-mers on the dataset specified with --predict. This SVM model was trained on RRm6ACH and RRACH k-mers produced by in vitro transcription. The features used to train it are q3, mis3 and del3, which are the per-base quality, mismatch frequency and deletion frequency of the middle position of the k-mer. It is important to note that a given model should only be used to predict modifications on the same set of k-mers that were used to train it, i.e. if the model is trained on GGACA k-mers, it should only be used to predict m6A modifications on GGACA k-mers.

python $EPINANO_HOME/Epinano_Predict.py \
      --model $EPINANO_HOME/models/q3.mis3.del3.MODEL.linear.model.dump \
      --predict sample.per_site.5mer.csv \
      --columns 8,13,23 \
      --out_prefix sample_mod_prediction
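
Since a model should only be applied to the k-mers it was trained on, it may help to restrict the input to RRACH 5-mers before predicting. Below is a minimal sketch of such a filter, not part of EpiNano itself. It assumes the k-mer sits in a column named `kmer`, which is a hypothetical header chosen for illustration; check the header of your own CSV and adjust. The motif uses standard IUPAC codes (R = A or G, H = A, C or U/T).

```python
import csv
import io
import re

# RRACH in IUPAC notation: R = A or G, H = A, C or U (T in DNA-style references).
RRACH = re.compile(r"^[AG][AG]AC[ACTU]$")


def is_rrach(kmer: str) -> bool:
    """Return True if a 5-mer matches the RRACH motif (case-insensitive)."""
    return bool(RRACH.match(kmer.upper()))


def filter_rrach(rows, kmer_field="kmer"):
    """Yield only the rows whose k-mer column matches RRACH.

    `kmer_field` is a hypothetical column name used for illustration;
    replace it with the actual header of the k-mer column in your file.
    """
    for row in rows:
        if is_rrach(row[kmer_field]):
            yield row


# Toy input for demonstration only; this is not real EpiNano output.
demo = io.StringIO("kmer,value\nGGACA,0.1\nTTTTT,0.2\nAAACT,0.3\nGGCCA,0.4\n")
kept = [row["kmer"] for row in filter_rrach(csv.DictReader(demo))]
print(kept)  # ['GGACA', 'AAACT']
```

For a real file, you would open it with `open(path, newline="")`, pass the handle to `csv.DictReader`, and write the surviving rows back out with `csv.DictWriter` before handing the result to --predict.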