novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

What are the columns of ko_wt_combined.per_site_raw_feature.rrach.5mer.csv #54

Closed emanlee closed 4 years ago

emanlee commented 4 years ago

The example below uses ko_wt_combined.per_site_raw_feature.rrach.5mer.csv. What are the columns of this file? And how can I generate such CSV files from the output of Step 1 and Step 2 in EpiNano 1.2? Thanks a lot!

Example:

    python $EPINANO_HOME/Epinano_Predict.py \
        --train ko_wt_combined.per_site_raw_feature.rrach.5mer.csv \
        --predict ko_wt_combined.per_site_raw_feature.rrach.5mer.csv \
        --accuracy_estimation --out_prefix train_and_test --columns 8,13,23 --modification_status_column 26

Huanle commented 4 years ago

Hi @emanlee, the columns are:

- Kmer: kmer (K consecutive bases)
- Window: positions in the reference where the kmer was extracted
- Ref: reference sequence ID
- Strand: to which strand the reads were mapped and features were extracted and computed
- Coverage: how many reads' bases were mapped at the K consecutive positions
- q[i]: for i = 1, 2, 3, ..., the mean quality score of all read bases that are mapped to the same reference position, corresponding to the i-th position in the kmer
- mis[i]: similar to q[i], but the metric is mismatch frequency
- ins[i]: similar to q[i], but the metric is insertion frequency
- del[i]: similar to q[i], but the metric is deletion frequency
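
To relate this list to the `--columns 8,13,23 --modification_status_column 26` arguments in the question, here is a minimal Python sketch that numbers the columns for a 5-mer file. It assumes the CSV columns appear in the same order as the list above, that column numbers are 1-based, and that one label column follows the features. The name `modification_status` and the helpers `build_header` and `column_numbers` are hypothetical, used only for illustration. Please check the header of your own file before relying on it.

```python
def build_header(k=5, label="modification_status"):
    """Build an assumed header: 5 metadata columns, 4 metrics x k positions, 1 label."""
    meta = ["Kmer", "Window", "Ref", "Strand", "Coverage"]
    # Assumed order: all q[i], then all mis[i], then ins[i], then del[i].
    features = [f"{metric}{i}"
                for metric in ("q", "mis", "ins", "del")
                for i in range(1, k + 1)]
    # The label column name is hypothetical; only its position matters here.
    return meta + features + [label]


def column_numbers(header):
    """Map each column name to its 1-based column number."""
    return {name: number for number, name in enumerate(header, start=1)}


header = build_header()
cols = column_numbers(header)

# Columns picked by --columns 8,13,23 under the assumed layout.
selected = [header[number - 1] for number in (8, 13, 23)]
print(selected)                     # ['q3', 'mis3', 'del3']
print(cols["modification_status"])  # 26
```

Under these assumptions, columns 8, 13 and 23 are the quality, mismatch and deletion features of the middle base of the 5-mer, and column 26 holds the modification status.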

There are some test data and example commands showing how to run it end to end. Please refer to the following locations: https://github.com/enovoa/EpiNano/blob/master/test_data/make_predictions/ https://github.com/enovoa/EpiNano/blob/master/test_data/make_predictions/run.sh

Let me know if you have more questions.

Best, Huanle