novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

undefined columns selected #102

Closed yuxinPenny closed 2 years ago

yuxinPenny commented 2 years ago

When I ran the DiffErr script to detect modified bases, I got the following error:

Error in [.data.frame(input, , c("X.Ref", "pos", "position", "base", : undefined columns selected
Calls: cleanup -> [ -> [.data.frame
Execution halted

The command line used to run is:

Rscript /home/share/yuxin/EpiNano/Epinano_DiffErr.R \
  -k /home/share/yuxin/2021Fall/DATA/hm_tmp/HMEC_ALKBH5/bam/HMEC_ALKBH5_g.minus_strand.per.site.csv \
  -w /home/share/yuxin/2021Fall/DATA/hm_tmp/HMEC_WT/bam/HMEC_WT_g.minus_strand.per.site.csv \
  -t 10 -o HMEC_minus_DiffErr -f cov,q_mean,q_median,q_std,mis,ins,del

The head of the epinano_variant output is:

Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del

0,24738,C,-,1,11.00000,11.00000,0.00000,1.00000,0.00000,0.00000
0,24739,T,-,1,11.00000,11.00000,0.00000,1.00000,0.00000,0.00000
0,24740,G,-,1,12.00000,12.00000,0.00000,1.00000,0.00000,0.00000
0,24741,C,-,1,6.00000,6.00000,0.00000,1.00000,0.00000,0.00000
0,24742,T,-,1,6.00000,6.00000,0.00000,1.00000,0.00000,0.00000
0,24743,G,-,1,6.00000,6.00000,0.00000,1.00000,0.00000,0.00000
0,24744,A,-,1,12.00000,12.00000,0.00000,1.00000,0.00000,0.00000

Can we use only one feature to predict? P.S. The #Ref values in the epinano_variant output CSV file seem odd as well. Could you give me some clues?
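For anyone hitting the same error, a quick way to narrow it down is to compare the header of the per-site CSV with the column names that appear in the error message. The sketch below is only a diagnostic aid, not part of EpiNano: `missing_columns` is a hypothetical helper, and the `wanted` list holds just the four names visible in the truncated error, so the full list the script asks for is unknown here.

```python
import csv
import io


def missing_columns(header_line, wanted):
    """Return the names in `wanted` that are absent from a CSV header line."""
    header = next(csv.reader(io.StringIO(header_line)))
    present = {name.strip() for name in header}
    return [name for name in wanted if name not in present]


# Header exactly as pasted in this thread.
header_line = "Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del"
# Only the names visible in the truncated error message (assumption: the
# script may request more columns than these four).
wanted = ["X.Ref", "pos", "position", "base"]

print(missing_columns(header_line, wanted))
```

Note that R's read.csv rewrites header names that are not valid R names when it loads a file, so a name like "X.Ref" may simply be R's version of a header that starts with "#". The comparison above is literal and does not model that renaming.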

Huanle commented 2 years ago

Hi @yuxinPenny, you can use the Python scripts, e.g. Epinano_sumErr.py, Epinano_make_delta.py and Epinano_delta_sumErr.py, to generate combined features and then run Epinano_DiffErr.R on the results.
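To illustrate what a "combined feature" could look like, here is a minimal sketch that adds the mis, ins and del columns of each per-site row into one value. It is not the code of Epinano_sumErr.py, whose options and output format are not shown in this thread; the function name `add_sum_err` and the output column name `sum_err` are made up for the example.

```python
import csv
import io


def add_sum_err(rows, features=("mis", "ins", "del"), out_col="sum_err"):
    """Yield a copy of each row with an extra column holding the sum of `features`."""
    for row in rows:
        combined = dict(row)
        combined[out_col] = sum(float(row[name]) for name in features)
        yield combined


# Two rows copied from the per-site output pasted in this thread.
sample = """Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del
0,24738,C,-,1,11.00000,11.00000,0.00000,1.00000,0.00000,0.00000
0,24739,T,-,1,11.00000,11.00000,0.00000,1.00000,0.00000,0.00000
"""

rows = list(add_sum_err(csv.DictReader(io.StringIO(sample))))
for row in rows:
    print(row["pos"], row["sum_err"])
```

For real data, the scripts shipped with EpiNano remain the right tool, since Epinano_DiffErr.R presumably expects their exact output layout.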

btw, Epinano_DiffErr.R is meant for small RNAs. If you want to apply it to long reference sequences, you'd better run it in a sliding-window manner; otherwise its sensitivity is reduced.
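The thread does not spell out how the windows should be built, so the following is only one possible reading: split the sorted positions of a long reference into overlapping windows, then process each window's rows separately. The helper `sliding_windows` and the `size` and `step` values are hypothetical and chosen just to keep the example small.

```python
def sliding_windows(positions, size, step):
    """Yield (start, end, members) for windows of `size` positions advanced by `step`.

    `positions` must be sorted in ascending order; `end` is exclusive.
    """
    if not positions:
        return
    start = positions[0]
    last = positions[-1]
    while start <= last:
        end = start + size
        members = [p for p in positions if start <= p < end]
        if members:
            yield start, end, members
        start += step


# The seven positions from the per-site rows pasted earlier in this thread.
positions = list(range(24738, 24745))
windows = list(sliding_windows(positions, size=4, step=2))
for start, end, members in windows:
    print(start, end, members)
```

Under this reading, the rows that fall in each window would be written to their own CSV and passed to Epinano_DiffErr.R one window at a time. Whether that matches the intended workflow is an assumption.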