novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Running Epinano models on m6A modifications #88

Closed chrishendra93 closed 3 years ago

chrishendra93 commented 3 years ago

Hi Huanle,

I am just wondering about EpiNano 1.2.

I have run Epinano_Variants.py and the output csv has the following columns:

Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del

Meanwhile in your documentation, you mentioned running

python $EPINANO_HOME/Epinano_Predict.py --train ko_wt_combined.per_site_raw_feature.rrach.5mer.csv --predict ko_wt_combined.per_site_raw_feature.rrach.5mer.csv --accuracy_estimation --out_prefix train_and_test --columns 8,13,23 --modification_status_column 26

As you can see, there are not enough columns in the output files. Did I preprocess this wrongly or do you have any other models that are intended to be run with Epinano 1.2?

Thank you

Regards

Chris

Huanle commented 3 years ago

Hi Chris,

Sorry for the late reply. You will need to use Slide_Variants.py to process your CSV file and generate a new file organized in k-mer format. Let me know if you need any further help.

Huanle
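To illustrate why the k-mer file has more columns than the per-site file, here is a minimal sketch of a sliding window over per-site rows. It is not the actual Slide_Variants.py logic: the function name `slide_kmers`, the `FEATURES` selection, and the output column names are assumptions made for illustration, and the toy numbers are made up. Only the input header is taken from the Epinano_Variants.py output quoted above. The sketch also ignores minus-strand orientation.

```python
import csv
import io
from collections import deque

# Input column names come from the Epinano_Variants.py header quoted above.
# FEATURES is a hypothetical choice of per-site features to carry along.
FEATURES = ("q_mean", "mis", "del")


def slide_kmers(rows, k=5):
    """Yield one record per window of k consecutive positions.

    The window restarts whenever the reference, the strand, or the run of
    consecutive positions is interrupted.
    """
    window = deque(maxlen=k)
    for row in rows:
        if window:
            prev = window[-1]
            contiguous = (
                row["Ref"] == prev["Ref"]
                and row["strand"] == prev["strand"]
                and int(row["pos"]) == int(prev["pos"]) + 1
            )
            if not contiguous:
                window.clear()
        window.append(row)
        if len(window) == k:
            record = {
                "kmer": "".join(r["base"] for r in window),
                "window": ":".join(r["pos"] for r in window),
                "Ref": row["Ref"],
                "strand": row["strand"],
            }
            for name in FEATURES:
                # One value per position in the window, left to right.
                for i, r in enumerate(window, start=1):
                    record[f"{name}{i}"] = r[name]
            yield record


# Toy per-site table (made-up numbers) with six consecutive positions.
toy_csv = """Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del
tx1,1,G,+,10,12.0,12,1.0,0.0,0.0,0.0
tx1,2,G,+,10,11.0,11,1.0,0.1,0.0,0.0
tx1,3,A,+,10,9.0,9,2.0,0.3,0.0,0.1
tx1,4,C,+,10,10.0,10,1.0,0.1,0.0,0.0
tx1,5,T,+,10,12.0,12,1.0,0.0,0.0,0.0
tx1,6,A,+,10,12.0,12,1.0,0.0,0.0,0.0
"""

kmer_rows = list(slide_kmers(csv.DictReader(io.StringIO(toy_csv))))
for rec in kmer_rows:
    print(rec["kmer"], rec["window"], rec["q_mean3"])
```

Each per-site feature is repeated once per window position, so three features over a 5-mer already give fifteen feature columns. The column indices passed to Epinano_Predict.py refer to the real Slide_Variants.py output, whose layout may differ from this sketch.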

chrishendra93 commented 3 years ago

Hi @Huanle, the program has been running for a week and has not finished yet. I realized that you could probably shorten the running time and shrink the .tmp file by writing to it only those k-mers with an AC motif in the middle, since only 5-mers with these two bases in the middle can possibly contain an m6A modification. I can submit a pull request if you agree with this.
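The filter proposed here could look roughly like the sketch below. The helper names `has_central_ac` and `is_rrach` are hypothetical and are not part of EpiNano. The second helper applies the stricter RRACH pattern that appears in the file names of the documented command, using the IUPAC codes R = A/G and H = A/C/T.

```python
import re

# IUPAC codes: R = A or G, H = A, C or T (U is also accepted here).
RRACH = re.compile(r"^[AG][AG]AC[ACTU]$")


def has_central_ac(kmer):
    """True if positions 3 and 4 of a 5-mer are A and C."""
    return len(kmer) == 5 and kmer[2:4].upper() == "AC"


def is_rrach(kmer):
    """True if the 5-mer matches the stricter RRACH motif."""
    return RRACH.match(kmer.upper()) is not None


kmers = ["GGACT", "TTACT", "GGATT", "AAACA"]
central_ac = [k for k in kmers if has_central_ac(k)]
rrach = [k for k in kmers if is_rrach(k)]
print(central_ac)
print(rrach)
```

Applying such a check before a window is written out would keep the intermediate file limited to candidate sites, at the cost of discarding every other 5-mer.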

Huanle commented 3 years ago

Hi @chrishendra93, thanks for the suggestion. Another way to speed it up is to split your input files by reference contig and/or intersect them with known or preferred motifs in the reference sequences. Cheers - Huanle
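One way to split a per-site CSV by reference contig is sketched below, assuming the file has the `Ref` column shown in the header earlier in this thread. The function name `split_by_reference` and the `<reference>.per_site.csv` naming scheme are assumptions for illustration, not something EpiNano provides.

```python
import csv
import os
import tempfile


def split_by_reference(csv_path, out_dir):
    """Write one CSV per value of the Ref column; return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    seen = set()
    current_ref, out, writer = None, None, None
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        try:
            for row in reader:
                if row["Ref"] != current_ref:
                    # Reference changed: close the previous part, open the next.
                    if out is not None:
                        out.close()
                    current_ref = row["Ref"]
                    # Hypothetical naming scheme: <reference>.per_site.csv
                    name = current_ref.replace("/", "_") + ".per_site.csv"
                    path = os.path.join(out_dir, name)
                    first_time = path not in seen
                    # Append if this reference already appeared earlier.
                    out = open(path, "w" if first_time else "a", newline="")
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    if first_time:
                        writer.writeheader()
                        seen.add(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return sorted(seen)


# Toy input (made-up numbers) with two references.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "toy.per_site.csv")
    with open(src, "w", newline="") as fh:
        fh.write(
            "Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del\n"
            "tx1,1,A,+,10,12.0,12,1.0,0.0,0.0,0.0\n"
            "tx1,2,C,+,10,11.0,11,1.0,0.1,0.0,0.0\n"
            "tx2,1,G,+,8,10.0,10,1.0,0.0,0.0,0.0\n"
        )
    parts = split_by_reference(src, os.path.join(tmp, "by_ref"))
    part_names = [os.path.basename(p) for p in parts]
    with open(parts[0]) as fh:
        tx1_lines = fh.read().splitlines()
    print(part_names)
```

Only one output file is open at a time, which matters when the reference is a transcriptome with many contigs. Each part can then be processed separately.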

kwonej0617 commented 1 year ago

Hi @Huanle, I have the same issue that @chrishendra93 mentioned. I am trying to split the BAM/SAM input file as you recommended. Could you please explain how you split your data, or which software or script you used? My minimap2 output looks like the following.

@SQ     SN:ENST00000497096      LN:304
@SQ     SN:ENST00000516993      LN:102
@SQ     SN:ENST00000461982      LN:298
@SQ     SN:ENST00000362695      LN:102
@SQ     SN:ENST00000384312      LN:102
@SQ     SN:ENST00000410579      LN:103
@SQ     SN:ENST00000583026      LN:299
@SQ     SN:ENST00000516935      LN:124
@SQ     SN:ENST00000580835      LN:303
@PG     ID:minimap2     PN:minimap2     VN:2.17-r941    CL:minimap2 -ax map-ont --MD -t 16 /home/euijin.kwon-umw/Euijin/m6a_tool_comparision/reference/Homo_sapiens.GRCh38.cdna.ncrna_modified.fa /pi/chan.zhou-umw/SeqData/3rd_seq/xPore/HEK293T_WT-rep1/fastq_pass/HEK293T_WT-rep1.fastq.gz
dde489fa-23a5-4981-b771-ae38c97248f4    0       ENST00000361436 1       60      140S13M1I49M2D11M1I4M1D5M1D8M2I17M2D3M3D116M1I11M2D27M1D9M2D27M1D22M2I33M1D4M1D8M1I5M2D7M1D82M1I15M59S  *       0       0       ACCCTATCATCATCTTATTCATATTTCATAACCCATACCATACCTACATTTTTCATAACAAACAATCCTCTGCTTCGGAGGAGGCCAAGGTGCAACTTCGGTCGTCCCGAATCCGGGTTCATCCGACACCAGCCGCCACCATGCCGCCGAAGTCTCGACCCCAACGAGATCAAAGTCGTATACCTGAGGTGCACCGGAGGTGATCGGTGCCACTCTTTGCCTGGCCCCAAGATTTCGGCCCCCTGGGTCTGTCCAAAAGTTGGTGATGACATTGCCAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACGGTGAAACTGACCGTTCAGAACAGACAGGCCCAGATTGAGGTGGTGCCTTCTGCCTCTGCCCCTGATCATCTGTCCTCAAGGAACCACCAAGAGACAGAAGAAACAGAAACATTAAACACAGTGGGAATATCACTTTGATGAGATTGTCAACATTGCTCTCGACAGATGCGGCACCGATCCATAGCCAGAGATTCCTGGAACCAACTAAAGATCCTGGGACTGCCCAGTCAGTGGGCTGTAATGTTGATGGCCGCCATCCTCATGACATCATCGATGACATCAACAGTGGTGCTGTGGAAATGCCCAGCCAGTTTAAGCACAAAGGAAAACATTTCAATAAAGGATCATCTGACAACTGGTGGAAATAAAAGA    $'-/.--$$++(1%*,',)**+$&%##('*))'''(#%&$##%'&&'%'%''%%#$$##%'%)$$#%%%$*.()&<;7EA38-;:90JKI@:,$$$+,496<9<3<40*?>0(<0><C?AB@>=?C3'9EC:C:9**%/;7549C5;0*/34:*<976=@;B9??AE:=A=4JM><304&&%*1&$$14.*19;?GA5CC-*+-%0*767B;),%2'34554;;9>;>:F>=;227,*:9<7:<:DF8%((*35=?DM@9>*(4?E(9979;7679&(779872(?>=7,,430LW?JDB+444.'1%(,5-07>CCDE=C><.'*)5&:-BF78=@5;<<@D@>=9A?,55/*';:>;)1(76537'11*05.5;=+))'#&&$**&/,9541&-416A<C/678E;17@<-AA=>9;:<569;AGHEAC:6<7<AFBB:+.2-::?6@8:=9AE<:*''%%%.872%$%&//=:>=9''+%96'%24'*%%-/<5<>>?<=%&%*20.:>;@,&&::@K.B935441/3200;?H?D;?:7?744300459;8<D:>D:96;8BD:7:BA>CA;AB?92.0&'&38;852,'&-=5:96<83'$/,*4&%,+3C=4>2@7<<6(11/?3C>BA22??55,*2227<=1CCK=;;6*+0(:6?::<4<0--4/?9,%/2-*$$    NM:i:37 ms:i:766        AS:i:766        nn:i:0  tp:A:P  cm:i:39 s1:i:298        s2:i:214        de:f:0.0565     MD:Z:62^AG12C2^C5^C25^TC3^AAA57A12A56^AA0A1C24^A9^AA27^T45T9^A1C2^T9T3^AG7^G97  rl:i:0

Thank you so much.
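For SAM text like the output above, a minimal sketch of grouping records by reference could look as follows. It relies only on the SAM layout: header lines start with `@`, and the reference name (RNAME) is the third tab-separated column of each alignment line. The function name `group_sam_by_reference` and the toy reads are made up for illustration, and this is not the method used by the EpiNano authors. It holds everything in memory, so it is a sketch for small inputs. For large files, extracting one reference at a time from a sorted, indexed BAM with `samtools view` is likely the more practical route.

```python
from collections import defaultdict


def group_sam_by_reference(lines):
    """Group SAM text lines by reference name (RNAME, the third column).

    Returns {reference: [header lines..., alignment lines...]}. Each group
    keeps the non-@SQ header lines plus only its own @SQ line.
    """
    shared_header = []
    sq_lines = {}
    alignments = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            fields = line.split("\t")
            if fields[0] == "@SQ":
                name = next(f[3:] for f in fields[1:] if f.startswith("SN:"))
                sq_lines[name] = line
            else:
                shared_header.append(line)
            continue
        rname = line.split("\t")[2]
        if rname != "*":  # "*" marks a read without a reference
            alignments[rname].append(line)
    return {
        ref: shared_header + [sq_lines[ref]] + records
        for ref, records in alignments.items()
        if ref in sq_lines
    }


# Toy SAM lines: two @SQ names from the header above, made-up reads.
toy_sam = [
    "@SQ\tSN:ENST00000497096\tLN:304",
    "@SQ\tSN:ENST00000516993\tLN:102",
    "@PG\tID:minimap2\tPN:minimap2",
    "read1\t0\tENST00000497096\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t0\tENST00000516993\t5\t60\t4M\t*\t0\t0\tTTGA\tIIII",
    "read3\t0\tENST00000497096\t9\t60\t4M\t*\t0\t0\tGGAC\tIIII",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
groups = group_sam_by_reference(toy_sam)
for ref, sam_lines in groups.items():
    print(ref, len(sam_lines))
```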