Closed by chrishendra93 3 years ago
Hi Chris,
Sorry for the late reply.
You will need to use Slide_Variants.py to process your CSV file and generate a new file organized in kmer format.
Let me know if you need any further help.
Huanle
Hi @Huanle, the program has been running for a week and has not finished yet. I think you could probably shorten the running time and shrink the .tmp file by writing to it only kmers with an AC motif in the middle, since only 5-mers with these two bases in the middle can possibly contain an m6A modification. I can submit a pull request if you agree.
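To make the proposal concrete, here is a minimal sketch of the filter in isolation. It is not taken from the EpiNano code: the function names are hypothetical, and "AC in the middle" is assumed to mean A at the centre of the 5-mer (index 2) followed by C (index 3).

```python
from typing import Iterator, List, Tuple


def has_central_ac(kmer: str) -> bool:
    """Return True if a 5-mer has A at its centre (index 2) followed by C (index 3).

    Assumption: this is what "AC in the middle" means in the comment above.
    """
    return len(kmer) == 5 and kmer[2:4].upper() == "AC"


def sliding_kmers(seq: str, k: int = 5) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, kmer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def filter_ac_kmers(seq: str) -> List[Tuple[int, str]]:
    """Keep only the 5-mers that pass has_central_ac; all others are skipped."""
    return [(i, kmer) for i, kmer in sliding_kmers(seq) if has_central_ac(kmer)]


print(filter_ac_kmers("GGACTTTACG"))
```

Applying a check like this before each write would keep only the windows of interest in the intermediate file, at the cost of dropping every other kmer from the output.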
Hi @chrishendra93, thanks for the suggestion. Another way to speed it up is to split your input files by reference contig and/or by intersection with known/preferred motifs in the reference sequences. Cheers - Huanle
Hi @Huanle, I have the same issue that @chrishendra93 mentioned. I am trying to split the BAM/SAM input file as you recommended. Could you please explain how you split your data, or which software/script you used? My minimap2 output looks like the following.
@SQ SN:ENST00000497096 LN:304
@SQ SN:ENST00000516993 LN:102
@SQ SN:ENST00000461982 LN:298
@SQ SN:ENST00000362695 LN:102
@SQ SN:ENST00000384312 LN:102
@SQ SN:ENST00000410579 LN:103
@SQ SN:ENST00000583026 LN:299
@SQ SN:ENST00000516935 LN:124
@SQ SN:ENST00000580835 LN:303
@PG ID:minimap2 PN:minimap2 VN:2.17-r941 CL:minimap2 -ax map-ont --MD -t 16 /home/euijin.kwon-umw/Euijin/m6a_tool_comparision/reference/Homo_sapiens.GRCh38.cdna.ncrna_modified.fa /pi/chan.zhou-umw/SeqData/3rd_seq/xPore/HEK293T_WT-rep1/fastq_pass/HEK293T_WT-rep1.fastq.gz
dde489fa-23a5-4981-b771-ae38c97248f4 0 ENST00000361436 1 60 140S13M1I49M2D11M1I4M1D5M1D8M2I17M2D3M3D116M1I11M2D27M1D9M2D27M1D22M2I33M1D4M1D8M1I5M2D7M1D82M1I15M59S * 0 0 ACCCTATCATCATCTTATTCATATTTCATAACCCATACCATACCTACATTTTTCATAACAAACAATCCTCTGCTTCGGAGGAGGCCAAGGTGCAACTTCGGTCGTCCCGAATCCGGGTTCATCCGACACCAGCCGCCACCATGCCGCCGAAGTCTCGACCCCAACGAGATCAAAGTCGTATACCTGAGGTGCACCGGAGGTGATCGGTGCCACTCTTTGCCTGGCCCCAAGATTTCGGCCCCCTGGGTCTGTCCAAAAGTTGGTGATGACATTGCCAAGGCAACGGGTGACTGGAAGGGCCTGAGGATTACGGTGAAACTGACCGTTCAGAACAGACAGGCCCAGATTGAGGTGGTGCCTTCTGCCTCTGCCCCTGATCATCTGTCCTCAAGGAACCACCAAGAGACAGAAGAAACAGAAACATTAAACACAGTGGGAATATCACTTTGATGAGATTGTCAACATTGCTCTCGACAGATGCGGCACCGATCCATAGCCAGAGATTCCTGGAACCAACTAAAGATCCTGGGACTGCCCAGTCAGTGGGCTGTAATGTTGATGGCCGCCATCCTCATGACATCATCGATGACATCAACAGTGGTGCTGTGGAAATGCCCAGCCAGTTTAAGCACAAAGGAAAACATTTCAATAAAGGATCATCTGACAACTGGTGGAAATAAAAGA $'-/.--$$++(1%*,',)**+$&%##('*))'''(#%&$##%'&&'%'%''%%#$$##%'%)$$#%%%$*.()&<;7EA38-;:90JKI@:,$$$+,496<9<3<40*?>0(<0><C?AB@>=?C3'9EC:C:9**%/;7549C5;0*/34:*<976=@;B9??AE:=A=4JM><304&&%*1&$$14.*19;?GA5CC-*+-%0*767B;),%2'34554;;9>;>:F>=;227,*:9<7:<:DF8%((*35=?DM@9>*(4?E(9979;7679&(779872(?>=7,,430LW?JDB+444.'1%(,5-07>CCDE=C><.'*)5&:-BF78=@5;<<@D@>=9A?,55/*';:>;)1(76537'11*05.5;=+))'#&&$**&/,9541&-416A<C/678E;17@<-AA=>9;:<569;AGHEAC:6<7<AFBB:+.2-::?6@8:=9AE<:*''%%%.872%$%&//=:>=9''+%96'%24'*%%-/<5<>>?<=%&%*20.:>;@,&&::@K.B935441/3200;?H?D;?:7?744300459;8<D:>D:96;8BD:7:BA>CA;AB?92.0&'&38;852,'&-=5:96<83'$/,*4&%,+3C=4>2@7<<6(11/?3C>BA22??55,*2227<=1CCK=;;6*+0(:6?::<4<0--4/?9,%/2-*$$ NM:i:37 ms:i:766 AS:i:766 nn:i:0 tp:A:P cm:i:39 s1:i:298 s2:i:214 de:f:0.0565 MD:Z:62^AG12C2^C5^C25^TC3^AAA57A12A56^AA0A1C24^A9^AA27^T45T9^A1C2^T9T3^AG7^G97 rl:i:0
Thank you so much.
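For reference, one possible way to split a SAM file by reference contig is sketched below. This is not the maintainer's method, only an illustration under stated assumptions: the input is a plain-text, tab-separated SAM file, and because a transcriptome reference has many short contigs, reference names are grouped into a fixed number of batches instead of one file per contig. The function names and the `batch_<i>.sam` naming are hypothetical.

```python
import os
import zlib
from typing import List


def batch_of(ref_name: str, n_batches: int) -> int:
    """Map a reference name to a stable batch index in [0, n_batches)."""
    return zlib.crc32(ref_name.encode()) % n_batches


def split_sam(sam_path: str, out_dir: str, n_batches: int = 4) -> List[str]:
    """Write the alignments in sam_path into n_batches SAM files, grouped by reference.

    All reads mapped to the same reference land in the same batch file.
    Unmapped reads (reference name "*") are dropped.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"batch_{i}.sam") for i in range(n_batches)]
    outs = [open(p, "w") for p in paths]
    try:
        with open(sam_path) as sam:
            for line in sam:
                if line.startswith("@"):
                    # Header line: copy it to every batch so each file keeps a full header.
                    for out in outs:
                        out.write(line)
                    continue
                ref = line.split("\t")[2]  # third SAM column is the reference name
                if ref == "*":
                    continue  # skip unmapped reads
                outs[batch_of(ref, n_batches)].write(line)
    finally:
        for out in outs:
            out.close()
    return paths
```

Each batch file can then be processed independently. For an indexed BAM file, `samtools view` with a region argument is another way to extract the reads of a single reference.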
Hi Huanle,
I am just wondering about EpiNano 1.2.
I have run Epinano_Variants.py and the output csv has the following columns:
Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del
Meanwhile in your documentation, you mentioned running
python $EPINANO_HOME/Epinano_Predict.py --train ko_wt_combined.per_site_raw_feature.rrach.5mer.csv --predict ko_wt_combined.per_site_raw_feature.rrach.5mer.csv --accuracy_estimation --out_prefix train_and_test --columns 8,13,23 --modification_status_column 26
As you can see, the output file does not have enough columns for these options. Did I preprocess it incorrectly, or do you have any other models that are intended to be run with EpiNano 1.2?
Thank you
Regards
Chris
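The mismatch described above can be checked with a short sketch like the following. It only counts the columns in the header quoted in this comment and compares them with the indices passed to `--columns` and `--modification_status_column`. The helper name is hypothetical, and treating the indices as 1-based is an assumption, not something confirmed by the EpiNano documentation.

```python
import csv
import io
from typing import List


def missing_columns(header_line: str, requested: List[int]) -> List[int]:
    """Return the requested column indices that fall outside the header.

    Assumption: indices are 1-based.
    """
    header = next(csv.reader(io.StringIO(header_line)))
    return [c for c in requested if c < 1 or c > len(header)]


# Header of the Epinano_Variants.py output quoted above (11 columns).
header = "Ref,pos,base,strand,cov,q_mean,q_median,q_std,mis,ins,del"

# Indices from the documented command: --columns 8,13,23 and --modification_status_column 26.
print(missing_columns(header, [8, 13, 23, 26]))  # prints [13, 23, 26]
```

Three of the four requested indices point past the last column of the per-site file, which suggests the documented command expects a wider table than the per-site output.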