Closed JeremyQuo closed 3 years ago
Hi @JeremyQuo ,
I am awfully sorry for the late reply.
The RRACH motif can be easily selected with a regular expression: /[AG][AG]AC[ACT]/ on the forward strand or /[TGA]GT[TC][TC]/ on the reverse strand.
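As a minimal sketch of that selection in Python, the two patterns quoted above can be applied to a 5-mer string. The function names `is_rrach` and `revcomp` are made up for illustration and are not part of EpiNano.

```python
import re

# Patterns quoted in the reply: R = A/G, H = A/C/T.
RRACH_FWD = re.compile(r"[AG][AG]AC[ACT]")
RRACH_REV = re.compile(r"[TGA]GT[TC][TC]")  # reverse complement of RRACH

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse-complement a DNA string (hypothetical helper)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_rrach(kmer, strand="+"):
    """Return True if the 5-mer matches RRACH on the given strand."""
    pattern = RRACH_FWD if strand == "+" else RRACH_REV
    return pattern.fullmatch(kmer.upper()) is not None


print(is_rrach("GGACT"))                 # forward-strand RRACH
print(is_rrach(revcomp("GGACT"), "-"))   # same site seen on the reverse strand
print(is_rrach("GGACG"))                 # last base is G, so not RRACH
```

`fullmatch` is used so that only exact 5-mers pass; with `search` a longer string containing the motif would also match.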
1. The rrach.q3.mis3.del3.linear.dump file can be produced by training on the features denoted in its name, i.e. the quality score, mismatch frequency, and deletion frequency of the 3rd (middle) base in the 5-mer. You can follow the steps in the train_models folder to train the model.
2. You need to drop non-RRACH results if the model you used was trained with data containing only RRACH motifs. You can filter them out either before or after making predictions. It does not matter.
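To illustrate why the order does not matter, here is a small sketch that filters rows of a 5-mer table by motif. The column name `#Kmer`, the toy rows, and `placeholder_predict` are assumptions made for this example; the placeholder stands in for any per-row prediction step and is not the EpiNano predictor.

```python
import csv
import io
import re

RRACH = re.compile(r"[AG][AG]AC[ACT]")

# Toy table with made-up values; the real column names may differ.
TOY_CSV = "\n".join([
    "#Kmer,q3,mis3,del3",
    "GGACT,12.5,0.10,0.05",
    "TTTTT,14.0,0.01,0.00",
    "AAACA,11.0,0.20,0.08",
    "GGACG,13.2,0.02,0.01",
])


def keep_rrach(rows, kmer_column="#Kmer"):
    """Keep only rows whose 5-mer matches the forward RRACH pattern."""
    return [r for r in rows if RRACH.fullmatch(r[kmer_column].upper())]


def placeholder_predict(row):
    """Stand-in for a per-row model call (hypothetical, not EpiNano)."""
    return dict(row, prediction="unknown")


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))

filter_then_predict = [placeholder_predict(r) for r in keep_rrach(rows)]
predict_then_filter = keep_rrach([placeholder_predict(r) for r in rows])

print(filter_then_predict == predict_then_filter)
print([r["#Kmer"] for r in filter_then_predict])
```

Because each row is scored independently, dropping non-RRACH rows before or after the prediction step leaves the same rows with the same values.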
Hope this helps and please let me know if you need further help or clarification.
Dear Prof.,
Thanks for your reply.
I get it.
Did you use rep1 and rep2 from the git repo to train your model?
I filtered for the RRACH motif, but the number of rows left is not that good, so I would like to know the number of rows you used to train your model.
Best regards, Zhihao
Hi @JeremyQuo ,
The models included in this repo were trained with published data (doi: 10.1038/s41467-019-11713-9). The example data for users to play with is only a small subset of the published dataset. You can download the whole dataset provided in the manuscript and train on it. Please let me know if you need more help. All the best!
Many thanks for your answer.
Actually, I trained a new algorithm and want to test it on your data. It works well on the example data, so I want to get all the RRACH data. I tried to rerun run.sh to generate 5mer.csv from your raw data (https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP174366), SAMN10640338/SAMN10640337, and ran into some new issues.
Here is my command
_bin/guppy_basecaller -c rna_r9.4.1_70bps_hac.cfg --compress_fastq -i ./fast5/ -r -s ./mod_fastq/ --fast5_out -x 'auto'
cat /.fast.gz > mod_fastq.gz
gizp -d mod.fastq.gz
minimap2 --MD -t 6 -ax map-ont cc.fasta mod.fastq | samtools view -hbS -F 3844 - | samtools sort -@ 6 -o mod.bam
samtools index mod.bam
python ../../Epinano_Variants.py -R cc.fasta -b mod.bam -n 6 -T t -s ../../misc/sam2tsv.jar
python ../../misc/Slide_Variants.py ko.plusstrand.per.site.csv 5
The result has no more than 10k rows, which is fewer than your example data. I would like to know what is wrong with my commands.
Besides, can you send me your entire 5mer.csv for unmod/mod, or tell me the exact number of rows for your RRACH 5-mers?
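For reference, the `-F 3844` mask passed to `samtools view` above can be decoded with a few lines of Python. This is only a sketch based on the flag bits defined in the SAM specification; the names `decode_flag` and `passes_filter` are made up here.

```python
# Flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(value):
    """List the names of all flag bits set in `value`."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def passes_filter(read_flag, exclude=3844):
    """Mimic `samtools view -F exclude`: keep reads with none of those bits set."""
    return read_flag & exclude == 0


print(decode_flag(3844))
print(passes_filter(16))    # reverse-strand primary alignment
print(passes_filter(256))   # secondary alignment
```

So the mask drops unmapped, secondary, QC-failed, duplicate, and supplementary records, while primary alignments on either strand are kept.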
Many thanks.
Hi @JeremyQuo , did you combine both mod and unmod data after you got the features organized in 5mer format? I am out of the office so I cannot get the data you are asking for.
No, but I think it will be 20k rows after combining, which is the same as the sample data in git. However, you told me that the sample data in git is a subset, so I would like to obtain more data to train and test. Or does it mean that these are all the 5-mer features of this raw data?
Hi @JeremyQuo, you need to have both mod and unm to do training.
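A minimal sketch of combining the two tables into one labeled training set is shown below. The column names, the toy values, and the added `sample` column are assumptions for illustration; the actual layout of the 5-mer CSV files may differ.

```python
import csv
import io

# Toy stand-ins for the mod and unm 5-mer tables (made-up values).
MOD_CSV = "\n".join([
    "#Kmer,q3,mis3,del3",
    "GGACT,9.8,0.21,0.09",
    "AAACA,10.1,0.18,0.07",
])
UNM_CSV = "\n".join([
    "#Kmer,q3,mis3,del3",
    "GGACT,13.9,0.03,0.01",
    "AAACA,14.2,0.02,0.01",
])


def load_labeled(handle, label):
    """Read a 5-mer table and tag every row with its sample label."""
    rows = []
    for row in csv.DictReader(handle):
        row["sample"] = label  # hypothetical label column
        rows.append(row)
    return rows


def combine(mod_handle, unm_handle):
    """Concatenate mod and unm rows into a single labeled list."""
    return load_labeled(mod_handle, "mod") + load_labeled(unm_handle, "unm")


training_rows = combine(io.StringIO(MOD_CSV), io.StringIO(UNM_CSV))
print(len(training_rows))
print(sorted({r["sample"] for r in training_rows}))
```

In practice the two `io.StringIO` objects would be replaced by open file handles for the mod and unm 5-mer CSV files.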
In the paper, I see that you used RRACH 5-mers for a lot of the statistical work, but I don't see any operations on RRACH in your code. Here are my questions about it.
1. In test_data/make_predictions/run.sh, how do you get the rrach.q3.mis3.del3.linear.dump used in line 46? I think something may be missing. What is your training data, and did you extract all the RRACH data? If so, how many RRACH 5-mers did you use, or what is the ratio of RRACH among all 5-mers?
2. I want to know whether the non-RRACH 5-mers need to be dropped before running Epinano_Predict.py, because it seems your commands and code do not do that, or I missed it somewhere. Can you point me to the part that deals with RRACH, please?