novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Missing sample.per_site.5mer.csv file after Epinano_Variants.py #93

Closed izl2 closed 3 years ago

izl2 commented 3 years ago

Hi!

I recently attempted to run Epinano_Variants.py, using the ko.bam, wt.bam, and ref.fa files provided in test_data. I was able to generate the ko.plus_strand.per.site.csv and wt.plus_strand.per.site.csv output in the same directory as the .bam and .fa files. However, I did not see any per_site.5mer.csv files, which I believe I need to use as part of EpiNano-SVM. Wondering if you had any idea where I might find these files?

Thanks so much!

Huanle commented 3 years ago

Hi @izl2, you need to run misc/Slide_Variants.py to generate the file in which variants are organized on a k-mer basis.

acarmas1 commented 2 years ago

Hi Huanle,

This happened to me, too; Epinano_Variants.py generated two files: 1) minus_strand.per.site.csv 2) plus_strand.per.site.csv

So I was wondering: to run Slide_Variants.py, should I use the plus-strand file?

Something like this: python Slide_Variants.py plus_strand.per.site.csv 5

Huanle commented 2 years ago

Hi @acarmas1 ,

The help message tells how to run it:

$ python misc/Slide_Variants.py 
python Slide_Variants.py per_site_var kmer_length
please provide 1) variants table from Epinano_Variants and 2) windown size(integer)

You can combine plus and minus strands data after this.
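
A minimal sketch of running the script on each strand file in turn, assuming only the usage shown in the help message above (variants table first, window size second). The file names follow the test-data outputs mentioned earlier in this thread; the helper names `slide_command` and `slide_all` are made up for illustration, and where Slide_Variants.py writes its output is not confirmed here.

```python
import subprocess
import sys


def slide_command(per_site_csv, kmer_length, script="misc/Slide_Variants.py"):
    """Build the command line described by the help message:
    python Slide_Variants.py per_site_var kmer_length
    """
    return [sys.executable, script, per_site_csv, str(kmer_length)]


def slide_all(per_site_csvs, kmer_length=5):
    """Run Slide_Variants.py once per input table (hypothetical helper)."""
    for csv_path in per_site_csvs:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(slide_command(csv_path, kmer_length), check=True)


if __name__ == "__main__":
    # Assumed file names; adjust to whatever Epinano_Variants.py produced.
    slide_all(["wt.plus_strand.per.site.csv", "wt.minus_strand.per.site.csv"])
```

Run it from the EpiNano repository root so that the relative script path resolves.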

acarmas1 commented 2 years ago

Hi Huanle, thanks for replying

Just to make sure: should I run python Slide_Variants.py per_site_var kmer_length on both files (plus and minus) and then combine the results, for example with 'cat'? Or did you mean something else?

Huanle commented 2 years ago

Hi @acarmas1, both cat --> slide and slide --> cat will work.
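
One thing to watch with plain `cat`: if each table starts with a header line (an assumption here, so check your files), the header will be repeated in the middle of the combined file. Whether the downstream scripts tolerate that is not stated in this thread. A small sketch that concatenates tables while keeping only the first header; `concat_csv` is a hypothetical helper, not part of EpiNano.

```python
def concat_csv(inputs, output):
    """Concatenate CSV files, dropping first lines that repeat the header.

    Assumes every input starts with the same header line. A first line that
    differs from the stored header is kept as ordinary data.
    """
    header = None
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for index, line in enumerate(handle):
                    line = line.rstrip("\n")
                    if index == 0:
                        if header is None:
                            header = line  # remember the first header seen
                        elif line == header:
                            continue  # skip the repeated header
                    out.write(line + "\n")
    return output
```

For example, `concat_csv(["wt.plus_strand.per.site.csv", "wt.minus_strand.per.site.csv"], "wt.both_strands.per.site.csv")`, where the output name is arbitrary.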

kwonej0617 commented 1 year ago

@Huanle @acarmas1

When I ran Epinano_Variants.py, I only got the plus-strand output, e.g. wt.plus_strand.per.site.csv, and not the minus-strand output. Do you know why the minus-strand file is missing?

Also, after running Epinano_Variants.py, I ran Slide_Variants.py, but it takes a very long time. Is there a threads or processors option for the script?

Finally, if I skip Slide_Variants.py and run Epinano_Predict.py directly after Epinano_Variants.py, which model should I use, and how should I set --columns for that model?

I am looking forward to hearing from you.

Thank you!

Huanle commented 1 year ago

Hi @acarmas1,

> When I ran Epinano_Variants.py, I only got the plus-strand output, e.g. wt.plus_strand.per.site.csv, and not the minus-strand output. Do you know why the minus-strand file is missing?

I guess you ran it in transcriptome mode?

> Also, after running Epinano_Variants.py, I ran Slide_Variants.py, but it takes a very long time. Is there a threads or processors option for the script?

I will find time to improve the code.

> Finally, if I skip Slide_Variants.py and run Epinano_Predict.py directly after Epinano_Variants.py, which model should I use, and how should I set --columns for that model?

If so, you will need to re-train the models and reformat the input.

Hope this helps. I will let you know once I have finished improving the code.

Best, Huanle

acarmas1 commented 1 year ago

Hi, yes, I remember that I ran it in transcriptome mode.

Huanle commented 1 year ago

Hi @kwonej0617

kwonej0617 commented 1 year ago

@acarmas1 Thank you for your reply! @Huanle Yes, please let me know if you improve the code! Thank you so much. Meanwhile, I want to try splitting my large BAM file into multiple BAM files before running Slide_Variants.py. Is there any software you recommend, or have used yourself, for splitting a BAM file? Thank you.

Huanle commented 1 year ago

@kwonej0617 ,

You can give bamtools a try: bamtools split -in file.bam -reference would do the job for you. If you are familiar with pysam, a few lines of Python code would also do the same task.
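
A rough sketch of the pysam route, under stated assumptions: the input BAM is coordinate-sorted and indexed (a .bai file sits next to it), and one output file per reference is wanted. The function names `split_path` and `split_bam_by_reference` and the output naming scheme are invented for illustration. Unmapped reads without a reference are not written by this sketch.

```python
import re


def split_path(prefix, reference):
    """Output file name for one reference (hypothetical naming scheme).

    Characters that are awkward in file names, such as '|' or '/' in some
    transcript IDs, are replaced with '_'. Two references that differ only
    in such characters would collide, so check your reference names.
    """
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", reference)
    return f"{prefix}.{safe}.bam"


def split_bam_by_reference(bam_path, prefix):
    """Write one BAM per reference that has at least one aligned read."""
    import pysam  # third-party; imported here so split_path works without it

    written = []
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for reference in bam.references:
            out = None
            # fetch() on a named contig requires the BAM index.
            for read in bam.fetch(contig=reference):
                if out is None:
                    out_path = split_path(prefix, reference)
                    out = pysam.AlignmentFile(out_path, "wb", template=bam)
                    written.append(out_path)
                out.write(read)
            if out is not None:
                out.close()
    return written
```

Processing one reference at a time keeps only a single output file open, which matters when the reference is a transcriptome with many sequences.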

kwonej0617 commented 1 year ago

Thank you so much for your advice! I am also looking forward to hearing from you about the improvements to Slide_Variants.py. I would really appreciate it if you could let me know. Thank you so much.