weir12 / DeepEdit

DeepEdit: single-molecule detection and phasing of A-to-I RNA editing events using Nanopore direct RNA sequencing
MIT License

Empty target_site_features.csv file #7

Closed acarmas1 closed 1 year ago

acarmas1 commented 1 year ago

Hello,

I've been running the test data set for DeepEdit. When I completed the 1.Reads_extract.sh script, I realized that my target_site_reads.txt is different from the file in the GitHub repository (DeepEdit/Getting_Started/target_site_reads.txt).

Here is my code: image

Here is my output: image

And here is the file posted in Github: image

Even though the two files look different, I went ahead and ran 2.feature_extract.py using this code: image

And when it finished, the code generated an empty target_site_features.csv file.

Could you please help me troubleshoot this? I want to run DeepEdit with your test data set first and then try it with my own data to detect A-to-I RNA editing sites in my direct RNA sequencing reads.

I'm looking forward to hearing from you.

Best, Camila

LongxianChen commented 1 year ago

Hi Camila,

I guess this may be because I compressed the nanopore_reads folder into a nanopore_reads.tar.gz file when I uploaded it to GitHub. I have re-uploaded the related files, so please re-clone the repository and try again.

Best, Longxian Chen

acarmas1 commented 1 year ago

Hello,

I re-cloned the DeepEdit directory and ran this code:

image

I got the same result. My target_site file still looks the same:

image

LongxianChen commented 1 year ago

I tested the code again on another server and it works well. So you may want to check the following two things:

  1. Have you installed samtools on your server?
  2. Is the nanopore.sam file you used in your code the one that I provided?
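
The first check can be scripted. This is only a minimal sketch, assuming samtools is expected on the PATH of the shell that runs 1.Reads_extract.sh; the helper name `find_tool` is hypothetical and not part of DeepEdit.

```python
import shutil


def find_tool(name="samtools"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("samtools")
    if path is None:
        # On clusters that use environment modules, the tool may need to be
        # loaded first (the exact module name is site-specific).
        print("samtools not found on PATH; load or install it first")
    else:
        print(f"samtools found at {path}")
```

If the helper returns None, a script that calls samtools cannot run it, which would be consistent with an output file that differs from the reference one.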

acarmas1 commented 1 year ago

Hello,

Thank you. I hadn't loaded samtools before. Now that I have loaded it, I was able to generate the same target_sites.txt file.

However, when I ran 2.feature_extract.py, I got the same target_site_features.csv but with one difference: image

Six columns are filled with nan values. I checked, and I'm using Python 3.8.3.
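
For anyone hitting the same symptom, a quick way to see which columns are entirely nan is sketched below. It uses only the standard library, and the column names in the sample (`read_id`, `mean`, `stdv`) are hypothetical placeholders, not the actual header of target_site_features.csv.

```python
import csv
import io
import math


def is_missing(value):
    """True if a CSV cell is absent, empty, or parses to NaN."""
    if value is None or value.strip() == "":
        return True
    try:
        return math.isnan(float(value))
    except ValueError:
        return False  # non-numeric text such as a read ID


def all_nan_columns(csv_file):
    """Return the names of columns whose every value is empty or NaN.

    Note: a file with a header but no data rows reports every column.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    suspect = set(columns)
    for row in reader:
        for name in list(suspect):
            if not is_missing(row[name]):
                suspect.discard(name)
    return [name for name in columns if name in suspect]


# Hypothetical sample; replace with open("target_site_features.csv", newline="")
sample = io.StringIO("read_id,mean,stdv\nr1,0.5,nan\nr2,0.7,nan\n")
print(all_nan_columns(sample))
```

Knowing exactly which columns are affected makes it easier to match them to the step that should have produced them.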

I uploaded my outputs in this link: https://1drv.ms/f/s!AokqkR3muxL0jvohT7RWoCTBOV3Ncg?e=w25uhy

weir12 commented 1 year ago

Hi,

Please take a look at the parameter options for tombo re-squiggle in our other repository:

https://github.com/weir12/DENA/blob/2c584c9a22f2903a1c44abe9a734fef5e9d158c8/README.md?plain=1#L116

In summary, you need to include the --include-event-stdev option in the tombo re-squiggle stage to calculate the standard deviation for each base.
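
As a rough illustration, the re-squiggle call could be assembled like this. Only --include-event-stdev comes from this thread; the paths, the process count, and the other flags (--rna, --processes) are assumptions about a typical direct RNA run, so the linked README line remains the reference for the exact options.

```python
import shlex


def build_resquiggle_command(fast5_dir, reference_fasta, processes=4):
    """Assemble a `tombo resquiggle` argument list (it is not executed here)."""
    return [
        "tombo", "resquiggle", fast5_dir, reference_fasta,
        "--rna",                         # assumption: direct RNA reads
        "--processes", str(processes),   # assumption: illustrative value
        "--include-event-stdev",         # needed for per-base standard deviation
    ]


# Hypothetical paths, for illustration only.
cmd = build_resquiggle_command("nanopore_reads/", "reference.fa")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real data.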

Best regards, Liang Ou

acarmas1 commented 1 year ago

It worked. I was able to run your test dataset.

Thank you so much for your help.

Now I'm going to try it with my data.

acarmas1 commented 1 year ago

Hello,

I'm currently trying with my own data.

I'm performing the basecalling. However, I have a question regarding this step:

guppy_basecaller -i DeepEdit/Getting_Started/nanopore_reads/ -s DeepEdit/Getting_Started/nanopore_reads/ --flowcell FLO-MIN106 --kit SQK-RNA001 --cpu_threads_per_caller 1 --qscore_filtering --fast5_out

Which version of Guppy did you use for basecalling? When I submitted my script to perform the basecalling, I got this error: image

It looks like --qscore_filtering is no longer a parameter in the Guppy version I have. When I removed it from my command, I was able to run it. However, I'm worried that this parameter is essential for later steps.

weir12 commented 1 year ago

Hi,

We used Guppy version 4.0.15, as described in the Methods section of the paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02921-0

Best regards, Liang Ou
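
To compare a local installation against that version, something like the sketch below may help. It assumes the version banner contains the word "Version" followed by a dotted number; the sample banner is illustrative rather than verbatim Guppy output, and `parse_guppy_version` is a hypothetical helper.

```python
import re

EXPECTED_VERSION = "4.0.15"  # version stated in this thread


def parse_guppy_version(banner):
    """Extract 'X.Y.Z' from a version banner, or return None if absent."""
    match = re.search(r"Version\s+(\d+\.\d+\.\d+)", banner)
    return match.group(1) if match else None


# Illustrative banner text; in practice, capture the output of the
# basecaller's version command and pass it in instead.
sample_banner = "Guppy Basecalling Software, Version 4.0.15+5694074"
found = parse_guppy_version(sample_banner)
if found != EXPECTED_VERSION:
    print(f"Installed version {found} differs from {EXPECTED_VERSION}")
else:
    print(f"Installed version matches {EXPECTED_VERSION}")
```

A mismatch does not by itself mean the results are wrong, but it is a likely explanation when a documented flag is rejected.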

acarmas1 commented 1 year ago

Perfect, thank you.