novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

Threads or processor options for slide_variants function #136

Closed kwonej0617 closed 1 year ago

kwonej0617 commented 1 year ago

Hi, thank you for providing a useful tool. I have run EpiNano on my dataset and found that the Slide_Variants step took a very long time to produce a result. Is there a threads or processors option that would reduce the processing time and produce the output faster?

Thank you!

Huanle commented 1 year ago

Hi @kwonej0617 ,

As far as I can remember, this step is quite fast. Can you tell me the size of your input file and the command line you used?

Best, Huanle

kwonej0617 commented 1 year ago

@Huanle Thank you for your response.

The BAM files for which Slide_Variants took a long time are 937M and 1.4G. They took around 96 hours.

Thank you.

Huanle commented 1 year ago

Thanks @kwonej0617, do you mind sharing the input file to slide_variants with me?

kwonej0617 commented 1 year ago

Sure. You can use the following link to download the data. Please let me know if you are unable to access it. https://drive.google.com/drive/folders/1PlxuD0YLRN-U6tU4mHqagUGNyJ3OrBkM?usp=drive_link

The file is gzipped, but I used the decompressed file when I ran slide_variants.

Thank you so much for your help.

Huanle commented 1 year ago

I have requested access to download the data, but the request has not been approved yet. Can you approve it so that I can move forward?

Cheers - Huanle



kwonej0617 commented 1 year ago

@Huanle I changed the access permissions. Could you please try again with the link below? https://drive.google.com/file/d/1jwl56Q1WhhXUuRFvhO8x3Apmf9fHGa0D/view?usp=sharing

Thank you so much for your help!

doshirLV commented 1 year ago

Hello EpiNano developers,

  • I am also wondering about this. Processing the BAM file with $EPINANO_HOME/Epinano_Variants.py is fairly fast (within a day), but then running Slide_Variants to get k-mers from the plus- and minus-strand sample_strand.per.site.csv files takes about 2 days each.
  • The input .csv files are about 500-600 MB each.
  • Is there any way to speed this up by providing more cores?
  • Please let me know at your earliest convenience.

Thank you for the assistance, Raj

Huanle commented 1 year ago

Hi Raj,

Sorry for the late reply. I have been occupied with other tasks. I am working on it now.

Best regards, Huanle

Huanle commented 1 year ago

@kwonej0617 I have committed a new version of slide_variants.py. Once it is accepted by the owner of this repo, you can give it a go.

enovoa commented 1 year ago

Can you please test the new slide_variants script from EpiNano 1.2.3? Thanks

doshirLV commented 1 year ago

Hello Huanle,

Thank you for your help. The new version has also been running for 2 days so far. I started Slide_Variants.py on Wednesday morning and it is still running as of Friday afternoon (PT). I used the same command as before. Let me know if there is an option to specify the number of threads/cores, or if there is anything else I can do to speed this up.

Much appreciated, Raj
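Editorial aside: no threads or cores option for Slide_Variants is mentioned anywhere in this thread. As a generic workaround, and only as a minimal sketch, a per-site CSV can be split by reference sequence so that each chunk can be handed to a separate process, since a window cannot span two references. The sketch assumes that the first CSV column holds the reference name and that the file starts with a header row; the function name `split_by_reference` and the demo rows are hypothetical and are not part of EpiNano.

```python
import csv
import io
from collections import defaultdict


def split_by_reference(handle):
    """Group per-site CSV rows by the value in their first column.

    Assumption: column 0 is the reference name and the first row is a header.
    """
    reader = csv.reader(handle)
    header = next(reader)
    chunks = defaultdict(list)
    for row in reader:
        if row:  # skip blank lines
            chunks[row[0]].append(row)
    return header, dict(chunks)


# Tiny made-up input, only to show the shape of the result.
demo = io.StringIO("#Ref,pos,base\nchr1,10,A\nchr1,11,C\nchr2,5,G\n")
header, chunks = split_by_reference(demo)
for ref, rows in chunks.items():
    print(ref, len(rows))
```

Each chunk could then be written to its own file and processed independently, for example with one Slide_Variants run per file.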

Huanle commented 1 year ago

Hi @doshirLV ,

I do not think the script has been successfully committed to GitHub, so I have attached it here. Please change it to a Python script before using it (the attachment is a .txt file). Let me know if you encounter any issues.

Best, Huanle

Slide_Variants.txt

doshirLV commented 1 year ago

Dear Huanle,

This new version is much faster: it completes in less than an hour. The new Slide_Variants.py script should be committed to the EpiNano GitHub repository, since it greatly improves the tool.

One note about the script, though:

Also a question about this version:

With gratitude, Raj

Huanle commented 1 year ago

Hi @doshirLV ,

Your note is right. It's to replace slide not var.

Regarding the non-consecutive-sites file: it contains sites that cannot be used to construct windows/k-mers, because there is no information for the sites/positions right next to them. This means you can ignore or delete the file, or comment out the part of the script that generates it.

I will ask Eva to commit the relevant script.

Best, Huanle
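Editorial aside: the point about non-consecutive sites can be illustrated with a minimal sketch. It is not the code in Slide_Variants.py. It assumes a window is k consecutive positions on one reference, and that a site counts as "non-consecutive" when it does not fall in any complete window. The function name `slide_windows`, the default `k = 5`, and the example positions are hypothetical.

```python
from typing import List, Tuple


def slide_windows(positions: List[int], k: int = 5) -> Tuple[List[Tuple[int, ...]], List[int]]:
    """Return (complete k-wide windows, sites that belong to no complete window)."""
    pos_set = set(positions)
    windows = []
    covered = set()
    for start in sorted(pos_set):
        window = tuple(range(start, start + k))
        # Keep the window only if every position in it was observed.
        if all(p in pos_set for p in window):
            windows.append(window)
            covered.update(window)
    leftover = [p for p in sorted(pos_set) if p not in covered]
    return windows, leftover


# Positions 10-15 are contiguous; 40, 41 and 90 lack enough neighbours.
windows, leftover = slide_windows([10, 11, 12, 13, 14, 15, 40, 41, 90])
print(windows)   # [(10, 11, 12, 13, 14), (11, 12, 13, 14, 15)]
print(leftover)  # [40, 41, 90]
```

Under these assumptions, the leftover list plays the role of the non-consecutive-sites file: it carries no window information, so dropping it does not change the k-mer output.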