smith-chem-wisc / Spritz

Software for RNA-Seq analysis to create sample-specific proteoform databases from RNA-Seq data
https://smith-chem-wisc.github.io/Spritz/
MIT License

Workflow processing time #205

Closed mwfoster closed 3 years ago

mwfoster commented 3 years ago

Hi,

I recently ran the Spritz workflow on SRR11706043 from the GUI. The total workflow took >4 days. I'm wondering if there is any optimization that can be done to improve the speed. I believe it's using WSL2, but I'm not sure it's taking full advantage of system resources. Can you provide any guidance here?

Thanks,

Matt Foster

acesnik commented 3 years ago

Hi Matt,

Thanks for the update. I'm sorry to hear it's taking so long. I would expect it to take about 1 day.

Could you provide information about the system you're running the workflow on?

If you could also send the *.benchmark files, I might be able to get a better idea of what's taking the longest.

Best regards,

Anthony

mwfoster commented 3 years ago

Hi Anthony,

I am using the GUI, and I did check the variant + isoform analysis, although a variant analysis alone in parallel is equally slow. There are 36 threads specified in the workflow. The computer has 128 GB RAM, but I haven't seen more than 16 GB utilized.

I have attached the .benchmark files. benchmarks.zip

Thanks,

Matt

acesnik commented 3 years ago

Interesting, thanks for sharing the benchmark files! The workflow itself is moving impressively fast. The step that took 4 days was actually downloading the sequence read archive... I'm not sure what to recommend for speeding up that step, but I'll think about it more tomorrow.
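For anyone who wants to check their own run the same way, here is a minimal Python sketch that ranks workflow steps by wall-clock time. It assumes the *.benchmark files are Snakemake-style tab-separated files with a header row and an `s` column holding seconds; the function names and the directory argument are hypothetical and not part of Spritz.

```python
import csv
import io
from pathlib import Path


def parse_benchmark(text):
    """Sum the wall-clock seconds in one benchmark file's text.

    Assumes a tab-separated layout with a header row and an `s`
    column (seconds), as Snakemake writes for benchmarked rules.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return sum(float(row["s"]) for row in reader)


def rank_steps(benchmark_dir):
    """Return (step_name, seconds) pairs, slowest step first."""
    totals = {
        path.stem: parse_benchmark(path.read_text())
        for path in Path(benchmark_dir).glob("*.benchmark")
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # "benchmarks" is a placeholder for wherever the files were unzipped.
    for name, seconds in rank_steps("benchmarks"):
        print(f"{name}\t{seconds / 3600:.2f} h")
```

If one step dominates the list, as the download step did here, that is the place to look before tuning threads or memory.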

acesnik commented 3 years ago

I'll see if I can implement this in the next release. https://www.biostars.org/p/264524/#300448

Aspera is much faster than other protocols for these types of raw data downloads, from past experience.

mwfoster commented 3 years ago

Thanks. So I guess it would run much faster from a local .fastq file? Regards, Matt

acesnik commented 3 years ago

No problem. Yes, I believe so, given that the FASTQ download is taking so long.

mwfoster commented 3 years ago

I discovered that you can download the FASTQ from EBI (https://www.ebi.ac.uk/ena/browser/view/SRR11706043). Total processing time was <12 hours starting with the FASTQ. Matt
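A rough Python sketch of looking up the FASTQ locations for a run programmatically, rather than through the browser page. It assumes the ENA Portal API `filereport` endpoint with the `fastq_ftp` field, which returns a tab-separated table whose `fastq_ftp` column holds semicolon-separated paths without a scheme. The function names are hypothetical, and the endpoint details should be checked against the ENA documentation before relying on them.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the ENA Portal API file report.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(accession):
    """Build the file-report query URL for one run accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_fastq_urls(report_text):
    """Extract FASTQ URLs from the tab-separated report text."""
    urls = []
    for row in csv.DictReader(io.StringIO(report_text), delimiter="\t"):
        for path in (row.get("fastq_ftp") or "").split(";"):
            if path:
                # The column is assumed to omit the scheme.
                urls.append(f"ftp://{path}")
    return urls


if __name__ == "__main__":
    with urlopen(build_filereport_url("SRR11706043")) as response:
        report = response.read().decode()
    for url in parse_fastq_urls(report):
        print(url)
```

The printed URLs can then be fetched with any download tool and the local files handed to the workflow.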

acesnik commented 3 years ago

This should be addressed in this PR: https://github.com/smith-chem-wisc/Spritz/pull/207.

Downloading with Aspera does turn out to be much quicker.

acesnik commented 3 years ago

I just looked into the difference between using prefetch (now implemented) and fasterq-dump (before), and it takes 4 mins rather than 42 mins for the SRR that I use for testing. That's quite the difference! 😄
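For reference, a minimal Python sketch of the two-step pattern: fetch the .sra file with `prefetch` first, then convert it locally with `fasterq-dump`. This is not the code from the PR. The function names, directory names, and thread count are assumptions for illustration, and it requires the SRA Toolkit on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(accession, sra_dir="sra", fastq_dir="fastq", threads=4):
    """Return the prefetch and fasterq-dump command lines as lists.

    prefetch stores the download under <sra_dir>/<accession>/, and
    fasterq-dump is then pointed at that local directory so the
    conversion does not have to stream the data over the network.
    """
    local_copy = Path(sra_dir) / accession
    prefetch_cmd = ["prefetch", accession, "--output-directory", str(sra_dir)]
    dump_cmd = [
        "fasterq-dump", str(local_copy),
        "--outdir", str(fastq_dir),
        "--threads", str(threads),
    ]
    return prefetch_cmd, dump_cmd


def download_fastq(accession, **kwargs):
    """Run both steps in order, stopping if either one fails."""
    for cmd in build_commands(accession, **kwargs):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    download_fastq("SRR11706043")
```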