isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

VERY slow performance with fastq.gz files #229

Closed spolson closed 1 year ago

spolson commented 1 year ago

I recently ran several different assemblies through Racon (ver 1.4.3) with very lengthy execution times. In reviewing these I noticed that almost all of the time was spent in "loading sequences" (nearly 12 hours in many of my runs). I decided to first decompress the fastq file with gzip (took ~25 minutes) and reran. The "loading sequences" step then took less than 6 minutes.

One example is below; I have numerous others with comparable issues. In this case the read file was 53GB gzipped and 126GB uncompressed (nodes had 1TB of RAM and nothing else executing):

gzipped fastq

racon -u -t 48 {HiFi.fastq.gz} {minimap2.sam} {genome.fasta} > polished.fasta

[racon::Polisher::initialize] loaded sequences 43007.108622 s

unzipped fastq

racon -u -t 48 {HiFi.fastq} {minimap2.sam} {genome.fasta} > polished.fasta

[racon::Polisher::initialize] loaded sequences 571.343952 s
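
For anyone who has to stay on 1.4.3, the decompress-first workaround described above can be scripted. This is a minimal POSIX shell sketch, not part of racon: `decompress_reads` is a hypothetical helper name, and it assumes there is enough free disk space for the uncompressed copy (126GB in the example above).

```shell
# Hypothetical helper (not part of racon): write an uncompressed copy of a
# .gz read file next to the original and print the path of the plain file.
# Inputs without a .gz suffix are passed through unchanged.
decompress_reads() {
  case "$1" in
    *.gz)
      plain="${1%.gz}"
      # -d decompresses, -c writes to stdout so the .gz file is left in place
      gzip -dc -- "$1" > "$plain" || return 1
      printf '%s\n' "$plain"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Usage, with the same racon options as the runs above:
#   reads=$(decompress_reads HiFi.fastq.gz)
#   racon -u -t 48 "$reads" minimap2.sam genome.fasta > polished.fasta
```

Keeping the compressed file (rather than running `gzip -d` in place) costs extra disk space but leaves the original archive untouched if the run has to be repeated.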

rvaser commented 1 year ago

Please use a newer version (from https://github.com/lbcb-sci/racon or bioconda). The parsing was fixed as of v1.4.4.

Best regards, Robert
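
The v1.4.4 cutoff mentioned above can be checked in a wrapper script before starting a long run. This is a minimal POSIX shell sketch: `version_at_least` is a hypothetical helper, not something racon provides, and the installed version string is assumed to be supplied by the caller, since the exact output format of racon's version flag is not shown in this thread.

```shell
# Hypothetical helper: succeed (exit status 0) if dotted version $1 is at
# least dotted version $2. A leading "v" is ignored and missing fields count
# as 0. Fields must be plain decimal numbers.
version_at_least() {
  have="${1#v}."
  want="${2#v}."
  while [ -n "$have" ] || [ -n "$want" ]; do
    # Take the next dot-separated field from each version string
    h=${have%%.*}; w=${want%%.*}
    have=${have#*.}; want=${want#*.}
    h=${h:-0}; w=${w:-0}
    if [ "$h" -gt "$w" ]; then return 0; fi
    if [ "$h" -lt "$w" ]; then return 1; fi
  done
  return 0
}

# Usage, with the installed version filled in by hand:
#   if ! version_at_least 1.4.3 1.4.4; then
#     echo "racon is older than 1.4.4: decompress fastq.gz input first" >&2
#   fi
```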

msikic commented 1 year ago

Dear Shawn, thank you for letting us know. As Robert mentioned, we moved the official repo to https://github.com/lbcb-sci/racon and there is a note in the header of this repo. If you have a suggestion for how to make it more visible, let us know.

On Wed, Feb 1, 2023, 5:58 AM Shawn Polson @.***> wrote:

Just to make sure you are aware, for a user coming to the git repo it looks like version 1.4.3 is the current version. It is indicated as the "latest release" on the sidebar and is the latest release in the tags. In fact I specifically came to the site last week to make sure I had the latest release... Is cloning the git repo and building the only way to get newer versions? If so you may want to make that obvious for users.

spolson commented 1 year ago

Thanks for the reply and sorry for the confusion on my part.

When I come to the page, the only indication that it has moved seems to be in the "About" section, and it isn't a distinct paragraph, so it's very easy to overlook (if there's some other indication, I am missing it). Repos that I have seen in similar situations (moved, but wanting to retain continuity in the original location) have put large messages at the top of the readme and then placed the repo into Archive mode.

In any case, thanks for the work your team has done in maintaining this tool. It is much appreciated!

[Screenshot attached: 2023-01-31 at 7:35:14 PM]