isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116

Note: this is the original repository and is no longer officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

Cryptic Error in Racon Short Read Operation #169

Closed nhartwic closed 4 years ago

nhartwic commented 4 years ago

I'm currently attempting to use racon as a short read polisher. My current workflow looks like...

cat Pennycress_1014_BWA_001_S1_R1_001.fastq.gz Pennycress_1014_BWA_001_S1_R2_001.fastq.gz | gunzip | sed 's/ /_/' > Penny1014.v2.fasta.concat_reads.fastq
minimap2 -x sr -t 16 Penny1014.v2.fasta Penny1014.v2.fasta.concat_reads.fastq > Penny1014.v2.fasta.rs_mm2.paf
racon -t 16 Penny1014.v2.fasta.concat_reads.fastq Penny1014.v2.fasta.rs_mm2.paf Penny1014.v2.fasta > Penny1014.v2.rs1.fasta

Mapping seems to go fine. I've modified the read IDs using the sed command to eliminate duplicate IDs, but racon still doesn't seem to like the paf/fastq/fasta files. I'm not getting any kind of interpretable error, though; the process just dies with "4333 Killed". Does this have some known meaning? Is there something wrong with the workflow I'm using?
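An aside beyond the thread: a bare "Killed" with no error message is the classic signature of the kernel OOM killer terminating the process (you can usually confirm this in the `dmesg` output), not a racon diagnostic. Since duplicate read IDs were also a concern here, below is a small self-contained sketch (the demo file and its contents are mine, not from this workflow) that counts duplicate FASTQ read IDs the same way the sed fix targets them — the ID is the first whitespace-delimited token on every fourth line, after the `@`:

```shell
# Demo FASTQ: '@r1' appears twice (an R1/R2 pair), so one duplicate is reported.
printf '@r1 1:N:0\nACGT\n+\nIIII\n@r1 2:N:0\nTGCA\n+\nIIII\n' > demo.fastq

# IDs sit on lines 1, 5, 9, ...; count how many are seen more than once.
awk 'NR % 4 == 1 { id = substr($1, 2); if (seen[id]++) dup++ }
     END { print dup+0, "duplicate read ID(s)" }' demo.fastq
# prints: 1 duplicate read ID(s)
```

Running this on the concatenated reads before and after the sed step shows whether the de-duplication actually worked; after `sed 's/ /_/'` the `1:N:0`/`2:N:0` descriptions become part of the ID, so the count should drop to 0.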

rvaser commented 4 years ago

Hi, the commands seem to be okay; you probably ran out of memory. How large is the fastq file and how much RAM do you have?

Best regards, Robert

nhartwic commented 4 years ago

Thank you for the prompt reply. Compressed, the reads are about 7 GB; uncompressed, about 32 GB. The PAF file is another 21 GB, and the assembly itself is about 400 MB. I was attempting to run this on a machine with 64 GB of RAM. Given that racon loads everything into memory, that probably isn't sufficient. How much RAM would you recommend for files of this size? Would you expect 128 GB to be sufficient, or do I need to step up to 256?
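As a rough back-of-the-envelope check (my rule of thumb, not an official racon formula): since racon holds the reads, the overlaps, and the target in RAM, the combined uncompressed input size is a lower bound on peak memory, and the observed peak tends to be a small multiple of that. With the sizes reported in this thread:

```shell
# Uncompressed input sizes from this thread, in GB (assembly ~0.4 GB, rounded up).
reads=32; paf=21; asm=1
total=$((reads + paf + asm))
echo "inputs total ~${total} GB; expect peak RAM to be some multiple of this"
# prints: inputs total ~54 GB; expect peak RAM to be some multiple of this
```

That lower bound alone already rules out the 64 GB machine, which is consistent with the "Killed" message.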

rvaser commented 4 years ago

128 GB should be sufficient; I'd expect the memory consumption to be around 70 GB.

nhartwic commented 4 years ago

Thanks again for the assistance. I'll update later after I attempt the next run with more memory.

nhartwic commented 4 years ago

Well... looks like 128 GB isn't enough either. I can verify that it is hitting the memory limit, though I wouldn't have expected it to with these files. I'll retry with 256 GB, I guess, unless you think something else is going wrong.
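If a bigger machine is not available, one workaround (my suggestion, not something used in this thread) is to polish the target in batches: split the assembly FASTA into chunks, run the same minimap2 and racon commands against each chunk, then concatenate the polished chunks. If I recall correctly, the newer lbcb-sci repository also ships a `racon_wrapper` script that automates this kind of splitting. A minimal sketch of just the splitting step with awk, on a tiny demo FASTA (file names and chunk size are mine):

```shell
# Demo FASTA with three contigs.
printf '>c1\nAAAA\n>c2\nCCCC\n>c3\nGGGG\n' > demo.fasta

# Write each group of 2 contigs to its own file: chunk_0.fasta, chunk_1.fasta, ...
awk '/^>/ { if (n % 2 == 0) file = "chunk_" (n / 2) ".fasta"; n++ }
     { print > file }' demo.fasta

ls chunk_*.fasta   # chunk_0.fasta and chunk_1.fasta now exist
```

Each chunk is then polished independently (reads must be re-mapped against each chunk so the PAF target names match), and the per-chunk outputs are concatenated with `cat` to rebuild the full polished assembly. Peak RAM then scales with the largest chunk plus its overlaps rather than the whole assembly.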

nhartwic commented 4 years ago

Alright, looks like it finally executed. Max memory usage ended up being about 145 GB. Racon also seems unable to take full advantage of the 32 allocated cores, for whatever reason; maybe disk was slow.

Just for reference, here are some plots showing resource usage of my polishing workflow that involved 3 consecutive rounds of short read alignment and racon consensus followed by an execution of busco.

https://salk-tm-logs.s3.amazonaws.com/Y3RKc9FTJ8X9.metrics/metrics.html

Thanks again for the help.

rvaser commented 4 years ago

Thanks for the follow-up and the plots. We also noticed that the parallel efficiency with short reads needs improvement.

nhartwic commented 4 years ago

No problem. Meant to close this with my last comment but apparently forgot to. Doing so now.