isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: this was the original repository, which is no longer officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

killed 12 Error #93

Open imneuro opened 5 years ago

imneuro commented 5 years ago

Hi there,

I have new PacBio long reads assembled into a 2.2 Gb genome. I want to use racon for polishing. When I run racon v1.3.1 with the following command

racon -t 34 LT2.merged.filt2k.fasta LT2.2kfasta.gfa.paf LT2.contigs.fasta LT2.racon1.fasta

I get STDERR as

[racon::Polisher::initialize] loaded target sequences
bash: line 1:    12 Killed                  racon -t 34 LT2.merged.filt2k.fasta LT2.2kfasta.gfa.paf LT2.contigs.fasta  LT2.racon1.fasta

Aside from the "12 Killed" error, since I use FASTA format for the long-read file, I realize that the new PacBio reads don't have per-base quality values. I have two questions: 1) are the new PacBio Sequel reads still good to use as input for racon polishing? 2) Is there anything I can do to avoid the "12 Killed" error? Your input is very valuable to us.

Best, Xin

rvaser commented 5 years ago

Hi Xin, yes, you can use the Sequel data or any other data which does not have quality values. The error you encountered indicates that you ran out of memory. How large are the file containing the sequencing data and the PAF file, and how much RAM does your machine have?
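For what it's worth, a bare "Killed" with no racon error message usually points to the kernel's OOM killer; a quick sketch to confirm that (dmesg may need root on some systems, and `journalctl -k` is an alternative on systemd hosts):

```shell
# Look for OOM-killer messages mentioning the terminated process.
oom_pattern='out of memory|oom-kill|killed process'
dmesg -T 2>/dev/null | grep -i -E "$oom_pattern" | tail -n 5

# Also confirm how much RAM the job actually sees.
free -g
```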

Best regards, Robert

imneuro commented 5 years ago

Hi Robert,

I requested 238GB of memory for the job, and the sizes of the input files are as follows:

143G LT2.merged.filt2k.fasta
5.6G LT2.2kfasta.gfa.paf
2.2G LT2.contigs.fasta
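A rough way to put a number on the expected footprint is to sum the input file sizes in a small shell sketch; the ~30% overhead factor here is an assumption for illustration, not something from racon's documentation:

```shell
# Sum the sizes of the given files and report an estimated peak RAM in
# GiB, assuming ~1.3x overhead on top of the raw input (integer math).
estimate_racon_mem_gib() {
    total=0
    for f in "$@"; do
        # GNU stat first, BSD stat as a fallback.
        sz=$(stat -c %s "$f" 2>/dev/null || stat -f %z "$f")
        total=$(( total + sz ))
    done
    echo $(( total * 13 / 10 / 1024 / 1024 / 1024 ))
}

# Usage with the files above:
# estimate_racon_mem_gib LT2.merged.filt2k.fasta LT2.2kfasta.gfa.paf LT2.contigs.fasta
```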

How much memory would you recommend? Is there a general rule for estimating the memory usage?

Best, Xin

rvaser commented 5 years ago

The memory requirements are roughly equal to the sum of the input data plus some overhead from the encapsulating classes. It is odd that 238GB was not enough for your run. Are you sure your job got the requested amount? Try requesting 300GB, or request the same amount but run racon_wrapper with the split size set to your longest contig multiplied by ~1.1.
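A sketch of how that split size could be computed from the draft assembly; the `--split` flag name is per racon's README, so verify it against `racon_wrapper --help` for your version:

```shell
# Length of the longest sequence in a (possibly multi-line) FASTA file.
longest_contig() {
    awk '/^>/ { if (len > max) max = len; len = 0; next }
         { len += length($0) }
         END { if (len > max) max = len; print max + 0 }' "$1"
}

# Usage with the files from the original command (file names from the
# thread; ~1.1x factor per Robert's suggestion):
# split=$(( $(longest_contig LT2.contigs.fasta) * 11 / 10 ))
# racon_wrapper --split "$split" -t 34 \
#     LT2.merged.filt2k.fasta LT2.2kfasta.gfa.paf LT2.contigs.fasta \
#     > LT2.racon1.fasta
```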

Best regards, Robert

rvaser commented 5 years ago

Nvm, I saw that you are using v1.3.1. :)

imneuro commented 5 years ago

Thanks, Robert. One more thought: what is the minimal local disk requirement? In other words, how large would /tmp need to be to successfully run racon or racon_wrapper?

rvaser commented 5 years ago

It will need two times the size of the target sequences (in this case the draft assembly, which comes to around 4GB).
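That 2x rule of thumb can be checked before launching the job; a hedged sketch (the 2x factor is from the comment above, the helper itself is illustrative):

```shell
# Check that a temp directory has at least 2x the draft assembly's
# on-disk size free, per the rule of thumb above.
check_tmp_for_racon() {
    draft=$1
    tmpdir=${2:-/tmp}
    need_kb=$(( $(du -k "$draft" | cut -f1) * 2 ))
    avail_kb=$(df -k "$tmpdir" | awk 'NR==2 { print $4 }')
    if [ "$avail_kb" -ge "$need_kb" ]; then
        echo "ok: $tmpdir has ${avail_kb} KB free, need ${need_kb} KB"
    else
        echo "too small: $tmpdir has ${avail_kb} KB free, need ${need_kb} KB"
    fi
}

# Usage: check_tmp_for_racon LT2.contigs.fasta
```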

imneuro commented 5 years ago

I ran into the same problem as #81. And just as you expected, I used a Docker container and ran it on AWS. Thank you for creating such a wonderful tool; I hope this gets improved soon.