zhouyiqi91 closed this issue 6 years ago.
Hello, is the 88 GB read file compressed or plain FASTA format? What is the output log before the process gets killed?
Best regards, Robert
1. The 88 GB FASTA file is not compressed.
2. The output log is:
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
/opt/gridengine/default/spool/node291/job_scripts/1903746: line 9: 29186 Killed racon -t 30 ./pre_racon/all.fasta ./pre_racon/all.paf genome.fasta
It is a bit odd that it does not fit in 120 GB of RAM. Does it get killed even with the file splitting?
You can extract the reads for each of the 4 parts, or you can use the subsample option with 60x coverage instead of your initial ~100x. Run the wrapper with --subsample 700000000 60. It might yield slightly lower accuracy compared to the full read set.
Hi, I have been trying to use racon to polish a plant genome. The assembled genome size is 711 Mb, the PacBio reads in FASTA format are 88 GB, and the PAF file is 2.6 GB.
My machine has ~120 GB of RAM. When I run the following command:
racon -t 30 all_reads.fasta all.paf genome.fasta
It consumes more than 120 GB of memory and the process gets killed, so I use the wrapper script: python racon_wrapper -t 30 --split 200000000 all_reads.fasta all.paf genome.fasta
The genome.fasta is split into 4 parts, but the memory consumption is still very high. I have noticed that racon loads the reads before the PAF file. If I extract only the reads that are mapped to part1.fasta instead of using all the reads, will that decrease the memory usage? Thank you.
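Extracting only the reads mapped to one part could be sketched as below. This is a minimal illustration, not part of racon itself; it assumes a standard tab-separated PAF file (read name in column 1, target contig name in column 6) and plain multi-line FASTA input, and the function names are hypothetical:

```python
# Sketch: keep only the reads whose PAF alignments target a given set of contigs.

def collect_read_names(paf_path, target_contigs):
    """Return the set of read names aligned to any contig in target_contigs.

    Assumes standard PAF: column 1 is the query (read) name,
    column 6 is the target (contig) name.
    """
    names = set()
    with open(paf_path) as paf:
        for line in paf:
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 6 and cols[5] in target_contigs:
                names.add(cols[0])
    return names

def filter_fasta(in_path, out_path, keep):
    """Write only the FASTA records whose name is in `keep`."""
    write = False
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                # FASTA record name is the first whitespace-delimited token.
                write = line[1:].split()[0] in keep
            if write:
                fout.write(line)
```

One could then run racon separately on each part with its reduced read set and matching PAF subset, so only a fraction of the 88 GB of reads is held in memory at a time.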