rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

Memory problem...Speicherzugriffsfehler (Speicherabzug geschrieben) #169

Open xlinxlin opened 5 years ago

xlinxlin commented 5 years ago

Hi,

I ran a hybrid assembly, and during the "Assembling contigs and long reads with miniasm" step I got the error message "Speicherzugriffsfehler (Speicherabzug geschrieben)". In English this is "Segmentation fault (core dumped)", i.e. a memory access error.

The last output is:

Aligning long reads to graph using minimap 
Saving to /home/sepeter/Harddisk2/ID40 Sequenzierungen Nanopore+Illumina/UnicyclerAssambly/miniasm_assembly/01_assembly_reads.fastq: 
  409 short-read contigs 
  334,546 long reads 
Finding overlaps with minimap... Speicherzugriffsfehler (Speicherabzug geschrieben)

I ran Unicycler on my Ubuntu 16.04 laptop with 64 GB of RAM and about 150 GB of free disk space. The sequencing data (short1.fastq, short2.fastq and Nanopore.fastq) total about 15 GB.

Can you help me? Thank you.

thsyd commented 5 years ago

What organism are you trying to assemble? Unicycler is built for prokaryotes. You have a lot of data, so you could try subsampling the long reads: with Filtlong, by random subsampling, or by using Canu for correction (its standard settings select about 20-40% of the "best" reads for correction). https://doi.org/10.1101/530824

For the short reads, anything >100x coverage is just excessive and often leads to poorer assemblies (more, shorter contigs). So you could also subsample the short reads to <100x coverage.
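As a rough sketch of the random subsampling suggested above: the fraction of reads to keep is approximately target depth × genome size / total bases. The Python below is only an illustration and is not part of Unicycler. The function names, the default seed and the assumption of uncompressed four-line FASTQ records are all hypothetical choices made for this example. For long reads, a quality-aware tool such as Filtlong is probably a better choice than purely random sampling.

```python
import random


def keep_fraction(total_bases, genome_size, target_depth=100):
    """Fraction of reads to keep so the expected depth is about target_depth."""
    current_depth = total_bases / genome_size
    if current_depth <= target_depth:
        return 1.0  # already at or below the target, keep everything
    return target_depth / current_depth


def count_bases(fastq_path):
    """Sum sequence lengths in an uncompressed four-line-per-record FASTQ file."""
    total = 0
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                total += len(line.strip())
    return total


def subsample_fastq(in_path, out_path, fraction, seed=42):
    """Copy each record to out_path with probability `fraction`; return the count kept."""
    rng = random.Random(seed)
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept
```

Because one random number is drawn per record, calling `subsample_fastq` with the same seed and fraction on both paired files (which hold the same number of records in the same order) keeps the read pairs in sync.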

xlinxlin commented 5 years ago

Hi @thsyd, thanks for your advice! I reduced the size of the long-read set by adapter trimming and by filtering on quality and read length, and now it seems to work. However, I still get an error during the "Polishing assembly with Pilon" step.

Polishing assembly with Pilon (2019-02-20 18:08:20)
---------------------------------------------------
    Unicycler now conducts multiple rounds of Pilon in an attempt to repair any remaining small-scale errors with the assembly.

Aligning reads to find appropriate insert size range...
Insert size 1st percentile:  195
Insert size 99th percentile: 777

Pilon polish round 1
Unable to polish assembly using Pilon: Pilon encountered an error:
Pilon version 1.23 Mon Nov 26 16:04:05 2018 -0500
Genome: 1_polish_input.fasta
Fixing snps, indels
Input genome size: 6919690
Processing 1:1-3960192
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.broadinstitute.pilon.PileUpRegion.$anonfun$new$1(PileUpRegion.scala:30)
    at org.broadinstitute.pilon.PileUpRegion$$Lambda$42/1072769947.apply$mcVI$sp(Unknown Source)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:155)
    at org.broadinstitute.pilon.PileUpRegion.<init>(PileUpRegion.scala:30)
    at org.broadinstitute.pilon.GenomeRegion.initializePileUps(GenomeRegion.scala:148)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$4(GenomeFile.scala:111)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$4$adapted(GenomeFile.scala:109)
    at org.broadinstitute.pilon.GenomeFile$$Lambda$39/1131645570.apply(Unknown Source)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:970)
    at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:49)
    at scala.collection.parallel.Task$$Lambda$40/933112439.apply$mcV$sp(Unknown Source)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:63)
    at scala.collection.parallel.Task.tryLeaf(Tasks.scala:52)
    at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:46)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:967)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:149)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:145)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:436)
    at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

Given the "OutOfMemoryError" message, do you think this is still caused by insufficient RAM? Despite the error, the run finished and I got an assembly. The log file is attached: unicycler.log

thsyd commented 5 years ago

It looks like it might be a memory problem (although you have 64 GB?). I'm really not an expert on this. Perhaps someone else has better suggestions.

Have you looked at the advanced usage options for Unicycler? You could also run Unicycler without Pilon polishing and then polish the assembly yourself with Pilon (and perhaps try Racon with the Illumina reads; some users get quite good results).

from https://github.com/broadinstitute/pilon/wiki/Requirements-&-Usage

8GB or more memory to allocate to the JVM. The amount of memory required depends on the genome, the read data, and how many fixes Pilon needs to make. Generally, bacterial genomes with ~200x of Illumina coverage will require at least 8GB, though 16GB is recommended. Larger genomes will require more memory to process; exactly how much is very data-dependent, but as a rule of thumb, try to allocate 1GB per megabase of input genome to be processed.
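To make the rule of thumb quoted above concrete, here is a minimal sketch that applies it to the 6,919,690 bp genome reported in the Pilon log earlier in this thread. The function name is hypothetical, and the 16 GB floor is simply the wiki's recommended figure for bacterial genomes. `-Xmx` is the standard JVM option for the maximum heap size. How that option reaches the JVM when Unicycler launches Pilon is not covered here.

```python
import math


def suggested_pilon_heap_gb(genome_size_bp, floor_gb=16):
    """Heap size in whole GB: 1 GB per megabase of genome, never below floor_gb."""
    per_megabase = math.ceil(genome_size_bp / 1_000_000)
    return max(floor_gb, per_megabase)


genome_size = 6919690  # "Input genome size" from the Pilon log in this thread
heap_gb = suggested_pilon_heap_gb(genome_size)
print(f"-Xmx{heap_gb}G")  # prints "-Xmx16G"
```

For this genome the per-megabase rule alone gives about 7 GB, so the 16 GB recommendation is the larger figure. Both fit comfortably within 64 GB of RAM, which suggests the limit being hit is the JVM heap setting and not the physical memory of the machine.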

Pilon polishing:
  These options control the final assembly polish using Pilon at the end of the Unicycler pipeline.
  --no_pilon                               Do not use Pilon to polish the final assembly (default: Pilon is used)
  --bowtie2_path BOWTIE2_PATH              Path to the bowtie2 executable (default: bowtie2)
  --bowtie2_build_path BOWTIE2_BUILD_PATH  Path to the bowtie2_build executable (default: bowtie2-build)
  --samtools_path SAMTOOLS_PATH            Path to the samtools executable (default: samtools)
  --pilon_path PILON_PATH                  Path to a Pilon executable or the Pilon Java archive file (default: pilon)
  --java_path JAVA_PATH                    Path to the java executable (default: java)
  --min_polish_size MIN_POLISH_SIZE        Contigs shorter than this value (bp) will not be polished using Pilon (default: 10000)
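If Unicycler is run with `--no_pilon`, one round of manual polishing could look roughly like the sketch below. It only builds and prints the commands and does not run them. All file names, the thread count, the heap size and the output prefix are placeholders, and the function name is hypothetical. The flags shown are the ones I understand from each tool's documentation (bowtie2, samtools, Pilon), so check them against your installed versions before use.

```python
import shlex


def manual_pilon_commands(assembly, reads_1, reads_2, pilon_jar,
                          heap_gb=16, threads=8, prefix="polish"):
    """Return the shell commands for one round of short-read Pilon polishing."""
    index = f"{prefix}_index"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Index the assembly, then align the paired short reads to it.
        ["bowtie2-build", assembly, index],
        ["bowtie2", "-p", str(threads), "-x", index,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # Pilon expects a coordinate-sorted, indexed BAM file.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # Run Pilon with an explicit JVM heap size.
        ["java", f"-Xmx{heap_gb}G", "-jar", pilon_jar,
         "--genome", assembly, "--frags", bam,
         "--output", prefix, "--changes"],
    ]


for command in manual_pilon_commands("assembly.fasta", "short1.fastq",
                                     "short2.fastq", "pilon-1.23.jar"):
    print(shlex.join(command))
```

Polishing is usually repeated until Pilon reports no further changes, feeding each round's output FASTA in as the next round's assembly.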

xlinxlin commented 5 years ago

Hi @thsyd, thank you again for your quick answer, it helps a lot! I will go through the posts and documentation. Have a nice day!

dswan commented 5 years ago

The Pilon issue (and fix) is discussed in #147

xlinxlin commented 5 years ago

> The Pilon issue (and fix) is discussed in #147

I see, thank you!