marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

exact behaviour of stopping and restarting an assembly. #922

Closed peterthorpe5 closed 6 years ago

peterthorpe5 commented 6 years ago

Dear Canu people,

Great tool, thank you! I have a question about stopping and restarting an assembly I have running. The genome is highly heterozygous, so I am trying to "smash the haplotypes together". Our grid setup will not work with Canu in grid mode because of the way our system takes the RAM allocation command. This won't change, so I have to run it on one server. The following command has been running for 11 days and is on job 180 out of 473 in the correct reads phase (./correctReads.sh 181 > ./correctReads.000181.out 2>&1).

Question 1: At the moment I have it running on 12 cores. If I stopped it and ran the same command asking for more cores (-maxThreads=32), would it continue from job 180 as mentioned above AND use the new core allocation, or would it go back to the start of correcting reads?

Question 2: Could I log in to another server, run ./correctReads.sh 473 > ./correctReads.000473.out 2>&1, and work backward to speed this up?

command line (linux) (version: Canu snapshot v1.7 +137 changes (r8829 73d5caa1b1087b65f7853ecbebc1bb1dcbd1bc14)): /canu/Linux-amd64/bin/canu gnuplotTested=true -p Gp_newton_reduce_haplotypes_400X -d Gp_newton_reduce_haplotypes_400X genomeSize=180m -pacbio-raw 'newton.fastq.gz' -useGrid=False -maxMemory=120 -maxThreads=12 corOutCoverage=400 correctedErrorRate=0.15 corMhapSensitivity=normal

Gnuplot: I was wondering if you were aware of a gnuplot issue with the version of Canu specified above? I have reinstalled gnuplot, and Canu says it can find it (when I give it the path), but it won't work unless I add gnuplotTested=true. Our server setup has recently changed; on the old setup Canu 1.6 worked fine.

Version speed: Have you noticed version 1.7 being much slower than 1.6?

cheers,

Pete

skoren commented 6 years ago
  1. With 12 cores you're probably running 1 job at a time so it will take a while. Yes, you can stop/restart and it should run more jobs concurrently.

  2. Yes, you can do that, and you could do it on more than one server if you wanted to. You could also run with useGrid=remote. Canu will then print a command to submit, and you can manually edit and submit the jobs to take advantage of your grid, but you can't easily switch in the middle of a step.

  3. Your gnuplot probably does not support the image types Canu is trying to use. The exact error message Canu prints will tell you why it rejected gnuplot.

  4. Nope, 1.7 should be no slower than 1.6. You've asked for 400x coverage, which is a lot, so that, coupled with the correctedErrorRate of 15%, is what's causing it to run slower.
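
For answer 2, a minimal sketch of working backward from the last job on a second server. It only prints the commands (a dry run), using the same form as the command quoted in the question. The function name `reverse_job_commands` and the job range are hypothetical. It assumes the second server sees the same assembly directory, and that the range is chosen by hand so that it stops well short of the jobs the first server will reach. The thread does not say whether correctReads.sh skips jobs that have already finished.

```shell
#!/bin/sh
# Print the correctReads.sh invocations for jobs LAST down to FIRST.
# Nothing is executed here; the output is a list of commands to review.
reverse_job_commands() {
    first=$1
    last=$2
    i=$last
    while [ "$i" -ge "$first" ]; do
        # Same form as in the question: the log file name uses a
        # six-digit, zero-padded job index.
        printf './correctReads.sh %d > ./correctReads.%06d.out 2>&1\n' "$i" "$i"
        i=$((i - 1))
    done
}

# Example: the last three jobs, highest index first.
reverse_job_commands 471 473
```

After checking the printed list, it could be piped to `sh` from the directory that contains correctReads.sh, which runs the jobs one after another.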

peterthorpe5 commented 6 years ago

Thank you very much for your reply. I will have a play at trying to increase cores.
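
As a rough illustration of answer 1, the number of jobs that fit on one machine at once can be estimated as the smaller of (total threads / threads per job) and (total memory / memory per job). This rule is an assumption about how a local scheduler typically behaves, not something taken from Canu's source, and the per-job figures below (8 threads, 24 GB) are hypothetical. Only the totals (12 or 32 threads, 120 GB) come from the question.

```shell
#!/bin/sh
# Estimate how many jobs can run at the same time on a single machine.
# Arguments: total threads, total memory (GB), threads per job, memory per job (GB).
concurrent_jobs() {
    max_threads=$1
    max_memory=$2
    job_threads=$3
    job_memory=$4
    by_threads=$((max_threads / job_threads))   # limit imposed by cores
    by_memory=$((max_memory / job_memory))      # limit imposed by RAM
    # The tighter of the two limits wins.
    if [ "$by_threads" -lt "$by_memory" ]; then
        echo "$by_threads"
    else
        echo "$by_memory"
    fi
}

concurrent_jobs 12 120 8 24   # prints 1
concurrent_jobs 32 120 8 24   # prints 4
```

Under these assumed per-job requirements, 12 threads leave room for one job at a time, while 32 threads allow four.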

Point 3) I will try to break it again and open a new issue.
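
Before opening that issue, one quick check for point 3 is whether the installed gnuplot accepts a given terminal (image) type. The sketch below is only a sketch: the function name `terminal_supported` is hypothetical, and png, svg and gif are assumed candidates rather than a list taken from Canu. It relies on gnuplot's `-e` option and on gnuplot exiting with a non-zero status when a command fails in batch mode.

```shell
#!/bin/sh
# Succeed if the gnuplot binary accepts the given terminal type.
# GNUPLOT may name a specific binary; it defaults to "gnuplot" on PATH.
terminal_supported() {
    "${GNUPLOT:-gnuplot}" -e "set terminal $1" >/dev/null 2>&1
}

# png, svg and gif are assumptions, not Canu's documented list.
for term in png svg gif; do
    if terminal_supported "$term"; then
        echo "$term: supported"
    else
        echo "$term: not supported (or gnuplot not found)"
    fi
done
```

Running `gnuplot -e "set terminal"` with no terminal name lists every terminal type the binary was built with, which can be compared against the error message Canu prints.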

4) Glad to hear!

cheers,

Pete