marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

fork error at overlap store bucketizer stage #290

Closed. achurcher closed this issue 7 years ago.

achurcher commented 7 years ago

Hi, I am having what appears to be the same problem as in this post: https://github.com/marbl/canu/issues/136. We are trying to assemble a 1.2 Gbp genome with ~66x PacBio coverage using Canu version 1.3, running on a 512 GB machine with 16 cores.

I have tried increasing ovsMemory, allowing it a range of 5-500 GB (ovsMemory=5-500), but the run still fails at the same place. I have pasted the tail end of the log file below and would be very happy to hear any suggestions.
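
For reference, a minimal sketch of how that option is passed on the command line; the read file name and the exact genomeSize value here are placeholders, not the actual command from this run:

    canu -p warb_assm -d canu_assm3 \
      genomeSize=1.2g \
      ovsMemory=5-500 \
      -pacbio-raw pacbio_reads.fastq.gz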

Thank you, Allison

----------------------------------------
-- Found 243 mhap overlap output files.
----------------------------------------
-- Starting command on Mon Nov  7 18:49:13 2016 with 2864.2 GB free disk space

    /pica/sw/apps/bioinfo/canu/1.3/milou/Linux-amd64/bin/ovStoreBuild \
     -G /scratch/8987161/run1/canu_assm3/correction/warb_assm.gkpStore \
     -O /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore \
     -M 5-500 \
     -config /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/config \
     -L /scratch/8987161/run1/canu_assm3/correction/1-overlapper/ovljob.files \
    > /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/config.err 2>&1

----------------------------------------
-- overlap store bucketizer attempt 0 begins with 0 finished, and 243 to compute.
----------------------------------------
-- Starting concurrent execution on Mon Nov  7 18:49:24 2016 with 2864.2 GB free disk space (243 processes; 16 concurrently)

    /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/scripts/1-bucketize.sh 1 > /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/logs/1-bucketize.000001.out 2>&1
. . . . . .
    /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/scripts/1-bucketize.sh 15 > /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/logs/1-bucketize.000015.out 2>&1
    /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/scripts/1-bucketize.sh 16 > /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/logs/1-bucketize.000016.out 2>&1
    /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/scripts/1-bucketize.sh 17 > /scratch/8987161/run1/canu_assm3/correction/warb_assm.ovlStore.BUILDING/logs/1-bucketize.000017.out 2>&1
Can't fork: Resource temporarily unavailable
/var/spool/slurmd/job8987161/slurm_script: fork: retry: Resource temporarily unavailable
/var/spool/slurmd/job8987161/slurm_script: fork: retry: Resource temporarily unavailable
/var/spool/slurmd/job8987161/slurm_script: fork: retry: Resource temporarily unavailable
/var/spool/slurmd/job8987161/slurm_script: fork: retry: Resource temporarily unavailable
/var/spool/slurmd/job8987161/slurm_script: fork: Resource temporarily unavailable
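
For context, "fork: Resource temporarily unavailable" generally means the job hit the per-user process limit (nproc) or ran out of memory, so the next bucketizer process could not be spawned. A quick diagnostic sketch to run inside the job environment (standard shell tools, not part of the original log):

    ulimit -u    # maximum number of user processes
    ulimit -v    # virtual memory limit in kB, if any is set
    ulimit -a    # all per-process limits
    free -g      # memory actually available on the node
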
brianwalenz commented 7 years ago

The good news is that this will be fixed in the next version.

For now, you can try adding '-raw' to the ovStoreBucketizer command in warb_assm.ovlStore.BUILDING/scripts/1-bucketize.sh; this should disable gzip compression. I say 'try' because I don't know for sure whether that option existed back in version 1.3.

Canu might also rewrite the 1-bucketize.sh file on a restart (I don't think it does). If it does, you can change src/pipelines/OverlapStore.pm at around line 258 to add the option there.
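
A minimal sketch of that script edit, assuming 1-bucketize.sh invokes ovStoreBucketizer on a single line (make a backup first; placing the option directly after the binary name is an assumption):

    cd warb_assm.ovlStore.BUILDING/scripts
    cp 1-bucketize.sh 1-bucketize.sh.bak
    sed -i 's/ovStoreBucketizer/ovStoreBucketizer -raw/' 1-bucketize.sh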

The final option is to use the slower non-parallel version of this component: ovsMethod=sequential.
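
Restarting amounts to re-running the same canu command in the same -d directory with the extra option added; Canu resumes from the last completed step. A sketch, again with a placeholder read path:

    canu -p warb_assm -d canu_assm3 genomeSize=1.2g ovsMethod=sequential -pacbio-raw pacbio_reads.fastq.gz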

How big is the correction/1-overlapper/results/ directory? Your free disk space makes me a little nervous.
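
A quick way to check both before the next attempt, run from the assembly directory:

    du -sh correction/1-overlapper/results/    # total size of the overlap outputs
    df -h .                                    # free space on the working filesystem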

achurcher commented 7 years ago

Hi Brian

Thanks for your help! I am running the assembly on a relatively large cluster and unfortunately lost the intermediate files and folders from the scratch directory on the node when the job failed last time, so I am not sure how big the correction/1-overlapper/results directory was.

I have restarted the run using the ovsMethod=sequential option as you suggested, and this stage appears to have completed successfully, in a very reasonable amount of time.

Thanks again for the help : )

Allison