Closed: BioNica closed this issue 4 years ago.
I don't think you're on the 1.9 branch; what does canu -version report? It should have 1.9 in the name. If it doesn't, you're on the tip branch instead. If you are on tip, make sure you run:
git checkout v1.9
rm -r Linux-amd64
make
and restart an assembly from scratch.
Thank you @skoren for your observation. Indeed, I was using Canu snapshot v1.8 +358 instead of v1.9 (lack of experience with git). I have now installed v1.9 following your instructions. I will start over again and see how far we get now.
Dear @skoren,
As you suggested, I installed a new snapshot of v1.9 [Canu branch v1.9 +372 changes (r9582 aed1d9016cb37eff92303bbae85a729130ebe385)] and restarted the assembly from scratch with the same parameters. Again, it failed at the same step. I restarted it, and it still produces the same error messages.
[...]
-- Running jobs. Second attempt out of 2.
----------------------------------------
-- Starting 'cor' concurrent execution on Tue Oct 15 17:33:24 2019 with 21082.496 GB free disk space (103 processes; 36 concurrently)
cd correction/2-correction
./correctReads.sh 1 > ./correctReads.000001.out 2>&1
[...]
./correctReads.sh 103 > ./correctReads.000103.out 2>&1
-- Finished on Tue Oct 15 17:37:03 2019 (219 seconds) with 21082.496 GB free disk space
----------------------------------------
--
-- Read correction jobs failed, tried 2 times, giving up.
-- job 2-correction/results/0001.cns FAILED.
-- job 2-correction/results/0002.cns FAILED.
[...]
-- job 2-correction/results/0101.cns FAILED.
-- job 2-correction/results/0102.cns FAILED.
-- job 2-correction/results/0103.cns FAILED.
--
ABORT:
ABORT: Canu branch v1.9 +372 changes (r9582 aed1d9016cb37eff92303bbae85a729130ebe385)
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
And the logs indicate the same problem in removing temporary files.
$ cat correction/2-correction/correctReads.000001.out
Found perl:
/usr/bin/perl
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
Found java:
/usr/bin/java
openjdk version "1.8.0_191"
Found canu:
/data04/tools/PacificBiosciences/canu-1.9/canu/Linux-amd64/bin/canu
Canu branch v1.9 +372 changes (r9582 aed1d9016cb37eff92303bbae85a729130ebe385)
Running job 1 based on command line options.
Start copy at Ter Out 15 17:33:25 -03 2019
Finished at Ter Out 15 17:35:01 -03 2019
rm: cannot remove ‘/scr/nicole-7342/PacBio.seqStore’: Directory not empty
rmdir: failed to remove ‘/scr/nicole-7342’: Directory not empty
$ cat correction/2-correction/correctReads.000050.out
[...]
Running job 50 based on command line options.
Start copy at Ter Out 15 17:34:57 -03 2019
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/blobs.0012’: No such file or directory
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/blobs.0013’: No such file or directory
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/blobs.0014’: No such file or directory
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/blobs.0015’: No such file or directory
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/blobs.0016’: No such file or directory
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/blobs.0017’: No such file or directory
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/blobs.0018’: No such file or directory
Finished at Ter Out 15 17:36:28 -03 2019
rmdir: failed to remove ‘/scr/nicole-7342’: No such file or directory
$ cat correction/2-correction/correctReads.000060.out
[...]
Running job 60 based on command line options.
Start copy at Ter Out 15 17:35:22 -03 2019
cp: target ‘/scr/nicole-7342/PacBio.seqStore/’ is not a directory
Finished at Ter Out 15 17:35:22 -03 2019
rmdir: failed to remove ‘/scr/nicole-7342’: No such file or directory
$ cat correction/2-correction/correctReads.000103.out
[...]
Running job 103 based on command line options.
Start copy at Ter Out 15 17:36:55 -03 2019
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/libraries’: No such file or directory
cp: cannot create regular file ‘/scr/nicole-7342/PacBio.seqStore/reads’: No such file or directory
cp: target ‘/scr/nicole-7342/PacBio.seqStore/’ is not a directory
Finished at Ter Out 15 17:36:55 -03 2019
rmdir: failed to remove ‘/scr/nicole-7342’: No such file or directory
As before, the script correctReads.sh removes the shared temporary directory, affecting the concurrent jobs on the same node. Maybe if you extended the directory tree with a temporary sub-directory for each process (say, 000001 to 000103) and deleted only those at the end of each process, this error could be solved.
Canu is designed assuming the scratch directory is per-process, not per-node, so each job creates and removes the folder itself. This is documented here: https://canu.readthedocs.io/en/latest/parameter-reference.html#file-staging.
The problem is that you're running locally because you've set useGrid=false. Staging is designed for multiple jobs running on different hosts, which avoids concurrent access to a slow NFS system. In your case, you should run without staging (you can run Canu completely within scratch on the node, or manually copy the seqStore to scratch and symlink to it). Better yet, don't use useGrid=false, or it will take a while (weeks) to assemble a large genome.
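The "copy seqStore to scratch and symlink" option mentioned above can be sketched as follows. This is a minimal, sandboxed illustration, not Canu's own staging logic: RUN_DIR, SCRATCH, and the blobs file are stand-ins created in temporary directories so the commands are safe to run anywhere.

```shell
set -eu

# Sandbox stand-ins; on the real cluster RUN_DIR would be the Canu run
# directory on shared NFS storage and SCRATCH the node-local scratch
# (e.g. /scr/<user>). Both names here are hypothetical.
RUN_DIR=$(mktemp -d)
SCRATCH=$(mktemp -d)
mkdir -p "$RUN_DIR/PacBio.seqStore"
touch "$RUN_DIR/PacBio.seqStore/blobs.0000"   # stand-in for store contents

# Copy the seqStore to node-local scratch once...
cp -r "$RUN_DIR/PacBio.seqStore" "$SCRATCH/"

# ...then replace the NFS copy with a symlink so every job on this node
# reads from fast local disk instead of hammering NFS.
mv "$RUN_DIR/PacBio.seqStore" "$RUN_DIR/PacBio.seqStore.bak"
ln -s "$SCRATCH/PacBio.seqStore" "$RUN_DIR/PacBio.seqStore"
```

With this layout, no staging options are needed at all: jobs simply open the store through the symlink.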
Thank you for the explanation, @skoren. Our cluster has a heterogeneous infrastructure and is shared by several groups. To avoid monopolizing all available nodes, I started the run on the fat node alone. To avoid the stage directory and run completely within scratch, I could enter the node with srun in an interactive shell and then start Canu directly, without sbatch, hoping the 3 TB scratch is enough for the run. Would you recommend another method? Are there configuration options to limit the number of nodes Canu can recruit?
Re scratch: is there a reason you can't hold scratch through sbatch? If you have 3 TB of scratch it should be sufficient for the assembly to run; I'd just suggest running it in stages (-correct, -trim, -assemble) and saving the results from each at the end so you don't lose the files when sbatch releases the node.
Re limiting nodes: yes, you can add gridEngineResourceOption="--cpus-per-task=THREADS%XX --mem-per-cpu=MEMORY", which will limit concurrent jobs. You can also add maxThreads=XX to limit the maximum cores per job. If you set maxThreads=16 gridEngineResourceOption="--cpus-per-task=THREADS%10 --mem-per-cpu=MEMORY", you'd limit Canu to 160 cores.
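Collected into a spec file, the limits suggested above might look like this (THREADS and MEMORY are placeholders Canu substitutes itself; with maxThreads=16 and the %10 concurrency cap, at most 10 jobs x 16 cores = 160 cores run at once):

```
maxThreads=16
gridEngineResourceOption="--cpus-per-task=THREADS%10 --mem-per-cpu=MEMORY"
```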
In my first try, canu produced more than 5 TB during correction before dying. At the moment (2-correction stage) my storage consumption is:
$ du -h --max-depth=2
3,9M ./canu-logs
0 ./canu-scripts
2,9G ./correction/0-mercounts
72M ./correction/1-overlapper
1,5T ./correction/PacBio.ovlStore
391M ./correction/2-correction
6,3G ./correction/PacBio.corStore
1,5T ./correction
19G ./PacBio.seqStore
1,6T .
If I understand your suggestion, I could copy ./PacBio.seqStore to scratch and rerun (directly in scratch) from the step where it stopped. Right? All other directories I could symlink to scratch:
PacBio.seqStore
PacBio.seqStore.err
PacBio.seqStore.ssi
PacBio.report -> /data/results/Canu_assembly/PacBio.report
canu-logs -> /data/results/Canu_assembly/canu-logs
canu-scripts -> /data/results/Canu_assembly/canu-scripts
correction -> /data/results/Canu_assembly/correction
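For the record, a layout like the one above can be built with something along these lines. This sketch runs in temporary sandbox directories so it is safe to execute; on the cluster, the storage path would be /data/results/Canu_assembly and the scratch area /scr/<user>.

```shell
set -eu
SCRATCH=$(mktemp -d)    # stands in for the node-local scratch run directory
STORAGE=$(mktemp -d)    # stands in for /data/results/Canu_assembly
mkdir -p "$STORAGE/canu-logs" "$STORAGE/canu-scripts" "$STORAGE/correction"

# The seqStore lives physically on scratch...
mkdir -p "$SCRATCH/PacBio.seqStore"

# ...while the bulky directories stay on shared storage, reachable via symlinks.
for d in canu-logs canu-scripts correction; do
    ln -s "$STORAGE/$d" "$SCRATCH/$d"
done

ls -l "$SCRATCH"
```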
I will try again.
Right, but your previous out-of-space failure was before you added the purging and additional mhap options; it fits into 1.5 TB now, and that is peak space usage. You could actually remove the ovlStore completely (replace it with an empty directory) and move everything to scratch, which will certainly fit in 3 TB of space.
Remove the correctReads.sh script in 2-correction and re-launch in scratch with the same Canu command but omit the staging options; that'll write the correction results to scratch directly and generate a correctedReads.fasta.gz file.
The issue is staging-related, not purging-related; staging requires unique folder names to work. Resolve it either by running in scratch space on a single node or by running with staging across cluster nodes.
The original issue of too much disk usage seems resolved with the updated parameters and purge option (>5 TB down to <1.5 TB).
Dear @skoren, sorry for reopening this thread; I wanted to give some feedback. Following your advice, I restarted Canu from inside the compute node, running directly on scratch. It was able to finish 'cor', but then died on low read coverage, which I found really strange given that it was 30x at the beginning.
-- In 'PacBio.seqStore', found PacBio reads:
-- Raw: 5159463
-- Corrected: 0
-- Trimmed: 0
--
-- Generating assembly 'PacBio' in '/scr/nicole/BrachyPacBio_v1.9'
--
-- Parameters:
--
-- genomeSize 2500000000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.2400 ( 24.00%)
-- obtOvlErrorRate 0.0450 ( 4.50%)
-- utgOvlErrorRate 0.0450 ( 4.50%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.0450 ( 4.50%)
-- utgErrorRate 0.0450 ( 4.50%)
-- cnsErrorRate 0.0750 ( 7.50%)
--
--
-- BEGIN CORRECTION
--
-- No change in report.
-- Set corMinCoverage=4 based on read coverage of 30.
-- Using overlaps no worse than 0.3 fraction error for correcting reads (from corErrorRate parameter).
--
-- Configuring correction jobs:
-- Jobs limited to 24 GB per job (via option corMemory).
-- Reads estimated to need at most 1.67 GB for computation.
-- Leaving 22.33 GB memory for read data.
----------------------------------------
[...]
./correctReads.sh 102 > ./correctReads.000102.out 2>&1
./correctReads.sh 103 > ./correctReads.000103.out 2>&1
-- Finished on Tue Oct 22 17:41:53 2019 (73764 seconds) with 22404.295 GB free disk space
----------------------------------------
-- Found 103 read correction output files.
-- No change in report.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
-- Found 103 read correction output files.
-- No change in report.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
--
-- Loading corrected reads into corStore and seqStore.
----------------------------------------
-- Starting command on Tue Oct 22 17:41:53 2019 with 22404.294 GB free disk space
cd correction
/data04/tools/PacificBiosciences/canu-1.9/canu/Linux-amd64/bin/loadCorrectedReads \
-S ../PacBio.seqStore \
-C ./PacBio.corStore \
-L ./2-correction/corjob.files \
> ./PacBio.loadCorrectedReads.log \
2> ./PacBio.loadCorrectedReads.err
-- Finished on Tue Oct 22 17:50:36 2019 (523 seconds) with 22359.684 GB free disk space
----------------------------------------
--
-- ERROR: Read coverage (0) is too low to be useful.
-- ERROR:
-- ERROR: This could be caused by an incorrect genomeSize or poor quality reads that could not
-- ERROR: be sufficiently corrected.
-- ERROR:
-- ERROR: You can force Canu to continue by decreasing parameter stopOnLowCoverage=10,
-- ERROR: however, the quality of corrected reads and/or contiguity of contigs will be poor.
--
Could I have lost important information by deleting the previous ovlStore?
Nope, the ovlStore wouldn't affect this; it seems more like the corrected read loading failed, and the 0 coverage is indeed suspicious. Do you have the PacBio.loadCorrectedReads.* log files? If so, post them here. Do you still have the 2-correction/results folder from before you copied to scratch, and what are the file sizes in there? Also, what does sqStoreDumpMetaData -r 1-10 -S *.seqStore report?
Hello @skoren ,
I still have the PacBio.loadCorrectedReads.* files. The PacBio.loadCorrectedReads.log is empty and the PacBio.loadCorrectedReads.err contains the line:
Failed to open './2-correction/corjob.files' for reading: No such file or directory
But there was a PacBio_v1.9/correction/2-correction/corjob.files, containing:
2-correction/results/0001.cns
2-correction/results/0002.cns
2-correction/results/0003.cns
2-correction/results/0004.cns
[...]
2-correction/results/0100.cns
2-correction/results/0101.cns
2-correction/results/0102.cns
2-correction/results/0103.cns
My 2-correction/results points to the storage and has been overwritten :-( Still, the files have been created again:
total 124G
1,2G 0001.cns
3,8M 0001.err
1,1G 0002.cns
3,8M 0002.err
[...]
1,2G 0101.cns
3,8M 0101.err
1,2G 0102.cns
3,8M 0102.err
336M 0103.cns
1,1M 0103.err
The contents of the files:
$ head -20 PacBio_v1.9/correction/2-correction/results/0103.err
-- Opening seqStore '../../PacBio.seqStore'.
sqCache: found 5159463 raw reads with 76446968024 bases.
-- Opening corStore '../PacBio.corStore' version 1.
-- Loading list of reads to process from './PacBio.readsToCorrect'.
read read evidence corrected
ID length reads regions
-------- ------- -------- ------------- ...
5146233 19467 16 4167-13971 ( 11970) memory act 156925952 est 700963148 act/est 22.39
5146234 14023 33 883-12248 ( 12412) memory act 3522560 est 654148580 act/est 0.54
5146235 11298 19 228-11219 ( 11054) memory act 1261568 est 628194188 act/est 0.20
5146236 8308 28 1-8281 ( 8294) memory act 462848 est 602031200 act/est 0.08
5146237 28210 36 820-8127 8553-10678 13820-28783 ( 15031) memory act 78774272 est 783101822 act/est 10.06
5146238 29644 65 0-29528 ( 29631) memory act 20582400 est 804353504 act/est 2.56
5146239 2476 7 0-0 ( 108) memory act 20480 est 546548000 act/est 0.00
5146240 20878 41 0-20810 ( 20865) memory act 1671168 est 720499694 act/est 0.23
5146241 41816 74 0-41250 ( 41803) memory act 113299456 est 917618840 act/est 12.35
5146242 19822 27 30-19587 ( 19779) memory act 233472 est 705593318 act/est 0.03
5146243 4329 6 41-907 ( 875) memory act 0 est 563229956 act/est 0.00
5146245 3857 17 0-0 ( 143) memory act 0 est 558937250 act/est 0.00
5146246 40051 51 943-36769 ( 39243) memory act 6021120 est 898384694 act/est 0.67
[...]
The .cns files are binary.
Does this information help?
That is certainly very strange; the missing file just lists the cns files. What does ls -lha /PacBio_v1.9/PacBio.seqStore give?
Once we figure out the files there, you can try manually backing up and re-trying the step. It'll be something like the following (but don't run it until we determine which files to remove from the above):
ls -lha `pwd`/PacBio_v1.9/correction/2-correction/results/*.cns |awk '{print $NF}' > `pwd`/PacBio_v1.9/correction/2-correction/corjob.files
mv PacBio_v1.9/PacBio.seqStore/version1/* PacBio_v1.9/PacBio.seqStore/
rm PacBio_v1.9/PacBio.seqStore/<extraneous files made by the previous attempt>
cd PacBio_v1.9/correction
/data04/tools/PacificBiosciences/canu-1.9/canu/Linux-amd64/bin/loadCorrectedReads \
-S ../PacBio.seqStore \
-C ./PacBio.corStore \
-L ./2-correction/corjob.files > ./PacBio.loadCorrectedReads.log \
2> ./PacBio.loadCorrectedReads.err
/data04/tools/PacificBiosciences/canu-1.9/canu/Linux-amd64/bin/sqStoreDumpMetaData -S ../PacBio.seqStore -reads -r 1-10
Dear @skoren, sorry for taking so long to answer; I was not here last week.
Apparently, it was not using the folder I copied to the scratch, because that folder has not been updated:
$ ls -hla /scr/nicole/PacBio_v1.9/PacBio.seqStore
total 19G
drwxrwxr-x 2 nicole nicole 4,0K Out 11 19:33 .
drwxrwxr-x 3 nicole nicole 191 Out 22 17:41 ..
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:20 blobs.0000
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:21 blobs.0001
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:22 blobs.0002
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:22 blobs.0003
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:23 blobs.0004
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:24 blobs.0005
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:24 blobs.0006
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:25 blobs.0007
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:26 blobs.0008
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:26 blobs.0009
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:27 blobs.0010
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:28 blobs.0011
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:29 blobs.0012
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:29 blobs.0013
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:30 blobs.0014
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:31 blobs.0015
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:31 blobs.0016
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:32 blobs.0017
-rw-rw-r-- 1 nicole nicole 228M Out 11 19:32 blobs.0018
-rw-rw-r-- 1 nicole nicole 89M Out 11 19:32 errorLog
-rw-rw-r-- 1 nicole nicole 88 Out 11 19:32 info
-rw-rw-r-- 1 nicole nicole 459 Out 11 19:32 info.txt
-rw-rw-r-- 1 nicole nicole 336 Out 11 19:32 libraries
-rw-rw-r-- 1 nicole nicole 198 Out 11 19:32 libraries.txt
-rw-rw-r-- 1 nicole nicole 368 Out 11 19:32 load.dat
-rw-rw-r-- 1 nicole nicole 28M Out 11 19:33 readlengths-cor.dat
-rw-rw-r-- 1 nicole nicole 377 Out 11 19:33 readlengths-cor.gp
-rw-rw-r-- 1 nicole nicole 20K Out 11 19:33 readlengths-cor.png
-rw-rw-r-- 1 nicole nicole 240M Out 11 19:32 readNames.txt
-rw-rw-r-- 1 nicole nicole 197M Out 11 19:32 reads
Now I followed your instructions above and rebuilt the corjob.files with absolute paths. I ran the command passing absolute paths to the files on scratch, instead of relative paths:
nohup /data04/tools/PacificBiosciences/canu-1.9/canu/Linux-amd64/bin/loadCorrectedReads \
-S /scr/nicole/PacBio_v1.9/PacBio.seqStore \
-C /scr/nicole/PacBio_v1.9/correction/PacBio.corStore \
-L /scr/nicole/PacBio_v1.9/correction/2-correction/corjob.files > /scr/nicole/PacBio_v1.9/correction/PacBio.loadCorrectedReads.log \
2> /scr/nicole/PacBio_v1.9/correction/PacBio.readsToCorrect.err |& tee log_Canu_scratch_20191104.log
As a result:
$ /data04/tools/PacificBiosciences/canu-1.9/canu/Linux-amd64/bin/sqStoreDumpMetaData -S PacBio.seqStore -reads -r 1-10
readID libraryID seqLen rawLen corLen clearBgn clearEnd segm byte part flags
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----- ----------- ----- ----------
1 1 0 1058 0 0 0 0 0 0 -C-
2 1 0 2255 0 0 0 0 352 0 -C-
3 1 0 2119 0 0 0 0 1000 0 -C-
4 1 2342 2411 2342 0 0 19 0 0 -C-
5 1 3249 4274 3249 0 0 19 1300 0 -C-
6 1 0 1396 0 0 0 0 3464 0 -C-
7 1 0 2118 0 0 0 0 3900 0 -C-
8 1 6329 6892 6329 0 0 19 3292 0 -C-
9 1 3574 4069 3574 0 0 19 6704 0 -C-
10 1 4571 5599 4571 0 0 19 8724 0 -C-
Is this information helpful for understanding what happened?
Thanks for your help and sorry for the delay.
That looks like the corrected reads were successfully added to the store. What are the contents of /scr/nicole/PacBio_v1.9/PacBio.seqStore now? You should be able to restart Canu with the same command you used before, in the same folder, and it will resume where it left off.
Hi @skoren ,
Now the folder contains files up to blobs.0050:
$ ls -lha PacBio.seqStore
total 50G
drwxrwxr-x 3 nicole nicole 4,0K Nov 4 19:46 .
drwxrwxr-x 3 nicole nicole 191 Out 22 17:41 ..
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:20 blobs.0000
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:21 blobs.0001
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:22 blobs.0002
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:22 blobs.0003
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:23 blobs.0004
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:24 blobs.0005
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:24 blobs.0006
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:25 blobs.0007
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:26 blobs.0008
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:26 blobs.0009
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:27 blobs.0010
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:28 blobs.0011
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:29 blobs.0012
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:29 blobs.0013
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:30 blobs.0014
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:31 blobs.0015
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:31 blobs.0016
-rw-rw-r-- 1 nicole nicole 1,1G Out 11 19:32 blobs.0017
-rw-rw-r-- 1 nicole nicole 228M Out 11 19:32 blobs.0018
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:41 blobs.0019
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:41 blobs.0020
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:41 blobs.0021
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:41 blobs.0022
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:41 blobs.0023
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:42 blobs.0024
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:42 blobs.0025
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:42 blobs.0026
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:42 blobs.0027
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:42 blobs.0028
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:42 blobs.0029
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:43 blobs.0030
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:43 blobs.0031
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:43 blobs.0032
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:43 blobs.0033
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:43 blobs.0034
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:43 blobs.0035
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:44 blobs.0036
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:44 blobs.0037
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:44 blobs.0038
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:44 blobs.0039
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:44 blobs.0040
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:44 blobs.0041
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:45 blobs.0042
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:45 blobs.0043
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:45 blobs.0044
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:45 blobs.0045
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:45 blobs.0046
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:46 blobs.0047
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:46 blobs.0048
-rw-rw-r-- 1 nicole nicole 1,1G Nov 4 19:46 blobs.0049
-rw-rw-r-- 1 nicole nicole 8,3M Nov 4 19:46 blobs.0050
-rw-rw-r-- 1 nicole nicole 88 Nov 4 19:46 info
-rw-rw-r-- 1 nicole nicole 475 Nov 4 19:46 info.txt
-rw-rw-r-- 1 nicole nicole 336 Nov 4 19:46 libraries
-rw-rw-r-- 1 nicole nicole 0 Nov 4 19:40 log_Canu_scratch_20191104.log
-rw-rw-r-- 1 nicole nicole 197M Nov 4 19:46 reads
drwxrwxr-x 2 nicole nicole 64 Nov 4 19:46 version.001
Given that this folder is on the scratch of the compute node, while the others are symbolic links to the shared storage, the relative paths used in the scripts were pointing to the old folder (on the storage). Now I have a link on the storage pointing to my scratch (I know, it is weird), so I ensure it will write on scratch.
It is running. Let's see if we get to the end of the script now ;-)
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_191' (from 'java') with -d64 support.
-- Detected gnuplot version '5.2 patchlevel 7 ' (from 'gnuplot') and image format 'png'.
-- Detected 144 CPUs and 3024 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /cm/shared/apps/slurm/17.11.8/bin/sinfo.
-- Grid engine disabled per useGrid=false option.
--
-- (tag)Concurrency
-- (tag)Threads |
-- (tag)Memory | |
-- (tag) | | | total usage algorithm
-- ------- ------ -------- -------- ----------------- -----------------------------
-- Local: meryl 64 GB 8 CPUs x 18 jobs 1152 GB 144 CPUs (k-mer counting)
-- Local: hap 16 GB 48 CPUs x 3 jobs 48 GB 144 CPUs (read-to-haplotype assignment)
-- Local: cormhap 60 GB 16 CPUs x 9 jobs 540 GB 144 CPUs (overlap detection with mhap)
-- Local: obtovl 24 GB 16 CPUs x 9 jobs 216 GB 144 CPUs (overlap detection)
-- Local: utgovl 24 GB 16 CPUs x 9 jobs 216 GB 144 CPUs (overlap detection)
-- Local: cor 24 GB 4 CPUs x 36 jobs 864 GB 144 CPUs (read correction)
-- Local: ovb 4 GB 1 CPU x 144 jobs 576 GB 144 CPUs (overlap store bucketizer)
-- Local: ovs 32 GB 1 CPU x 94 jobs 3008 GB 94 CPUs (overlap store sorting)
-- Local: red 48 GB 8 CPUs x 18 jobs 864 GB 144 CPUs (read error detection)
-- Local: oea 8 GB 1 CPU x 144 jobs 1152 GB 144 CPUs (overlap error adjustment)
-- Local: bat 512 GB 32 CPUs x 1 job 512 GB 32 CPUs (contig construction with bogart)
-- Local: cns --- GB 8 CPUs x --- jobs --- GB --- CPUs (consensus)
-- Local: gfa 64 GB 32 CPUs x 1 job 64 GB 32 CPUs (GFA alignment and processing)
--
-- In 'PacBio.seqStore', found PacBio reads:
-- Raw: 5159463
-- Corrected: 3694042
-- Trimmed: 0
--
-- Generating assembly 'PacBio' in '/scr/nicole/PacBio_v1.9'
--
-- Parameters:
--
-- genomeSize 2500000000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.2400 ( 24.00%)
-- obtOvlErrorRate 0.0450 ( 4.50%)
-- utgOvlErrorRate 0.0450 ( 4.50%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.0450 ( 4.50%)
-- utgErrorRate 0.0450 ( 4.50%)
-- cnsErrorRate 0.0750 ( 7.50%)
----------------------------------------
-- Starting command on Tue Nov 5 10:33:19 2019 with 2820.017 GB free disk space
cd .
/data04/tools/PacificBiosciences/canu-1.9/canu/Linux-amd64/bin/sqStoreDumpFASTQ \
-corrected \
-S ./PacBio.seqStore \
-o ./PacBio.correctedReads.gz \
-fasta \
-nolibname \
> PacBio.correctedReads.fasta.err 2>&1
-- Finished on Tue Nov 5 10:51:27 2019 (1088 seconds) with 2801.114 GB free disk space
----------------------------------------
--
-- Corrected reads saved in 'PacBio.correctedReads.fasta.gz'.
-- No change in report.
-- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
--
-- Bye.
[Update] It worked! 19G PacBio.correctedReads.fasta.gz
Canu -correct finished; now it is running -trim:
### executing: canu -trim -p PacBio -d PacBio_v1.9 genomeSize=2.5g -pacbio-corrected PacBio_v1.9/PacBio.correctedReads.fasta.gz -s specFile-canu.txt
OK, I'd suggest either keeping everything on scratch or everything on the NFS system. I expect the I/O load in trimming to be lower.
Dear @skoren, thanks for your help! After the adjustments, canu -trim and canu -assemble finished successfully and very quickly!
Dear developers, (sorry for the long post)
I'm using Canu to assemble a 2.5 Gb mammalian genome. It is running on a cluster with Slurm, but at the moment I'm using a single fat node (144 CPUs and 3024 gigabytes of memory). I'm still in the correction step.
After running out of space several times (with Canu release v1.8) and dying during 'cormhap' and 'ovB', I followed the suggestions from skoren (https://github.com/marbl/canu/issues/1465), installed what I believe to be version 1.9 (actually cloning the git version, Canu snapshot v1.8 +358, because the latest release distributed as a .tar.gz is 1.8), and prepared the spec file as follows:
Now it was able to finish 'cormhap', 'ovB', and 'ovS', using markedly less space on the storage, in less than 3 days. Then it failed at the 'cor' step, after 2 tries.
This time the error was not related to the storage quota. The only error messages I could find were recorded in the correctReads output and in the correction/2-correction/results/ files:
files correction/2-correction/results/0001.err up to 0067.err,
while files from correction/2-correction/results/0068.err to 0103.err say they cannot even find the store;
files correction/2-correction/correctReads.000001.out (up to correctReads.000069.out),
while files from correctReads.000070.out to correctReads.000103.out (the last) differ slightly, including copy errors such as these:
I'm not sure if this is enough to diagnose what could be going wrong, but I feel that removing exactly the stageDirectory would not be the desired action, given that it is shared by all the other concurrent processes.
I found exactly this action coded in the script
correction/2-correction/correctReads.sh
Maybe this happens to be a problem of running all the concurrent processes on the same compute node. According to this code, each process will attempt to create, write to, and remove the same single directory. If I had one process per node, I would have one stageDirectory per process, and it would be created and removed at each step without affecting the other processes.
One possible workaround could be including the number of the partition (the local variable $jobid) in the name of the directory, as follows:
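A minimal sketch of that idea, assuming the staging logic in correctReads.sh boils down to mkdir/copy/remove (the paths and the blobs file are stand-ins, and the whole thing runs in a temporary sandbox so it is safe to execute):

```shell
set -eu
SANDBOX=$(mktemp -d)   # stands in for the node's /scr
jobid=000001           # per-job id, already available inside correctReads.sh

# Stage into a per-job subdirectory instead of one directory shared by all
# concurrent jobs on the node.
STAGE=$SANDBOX/nicole-7342/$jobid
mkdir -p "$STAGE/PacBio.seqStore"
touch "$STAGE/PacBio.seqStore/blobs.0000"   # stand-in for the copied store

# ... the correction job would run against $STAGE here ...

# Cleanup removes only this job's subdirectory; the shared parent is removed
# only once it is empty, i.e. by whichever job finishes last.
rm -rf "$STAGE"
rmdir "$SANDBOX/nicole-7342" 2>/dev/null || true
```

This way, a job finishing early cannot delete files that other jobs on the same node are still reading.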
I edited the script and ran canu -correct again. I commented out the remove lines to be able to verify the temporary files. Again, I found a similar error in correction/2-correction/results/0001.err.
I looked at the files in this directory, and no file or directory named reads-rawu exists.
It may be a typo somewhere else in the code.
I'm stuck here and I do not know what to try next. It may be something that has nothing to do with all that I have tried.
If you have ever seen something similar, please let me know. Thank you in advance!