The corMhapSensitivity option only applies to Canu's built-in correction, which you're not using, so that is why it made no difference.
The issue is that your corrected reads don't really look corrected. There is no peak in the k-mer histogram that you'd expect after correction, and almost no overlaps are found at 4.5% error, which is why you're ending up with no coverage:
-- OUTPUT READS:
-- ------------
-- 27223 reads 51357655 bases (trimmed reads output)
-- 9 reads 18796 bases (reads with no change, kept as is)
-- 1275855 reads 19820693833 bases (reads with no overlaps, deleted)
-- 107986 reads 2738001899 bases (reads with short trimmed length, deleted)
I'd suggest running with the built-in Canu correction to see what results you obtain.
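If it helps, here is a minimal sketch of how you could inspect the k-mer histogram of the LoRDEC-corrected reads yourself, using the meryl binary that ships with Canu 2.2 (the file name corrected.fasta and the k/memory/thread settings are only illustrative assumptions):

# count 16-mers in the corrected reads
meryl count k=16 memory=24 threads=8 corrected.fasta output corrected.k16.meryl
# print the k-mer frequency histogram; well-corrected reads should show a clear coverage peak
meryl histogram corrected.k16.meryl > corrected.k16.hist

If the histogram is dominated by low-occurrence k-mers with no peak near the expected coverage, the reads are effectively still uncorrected.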
Thank you for your kind reply! I have tried using Canu's built-in correction to correct and assemble the raw PacBio data, but I got similar results. The command I used is:
canu useGrid=false \
-p tf -d tf-pacbio \
genomeSize=164.4m \
corMhapSensitivity=normal \
-pacbio /dssg/home/acct-jiang.lu/jiang.lu/Tf/Genome_data/SRR23272336_1.fastq
And the log is shown below:
-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction and consensus use:
-- Šošic M, Šikic M.
-- Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
-- Bioinformatics. 2017 May 1;33(9):1394-1395.
-- http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
-- Berlin K, et al.
-- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
-- Nat Biotechnol. 2015 Jun;33(6):623-30.
-- http://doi.org/10.1038/nbt.3238
--
-- Myers EW, et al.
-- A Whole-Genome Assembly of Drosophila.
-- Science. 2000 Mar 24;287(5461):2196-204.
-- http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
-- Chin CS, et al.
-- Phased diploid genome assembly with single-molecule real-time sequencing.
-- Nat Methods. 2016 Dec;13(12):1050-1054.
-- http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
-- Chin CS, et al.
-- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
-- Nat Methods. 2013 Jun;10(6):563-9
-- http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_312' (from 'java') with -d64 support.
-- Detected gnuplot version '5.2 patchlevel 4 ' (from 'gnuplot') and image format 'png'.
--
-- Detected 40 CPUs and 320000 gigabytes of memory on the local machine.
--
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Slurm disabled by useGrid=false
--
-- Local machine mode enabled; grid support not detected or not allowed.
--
-- (tag)Concurrency
-- (tag)Threads |
-- (tag)Memory | |
-- (tag) | | | total usage algorithm
-- ------- ---------- -------- -------- -------------------- -----------------------------
-- Local: meryl 24.000 GB 8 CPUs x 5 jobs 120.000 GB 40 CPUs (k-mer counting)
-- Local: hap 12.000 GB 20 CPUs x 2 jobs 24.000 GB 40 CPUs (read-to-haplotype assignment)
-- Local: cormhap 13.000 GB 10 CPUs x 4 jobs 52.000 GB 40 CPUs (overlap detection with mhap)
-- Local: obtovl 8.000 GB 8 CPUs x 5 jobs 40.000 GB 40 CPUs (overlap detection)
-- Local: utgovl 8.000 GB 8 CPUs x 5 jobs 40.000 GB 40 CPUs (overlap detection)
-- Local: cor -.--- GB 4 CPUs x - jobs -.--- GB - CPUs (read correction)
-- Local: ovb 4.000 GB 1 CPU x 40 jobs 160.000 GB 40 CPUs (overlap store bucketizer)
-- Local: ovs 8.000 GB 1 CPU x 40 jobs 320.000 GB 40 CPUs (overlap store sorting)
-- Local: red 16.000 GB 5 CPUs x 8 jobs 128.000 GB 40 CPUs (read error detection)
-- Local: oea 8.000 GB 1 CPU x 40 jobs 320.000 GB 40 CPUs (overlap error adjustment)
-- Local: bat 64.000 GB 8 CPUs x 1 job 64.000 GB 8 CPUs (contig construction with bogart)
-- Local: cns -.--- GB 8 CPUs x - jobs -.--- GB - CPUs (consensus)
--
-- Found untrimmed raw PacBio CLR reads in the input files.
--
-- Generating assembly 'tf' in '/dssg/home/acct-jiang.lu/jiang.lu/Tf/way1/tf-pacbio':
-- genomeSize:
-- 164400000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.2400 ( 24.00%)
-- obtOvlErrorRate 0.0450 ( 4.50%)
-- utgOvlErrorRate 0.0450 ( 4.50%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.2500 ( 25.00%)
-- obtErrorRate 0.0450 ( 4.50%)
-- utgErrorRate 0.0450 ( 4.50%)
-- cnsErrorRate 0.0750 ( 7.50%)
--
-- Stages to run:
-- correct raw reads.
-- trim corrected reads.
-- assemble corrected and trimmed reads.
--
--
-- BEGIN CORRECTION
----------------------------------------
-- Starting command on Sun Sep 3 11:13:40 2023 with 799284.812 GB free disk space
cd .
./tf.seqStore.sh \
> ./tf.seqStore.err 2>&1
-- Finished on Sun Sep 3 11:18:31 2023 (291 seconds) with 799398.953 GB free disk space
----------------------------------------
--
-- In sequence store './tf.seqStore':
-- Found 1411139 reads.
-- Found 23273378703 bases (141.56 times coverage).
-- Histogram of raw reads:
--
-- G=23273378703 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 52545 36156 2327338168 || 1000-7389 261130|------------------------------
-- 00020 40702 86884 4654688563 || 7390-13779 554721|---------------------------------------------------------------
-- 00030 32780 150830 6982041779 || 13780-20169 237248|---------------------------
-- 00040 26594 229783 9309375917 || 20170-26559 127710|---------------
-- 00050 21449 327346 11636705582 || 26560-32949 81198|----------
-- 00060 17036 449211 13964032384 || 32950-39339 53539|-------
-- 00070 13663 602685 16291366224 || 39340-45729 35069|----
-- 00080 11543 788708 18618708039 || 45730-52119 23158|---
-- 00090 9377 1010164 20946041426 || 52120-58509 15135|--
-- 00100 1000 1411138 23273378703 || 58510-64899 9343|--
-- 001.000x 1411139 23273378703 || 64900-71289 5513|-
-- || 71290-77679 3186|-
-- || 77680-84069 1769|-
-- || 84070-90459 1090|-
-- || 90460-96849 607|-
-- || 96850-103239 329|-
-- || 103240-109629 178|-
-- || 109630-116019 100|-
-- || 116020-122409 50|-
-- || 122410-128799 25|-
-- || 128800-135189 15|-
-- || 135190-141579 8|-
-- || 141580-147969 1|-
-- || 147970-154359 0|
-- || 154360-160749 3|-
-- || 160750-167139 2|-
-- || 167140-173529 1|-
-- || 173530-179919 2|-
-- || 179920-186309 4|-
-- || 186310-192699 1|-
-- || 192700-199089 0|
-- || 199090-205479 0|
-- || 205480-211869 1|-
-- || 211870-218259 0|
-- || 218260-224649 2|-
-- || 224650-231039 0|
-- || 231040-237429 0|
-- || 237430-243819 0|
-- || 243820-250209 0|
-- || 250210-256599 0|
-- || 256600-262989 0|
-- || 262990-269379 0|
-- || 269380-275769 0|
-- || 275770-282159 0|
-- || 282160-288549 0|
-- || 288550-294939 0|
-- || 294940-301329 0|
-- || 301330-307719 0|
-- || 307720-314109 0|
-- || 314110-320499 1|-
--
----------------------------------------
-- Starting command on Sun Sep 3 11:18:33 2023 with 799398.921 GB free disk space
cd correction/0-mercounts
./meryl-configure.sh \
> ./meryl-configure.err 2>&1
-- Finished on Sun Sep 3 11:18:34 2023 (one second) with 799398.921 GB free disk space
----------------------------------------
-- segments memory batches
-- -------- -------- -------
-- 01 14.00 GB 3
-- 02 13.50 GB 2
-- 04 10.91 GB 2
-- 06 7.46 GB 2
-- 08 5.71 GB 2
-- 12 3.98 GB 2
-- 16 3.12 GB 2
-- 20 2.48 GB 2
-- 24 2.28 GB 2
-- 32 1.65 GB 2
-- 40 1.50 GB 2
-- 48 1.25 GB 2
-- 56 1.07 GB 2
-- 64 0.94 GB 2
-- 96 0.62 GB 2
--
-- For 1411139 reads with 23273378703 bases, limit to 232 batches.
-- Will count kmers using 02 jobs, each using 15 GB and 8 threads.
--
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sun Sep 3 11:18:34 2023 with 799398.921 GB free disk space (2 processes; 5 concurrently)
cd correction/0-mercounts
./meryl-count.sh 1 > ./meryl-count.000001.out 2>&1
./meryl-count.sh 2 > ./meryl-count.000002.out 2>&1
-- Finished on Sun Sep 3 11:26:36 2023 (482 seconds) with 798748.828 GB free disk space
----------------------------------------
-- Found 2 Kmer counting (meryl) outputs.
-- Finished stage 'cor-merylCountCheck', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sun Sep 3 11:26:36 2023 with 798748.828 GB free disk space (1 processes; 5 concurrently)
cd correction/0-mercounts
./meryl-process.sh 1 > ./meryl-process.000001.out 2>&1
-- Finished on Sun Sep 3 11:27:14 2023 (38 seconds) with 798976.625 GB free disk space
----------------------------------------
-- Meryl finished successfully. Kmer frequency histogram:
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 201552521 ************************* 0.1017 0.0174
-- 3- 4 486987755 ************************************************************* 0.2241 0.0489
-- 5- 7 553940866 ********************************************************************** 0.4580 0.1386
-- 8- 11 347900759 ******************************************* 0.6867 0.2730
-- 12- 16 170999161 ********************* 0.8265 0.3956
-- 17- 22 84385157 ********** 0.8982 0.4857
-- 23- 29 45001864 ***** 0.9353 0.5495
-- 30- 37 26723373 *** 0.9558 0.5963
-- 38- 46 17563534 ** 0.9683 0.6330
-- 47- 56 11614248 * 0.9767 0.6636
-- 57- 67 7685728 0.9822 0.6884
-- 68- 79 5297520 0.9859 0.7083
-- 80- 92 3878583 0.9885 0.7246
-- 93- 106 3166670 0.9904 0.7388
-- 107- 121 2812153 0.9920 0.7523
-- 122- 137 2193861 0.9934 0.7661
-- 138- 154 1595830 0.9945 0.7781
-- 155- 172 1346205 0.9953 0.7880
-- 173- 191 1152753 0.9959 0.7975
-- 192- 211 906803 0.9965 0.8065
-- 212- 232 703670 0.9970 0.8143
-- 233- 254 584340 0.9973 0.8209
-- 255- 277 500318 0.9976 0.8271
-- 278- 301 413453 0.9979 0.8328
-- 302- 326 364479 0.9981 0.8379
-- 327- 352 347201 0.9983 0.8428
-- 353- 379 316787 0.9984 0.8479
-- 380- 407 267574 0.9986 0.8529
-- 408- 436 219121 0.9987 0.8574
-- 437- 466 187230 0.9988 0.8614
-- 467- 497 174488 0.9989 0.8651
-- 498- 529 164479 0.9990 0.8687
-- 530- 562 150514 0.9991 0.8723
-- 563- 596 136607 0.9992 0.8759
-- 597- 631 124724 0.9992 0.8793
-- 632- 667 109797 0.9993 0.8826
-- 668- 704 92704 0.9994 0.8857
-- 705- 742 78733 0.9994 0.8884
-- 743- 781 72281 0.9994 0.8909
-- 782- 821 71300 0.9995 0.8932
--
-- 0 (max occurrences)
-- 23128879630 (total mers, non-unique)
-- 1982754231 (distinct mers, non-unique)
-- 0 (unique mers)
-- Finished stage 'meryl-process', reset canuIteration.
--
-- Removing meryl database 'correction/0-mercounts/tf.ms16'.
--
-- OVERLAPPER (mhap) (correction)
--
--
-- PARAMETERS: hashes=512, minMatches=3, threshold=0.78
--
-- Given 11.7 GB, can fit 35100 reads per block.
-- For 42 blocks, set stride to 10 blocks.
-- Logging partitioning to 'correction/1-overlapper/partitioning.log'.
-- Configured 41 mhap precompute jobs.
-- Configured 101 mhap overlap jobs.
-- Finished stage 'cor-mhapConfigure', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Sun Sep 3 11:27:14 2023 with 798985 GB free disk space (41 processes; 4 concurrently)
cd correction/1-overlapper
./precompute.sh 1 > ./precompute.000001.out 2>&1
./precompute.sh 2 > ./precompute.000002.out 2>&1
./precompute.sh 3 > ./precompute.000003.out 2>&1
./precompute.sh 4 > ./precompute.000004.out 2>&1
./precompute.sh 5 > ./precompute.000005.out 2>&1
./precompute.sh 6 > ./precompute.000006.out 2>&1
./precompute.sh 7 > ./precompute.000007.out 2>&1
./precompute.sh 8 > ./precompute.000008.out 2>&1
./precompute.sh 9 > ./precompute.000009.out 2>&1
./precompute.sh 10 > ./precompute.000010.out 2>&1
./precompute.sh 11 > ./precompute.000011.out 2>&1
./precompute.sh 12 > ./precompute.000012.out 2>&1
./precompute.sh 13 > ./precompute.000013.out 2>&1
./precompute.sh 14 > ./precompute.000014.out 2>&1
./precompute.sh 15 > ./precompute.000015.out 2>&1
./precompute.sh 16 > ./precompute.000016.out 2>&1
./precompute.sh 17 > ./precompute.000017.out 2>&1
./precompute.sh 18 > ./precompute.000018.out 2>&1
./precompute.sh 19 > ./precompute.000019.out 2>&1
./precompute.sh 20 > ./precompute.000020.out 2>&1
./precompute.sh 21 > ./precompute.000021.out 2>&1
./precompute.sh 22 > ./precompute.000022.out 2>&1
./precompute.sh 23 > ./precompute.000023.out 2>&1
./precompute.sh 24 > ./precompute.000024.out 2>&1
./precompute.sh 25 > ./precompute.000025.out 2>&1
./precompute.sh 26 > ./precompute.000026.out 2>&1
./precompute.sh 27 > ./precompute.000027.out 2>&1
./precompute.sh 28 > ./precompute.000028.out 2>&1
./precompute.sh 29 > ./precompute.000029.out 2>&1
./precompute.sh 30 > ./precompute.000030.out 2>&1
./precompute.sh 31 > ./precompute.000031.out 2>&1
./precompute.sh 32 > ./precompute.000032.out 2>&1
./precompute.sh 33 > ./precompute.000033.out 2>&1
./precompute.sh 34 > ./precompute.000034.out 2>&1
./precompute.sh 35 > ./precompute.000035.out 2>&1
./precompute.sh 36 > ./precompute.000036.out 2>&1
./precompute.sh 37 > ./precompute.000037.out 2>&1
./precompute.sh 38 > ./precompute.000038.out 2>&1
./precompute.sh 39 > ./precompute.000039.out 2>&1
./precompute.sh 40 > ./precompute.000040.out 2>&1
./precompute.sh 41 > ./precompute.000041.out 2>&1
-- Finished on Sun Sep 3 15:07:24 2023 (13210 seconds, like watching paint dry) with 862199.531 GB free disk space
----------------------------------------
-- All 41 mhap precompute jobs finished successfully.
-- Finished stage 'cor-mhapPrecomputeCheck', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Sun Sep 3 15:07:24 2023 with 862199.5 GB free disk space (101 processes; 4 concurrently)
cd correction/1-overlapper
./mhap.sh 1 > ./mhap.000001.out 2>&1
./mhap.sh 2 > ./mhap.000002.out 2>&1
./mhap.sh 3 > ./mhap.000003.out 2>&1
./mhap.sh 4 > ./mhap.000004.out 2>&1
./mhap.sh 5 > ./mhap.000005.out 2>&1
./mhap.sh 6 > ./mhap.000006.out 2>&1
./mhap.sh 7 > ./mhap.000007.out 2>&1
./mhap.sh 8 > ./mhap.000008.out 2>&1
./mhap.sh 9 > ./mhap.000009.out 2>&1
./mhap.sh 10 > ./mhap.000010.out 2>&1
./mhap.sh 11 > ./mhap.000011.out 2>&1
./mhap.sh 12 > ./mhap.000012.out 2>&1
./mhap.sh 13 > ./mhap.000013.out 2>&1
./mhap.sh 14 > ./mhap.000014.out 2>&1
./mhap.sh 15 > ./mhap.000015.out 2>&1
./mhap.sh 16 > ./mhap.000016.out 2>&1
./mhap.sh 17 > ./mhap.000017.out 2>&1
./mhap.sh 18 > ./mhap.000018.out 2>&1
./mhap.sh 19 > ./mhap.000019.out 2>&1
./mhap.sh 20 > ./mhap.000020.out 2>&1
./mhap.sh 21 > ./mhap.000021.out 2>&1
./mhap.sh 22 > ./mhap.000022.out 2>&1
./mhap.sh 23 > ./mhap.000023.out 2>&1
./mhap.sh 24 > ./mhap.000024.out 2>&1
./mhap.sh 25 > ./mhap.000025.out 2>&1
./mhap.sh 26 > ./mhap.000026.out 2>&1
./mhap.sh 27 > ./mhap.000027.out 2>&1
./mhap.sh 28 > ./mhap.000028.out 2>&1
./mhap.sh 29 > ./mhap.000029.out 2>&1
./mhap.sh 30 > ./mhap.000030.out 2>&1
./mhap.sh 31 > ./mhap.000031.out 2>&1
./mhap.sh 32 > ./mhap.000032.out 2>&1
./mhap.sh 33 > ./mhap.000033.out 2>&1
./mhap.sh 34 > ./mhap.000034.out 2>&1
./mhap.sh 35 > ./mhap.000035.out 2>&1
./mhap.sh 36 > ./mhap.000036.out 2>&1
./mhap.sh 37 > ./mhap.000037.out 2>&1
./mhap.sh 38 > ./mhap.000038.out 2>&1
./mhap.sh 39 > ./mhap.000039.out 2>&1
./mhap.sh 40 > ./mhap.000040.out 2>&1
./mhap.sh 41 > ./mhap.000041.out 2>&1
./mhap.sh 42 > ./mhap.000042.out 2>&1
./mhap.sh 43 > ./mhap.000043.out 2>&1
./mhap.sh 44 > ./mhap.000044.out 2>&1
./mhap.sh 45 > ./mhap.000045.out 2>&1
./mhap.sh 46 > ./mhap.000046.out 2>&1
./mhap.sh 47 > ./mhap.000047.out 2>&1
./mhap.sh 48 > ./mhap.000048.out 2>&1
./mhap.sh 49 > ./mhap.000049.out 2>&1
./mhap.sh 50 > ./mhap.000050.out 2>&1
./mhap.sh 51 > ./mhap.000051.out 2>&1
./mhap.sh 52 > ./mhap.000052.out 2>&1
./mhap.sh 53 > ./mhap.000053.out 2>&1
./mhap.sh 54 > ./mhap.000054.out 2>&1
./mhap.sh 55 > ./mhap.000055.out 2>&1
./mhap.sh 56 > ./mhap.000056.out 2>&1
./mhap.sh 57 > ./mhap.000057.out 2>&1
./mhap.sh 58 > ./mhap.000058.out 2>&1
./mhap.sh 59 > ./mhap.000059.out 2>&1
./mhap.sh 60 > ./mhap.000060.out 2>&1
./mhap.sh 61 > ./mhap.000061.out 2>&1
./mhap.sh 62 > ./mhap.000062.out 2>&1
./mhap.sh 63 > ./mhap.000063.out 2>&1
./mhap.sh 64 > ./mhap.000064.out 2>&1
./mhap.sh 65 > ./mhap.000065.out 2>&1
./mhap.sh 66 > ./mhap.000066.out 2>&1
./mhap.sh 67 > ./mhap.000067.out 2>&1
./mhap.sh 68 > ./mhap.000068.out 2>&1
./mhap.sh 69 > ./mhap.000069.out 2>&1
./mhap.sh 70 > ./mhap.000070.out 2>&1
./mhap.sh 71 > ./mhap.000071.out 2>&1
./mhap.sh 72 > ./mhap.000072.out 2>&1
./mhap.sh 73 > ./mhap.000073.out 2>&1
./mhap.sh 74 > ./mhap.000074.out 2>&1
./mhap.sh 75 > ./mhap.000075.out 2>&1
./mhap.sh 76 > ./mhap.000076.out 2>&1
./mhap.sh 77 > ./mhap.000077.out 2>&1
./mhap.sh 78 > ./mhap.000078.out 2>&1
./mhap.sh 79 > ./mhap.000079.out 2>&1
./mhap.sh 80 > ./mhap.000080.out 2>&1
./mhap.sh 81 > ./mhap.000081.out 2>&1
./mhap.sh 82 > ./mhap.000082.out 2>&1
./mhap.sh 83 > ./mhap.000083.out 2>&1
./mhap.sh 84 > ./mhap.000084.out 2>&1
./mhap.sh 85 > ./mhap.000085.out 2>&1
./mhap.sh 86 > ./mhap.000086.out 2>&1
./mhap.sh 87 > ./mhap.000087.out 2>&1
./mhap.sh 88 > ./mhap.000088.out 2>&1
./mhap.sh 89 > ./mhap.000089.out 2>&1
./mhap.sh 90 > ./mhap.000090.out 2>&1
./mhap.sh 91 > ./mhap.000091.out 2>&1
./mhap.sh 92 > ./mhap.000092.out 2>&1
./mhap.sh 93 > ./mhap.000093.out 2>&1
./mhap.sh 94 > ./mhap.000094.out 2>&1
./mhap.sh 95 > ./mhap.000095.out 2>&1
./mhap.sh 96 > ./mhap.000096.out 2>&1
./mhap.sh 97 > ./mhap.000097.out 2>&1
./mhap.sh 98 > ./mhap.000098.out 2>&1
./mhap.sh 99 > ./mhap.000099.out 2>&1
./mhap.sh 100 > ./mhap.000100.out 2>&1
./mhap.sh 101 > ./mhap.000101.out 2>&1
-- Finished on Sun Sep 3 17:57:22 2023 (10198 seconds, fashionably late) with 860200.187 GB free disk space
----------------------------------------
-- Found 101 mhap overlap output files.
-- Finished stage 'cor-mhapCheck', reset canuIteration.
----------------------------------------
-- Starting command on Sun Sep 3 17:57:22 2023 with 860200.156 GB free disk space
cd correction
/dssg/home/acct-jiang.lu/jiang.lu/biotools/canu-2.2/bin/ovStoreConfig \
-S ../tf.seqStore \
-M 4-8 \
-L ./1-overlapper/ovljob.files \
-create ./tf.ovlStore.config \
> ./tf.ovlStore.config.txt \
2> ./tf.ovlStore.config.err
-- Finished on Sun Sep 3 17:57:26 2023 (4 seconds) with 860200.125 GB free disk space
----------------------------------------
--
-- Creating overlap store correction/tf.ovlStore using:
-- 31 buckets
-- 31 slices
-- using at most 8 GB memory each
-- Finished stage 'cor-overlapStoreConfigure', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Sun Sep 3 17:57:26 2023 with 860200.093 GB free disk space (31 processes; 40 concurrently)
cd correction/tf.ovlStore.BUILDING
./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1
./scripts/1-bucketize.sh 2 > ./logs/1-bucketize.000002.out 2>&1
./scripts/1-bucketize.sh 3 > ./logs/1-bucketize.000003.out 2>&1
./scripts/1-bucketize.sh 4 > ./logs/1-bucketize.000004.out 2>&1
./scripts/1-bucketize.sh 5 > ./logs/1-bucketize.000005.out 2>&1
./scripts/1-bucketize.sh 6 > ./logs/1-bucketize.000006.out 2>&1
./scripts/1-bucketize.sh 7 > ./logs/1-bucketize.000007.out 2>&1
./scripts/1-bucketize.sh 8 > ./logs/1-bucketize.000008.out 2>&1
./scripts/1-bucketize.sh 9 > ./logs/1-bucketize.000009.out 2>&1
./scripts/1-bucketize.sh 10 > ./logs/1-bucketize.000010.out 2>&1
./scripts/1-bucketize.sh 11 > ./logs/1-bucketize.000011.out 2>&1
./scripts/1-bucketize.sh 12 > ./logs/1-bucketize.000012.out 2>&1
./scripts/1-bucketize.sh 13 > ./logs/1-bucketize.000013.out 2>&1
./scripts/1-bucketize.sh 14 > ./logs/1-bucketize.000014.out 2>&1
./scripts/1-bucketize.sh 15 > ./logs/1-bucketize.000015.out 2>&1
./scripts/1-bucketize.sh 16 > ./logs/1-bucketize.000016.out 2>&1
./scripts/1-bucketize.sh 17 > ./logs/1-bucketize.000017.out 2>&1
./scripts/1-bucketize.sh 18 > ./logs/1-bucketize.000018.out 2>&1
./scripts/1-bucketize.sh 19 > ./logs/1-bucketize.000019.out 2>&1
./scripts/1-bucketize.sh 20 > ./logs/1-bucketize.000020.out 2>&1
./scripts/1-bucketize.sh 21 > ./logs/1-bucketize.000021.out 2>&1
./scripts/1-bucketize.sh 22 > ./logs/1-bucketize.000022.out 2>&1
./scripts/1-bucketize.sh 23 > ./logs/1-bucketize.000023.out 2>&1
./scripts/1-bucketize.sh 24 > ./logs/1-bucketize.000024.out 2>&1
./scripts/1-bucketize.sh 25 > ./logs/1-bucketize.000025.out 2>&1
./scripts/1-bucketize.sh 26 > ./logs/1-bucketize.000026.out 2>&1
./scripts/1-bucketize.sh 27 > ./logs/1-bucketize.000027.out 2>&1
./scripts/1-bucketize.sh 28 > ./logs/1-bucketize.000028.out 2>&1
./scripts/1-bucketize.sh 29 > ./logs/1-bucketize.000029.out 2>&1
./scripts/1-bucketize.sh 30 > ./logs/1-bucketize.000030.out 2>&1
./scripts/1-bucketize.sh 31 > ./logs/1-bucketize.000031.out 2>&1
-- Finished on Sun Sep 3 17:58:55 2023 (89 seconds) with 859947.25 GB free disk space
----------------------------------------
-- Overlap store bucketizer finished.
-- Finished stage 'cor-overlapStoreBucketizerCheck', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'ovS' concurrent execution on Sun Sep 3 17:58:55 2023 with 859947.218 GB free disk space (31 processes; 40 concurrently)
cd correction/tf.ovlStore.BUILDING
./scripts/2-sort.sh 1 > ./logs/2-sort.000001.out 2>&1
./scripts/2-sort.sh 2 > ./logs/2-sort.000002.out 2>&1
./scripts/2-sort.sh 3 > ./logs/2-sort.000003.out 2>&1
./scripts/2-sort.sh 4 > ./logs/2-sort.000004.out 2>&1
./scripts/2-sort.sh 5 > ./logs/2-sort.000005.out 2>&1
./scripts/2-sort.sh 6 > ./logs/2-sort.000006.out 2>&1
./scripts/2-sort.sh 7 > ./logs/2-sort.000007.out 2>&1
./scripts/2-sort.sh 8 > ./logs/2-sort.000008.out 2>&1
./scripts/2-sort.sh 9 > ./logs/2-sort.000009.out 2>&1
./scripts/2-sort.sh 10 > ./logs/2-sort.000010.out 2>&1
./scripts/2-sort.sh 11 > ./logs/2-sort.000011.out 2>&1
./scripts/2-sort.sh 12 > ./logs/2-sort.000012.out 2>&1
./scripts/2-sort.sh 13 > ./logs/2-sort.000013.out 2>&1
./scripts/2-sort.sh 14 > ./logs/2-sort.000014.out 2>&1
./scripts/2-sort.sh 15 > ./logs/2-sort.000015.out 2>&1
./scripts/2-sort.sh 16 > ./logs/2-sort.000016.out 2>&1
./scripts/2-sort.sh 17 > ./logs/2-sort.000017.out 2>&1
./scripts/2-sort.sh 18 > ./logs/2-sort.000018.out 2>&1
./scripts/2-sort.sh 19 > ./logs/2-sort.000019.out 2>&1
./scripts/2-sort.sh 20 > ./logs/2-sort.000020.out 2>&1
./scripts/2-sort.sh 21 > ./logs/2-sort.000021.out 2>&1
./scripts/2-sort.sh 22 > ./logs/2-sort.000022.out 2>&1
./scripts/2-sort.sh 23 > ./logs/2-sort.000023.out 2>&1
./scripts/2-sort.sh 24 > ./logs/2-sort.000024.out 2>&1
./scripts/2-sort.sh 25 > ./logs/2-sort.000025.out 2>&1
./scripts/2-sort.sh 26 > ./logs/2-sort.000026.out 2>&1
./scripts/2-sort.sh 27 > ./logs/2-sort.000027.out 2>&1
./scripts/2-sort.sh 28 > ./logs/2-sort.000028.out 2>&1
./scripts/2-sort.sh 29 > ./logs/2-sort.000029.out 2>&1
./scripts/2-sort.sh 30 > ./logs/2-sort.000030.out 2>&1
./scripts/2-sort.sh 31 > ./logs/2-sort.000031.out 2>&1
-- Finished on Sun Sep 3 18:00:29 2023 (94 seconds) with 859955.703 GB free disk space
----------------------------------------
-- Overlap store sorter finished.
-- Finished stage 'cor-overlapStoreSorterCheck', reset canuIteration.
----------------------------------------
-- Starting command on Sun Sep 3 18:00:29 2023 with 859955.687 GB free disk space
cd correction
/dssg/home/acct-jiang.lu/jiang.lu/biotools/canu-2.2/bin/ovStoreIndexer \
-O ./tf.ovlStore.BUILDING \
-S ../tf.seqStore \
-C ./tf.ovlStore.config \
-delete \
> ./tf.ovlStore.BUILDING.index.err 2>&1
-- Finished on Sun Sep 3 18:00:31 2023 (2 seconds) with 859956.875 GB free disk space
----------------------------------------
-- Overlap store indexer finished.
-- Checking store.
----------------------------------------
-- Starting command on Sun Sep 3 18:00:31 2023 with 859956.875 GB free disk space
cd correction
/dssg/home/acct-jiang.lu/jiang.lu/biotools/canu-2.2/bin/ovStoreDump \
-S ../tf.seqStore \
-O ./tf.ovlStore \
-counts \
> ./tf.ovlStore/counts.dat 2> ./tf.ovlStore/counts.err
-- Finished on Sun Sep 3 18:00:31 2023 (in the blink of an eye) with 859956.859 GB free disk space
----------------------------------------
--
-- Overlap store 'correction/tf.ovlStore' successfully constructed.
-- Found 8984816058 overlaps for 1056338 reads; 354801 reads have no overlaps.
--
--
-- Purged 109.985 GB in 243 overlap output files.
-- Finished stage 'cor-createOverlapStore', reset canuIteration.
-- Set corMinCoverage=4 based on read coverage of 141.56.
-- Computing correction layouts.
-- Local filter coverage 80
-- Global filter coverage 40
----------------------------------------
-- Starting command on Sun Sep 3 18:00:33 2023 with 860065.234 GB free disk space
cd correction
/dssg/home/acct-jiang.lu/jiang.lu/biotools/canu-2.2/bin/generateCorrectionLayouts \
-S ../tf.seqStore \
-O ./tf.ovlStore \
-C ./tf.corStore.WORKING \
-eC 80 \
-xC 40 \
> ./tf.corStore.err 2>&1
-- Finished on Sun Sep 3 18:05:57 2023 (324 seconds) with 859913.421 GB free disk space
----------------------------------------
-- Finished stage 'cor-buildCorrectionLayoutsConfigure', reset canuIteration.
-- Computing correction layouts.
----------------------------------------
-- Starting command on Sun Sep 3 18:05:57 2023 with 859913.406 GB free disk space
cd correction/2-correction
/dssg/home/acct-jiang.lu/jiang.lu/biotools/canu-2.2/bin/filterCorrectionLayouts \
-S ../../tf.seqStore \
-C ../tf.corStore \
-R ./tf.readsToCorrect.WORKING \
-cc 4 \
-cl 1000 \
-g 164400000 \
-c 40 \
> ./tf.readsToCorrect.err 2>&1
-- Finished on Sun Sep 3 18:06:02 2023 (5 seconds) with 859913.265 GB free disk space
----------------------------------------
-- original original
-- raw reads raw reads
-- category w/overlaps w/o/overlaps
-- -------------------- ------------- -------------
-- Number of Reads 586101 825038
-- Number of Bases 9944907745 7449268760
-- Coverage 60.492 45.312
-- Median 12918 6468
-- Mean 16967 9029
-- N50 19918 19439
-- Minimum 1000 0
-- Maximum 320451 173539
--
-- --------corrected--------- ----------rescued----------
-- evidence expected expected
-- category reads raw corrected raw corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads 988282 278569 278569 0 0
-- Number of Bases 16233258060 5695258786 3875213746 0 0
-- Coverage 98.742 34.643 23.572 0.000 0.000
-- Median 12372 16341 11075 0 0
-- Mean 16425 20444 13911 0 0
-- N50 19576 25012 21769 0 0
-- Minimum 1000 1000 1 0 0
-- Maximum 320451 224529 223909 0 0
--
-- --------uncorrected--------
-- expected
-- category raw corrected
-- -------------------- ------------- -------------
-- Number of Reads 1132570 1132570
-- Number of Bases 11698917719 265224
-- Coverage 71.161 0.002
-- Median 9247 0
-- Mean 10329 0
-- N50 16737 0
-- Minimum 0 0
-- Maximum 320451 265224
--
-- Maximum Memory 3862323740
-- Finished stage 'cor-filterCorrectionLayouts', reset canuIteration.
--
-- Correction jobs estimated to need at most 3.597 GB for computation.
-- Correction jobs will request 12 GB each.
--
-- Local: cor 12.000 GB 4 CPUs x 10 jobs 120.000 GB 40 CPUs (read correction)
--
--
-- Configuring correction jobs:
-- Reads estimated to need at most 3.597 GB for computation.
-- Jobs will request 12 GB each.
----------------------------------------
-- Starting command on Sun Sep 3 18:06:02 2023 with 859913.25 GB free disk space
cd correction/2-correction
./correctReadsPartition.sh \
> ./correctReadsPartition.err 2>&1
-- Finished on Sun Sep 3 18:06:03 2023 (one second) with 859913.25 GB free disk space
----------------------------------------
-- Finished stage 'cor-generateCorrectedReadsConfigure', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'cor' concurrent execution on Sun Sep 3 18:06:03 2023 with 859913.218 GB free disk space (14 processes; 10 concurrently)
cd correction/2-correction
./correctReads.sh 1 > ./correctReads.000001.out 2>&1
./correctReads.sh 2 > ./correctReads.000002.out 2>&1
./correctReads.sh 3 > ./correctReads.000003.out 2>&1
./correctReads.sh 4 > ./correctReads.000004.out 2>&1
./correctReads.sh 5 > ./correctReads.000005.out 2>&1
./correctReads.sh 6 > ./correctReads.000006.out 2>&1
./correctReads.sh 7 > ./correctReads.000007.out 2>&1
./correctReads.sh 8 > ./correctReads.000008.out 2>&1
./correctReads.sh 9 > ./correctReads.000009.out 2>&1
./correctReads.sh 10 > ./correctReads.000010.out 2>&1
./correctReads.sh 11 > ./correctReads.000011.out 2>&1
./correctReads.sh 12 > ./correctReads.000012.out 2>&1
./correctReads.sh 13 > ./correctReads.000013.out 2>&1
./correctReads.sh 14 > ./correctReads.000014.out 2>&1
-- Finished on Sun Sep 3 19:59:25 2023 (6802 seconds) with 859022.562 GB free disk space
----------------------------------------
-- Found 14 read correction output files.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
-- Found 14 read correction output files.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
--
-- Loading corrected reads into corStore and seqStore.
----------------------------------------
-- Starting command on Sun Sep 3 19:59:25 2023 with 859022.531 GB free disk space
cd correction
/dssg/home/acct-jiang.lu/jiang.lu/biotools/canu-2.2/bin/loadCorrectedReads \
-S ../tf.seqStore \
-C ./tf.corStore \
-L ./2-correction/corjob.files \
> ./tf.loadCorrectedReads.log \
2> ./tf.loadCorrectedReads.err
-- Finished on Sun Sep 3 19:59:30 2023 (5 seconds) with 859022.187 GB free disk space
----------------------------------------
--
-- In sequence store './tf.seqStore':
-- Found 10366 reads.
-- Found 90002428 bases (0.54 times coverage).
-- Histogram of corrected reads:
--
-- G=90002428 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 57906 132 9029984 || 1000-3860 5156|---------------------------------------------------------------
-- 00020 47381 303 18027143 || 3861-6721 1939|------------------------
-- 00030 38132 514 27005835 || 6722-9582 993|-------------
-- 00040 30212 780 36019258 || 9583-12443 520|-------
-- 00050 21854 1130 45005384 || 12444-15304 238|---
-- 00060 13609 1654 54004605 || 15305-18165 222|---
-- 00070 8755 2500 63002627 || 18166-21026 129|--
-- 00080 5769 3765 72004349 || 21027-23887 143|--
-- 00090 3312 5809 81004665 || 23888-26748 115|--
-- 00100 1000 10365 90002428 || 26749-29609 104|--
-- 001.000x 10366 90002428 || 29610-32470 114|--
-- || 32471-35331 92|--
-- || 35332-38192 89|--
-- || 38193-41053 70|-
-- || 41054-43914 62|-
-- || 43915-46775 66|-
-- || 46776-49636 40|-
-- || 49637-52497 61|-
-- || 52498-55358 42|-
-- || 55359-58219 40|-
-- || 58220-61080 28|-
-- || 61081-63941 25|-
-- || 63942-66802 29|-
-- || 66803-69663 20|-
-- || 69664-72524 16|-
-- || 72525-75385 5|-
-- || 75386-78246 2|-
-- || 78247-81107 0|
-- || 81108-83968 1|-
-- || 83969-86829 0|
-- || 86830-89690 0|
-- || 89691-92551 0|
-- || 92552-95412 0|
-- || 95413-98273 0|
-- || 98274-101134 0|
-- || 101135-103995 0|
-- || 103996-106856 0|
-- || 106857-109717 0|
-- || 109718-112578 0|
-- || 112579-115439 0|
-- || 115440-118300 0|
-- || 118301-121161 0|
-- || 121162-124022 0|
-- || 124023-126883 0|
-- || 126884-129744 0|
-- || 129745-132605 2|-
-- || 132606-135466 0|
-- || 135467-138327 1|-
-- || 138328-141188 0|
-- || 141189-144049 2|-
--
--
-- Purging correctReads output after loading into stores.
-- Purged 14 .cns outputs.
-- Purged 28 .out job log outputs.
--
-- No corrected reads generated, overlaps used for correction saved.
-- Finished stage 'cor-loadCorrectedReads', reset canuIteration.
----------------------------------------
-- Starting command on Sun Sep 3 19:59:30 2023 with 859022.125 GB free disk space
cd .
/dssg/home/acct-jiang.lu/jiang.lu/biotools/canu-2.2/bin/sqStoreDumpFASTQ \
-corrected \
-S ./tf.seqStore \
-o ./tf.correctedReads.gz \
-fasta \
-nolibname \
> tf.correctedReads.fasta.err 2>&1
-- Finished on Sun Sep 3 19:59:32 2023 (2 seconds) with 859022.078 GB free disk space
----------------------------------------
--
-- Corrected reads saved in 'tf.correctedReads.fasta.gz'.
-- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
--
-- ERROR: Read coverage (0.54) lower than allowed.
-- ERROR: stopOnLowCoverage = 10
-- ERROR:
-- ERROR: This could be caused by an incorrect genomeSize or poor
-- ERROR: quality reads that could not be sufficiently corrected.
-- ERROR:
-- ERROR: You can force Canu to continue by decreasing parameter
-- ERROR: stopOnLowCoverage (and possibly minInputCoverage too).
-- ERROR: Be warned that the quality of corrected reads and/or
-- ERROR: contiguity of contigs will be poor.
--
ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
It seems these reads may be lower quality than expected. You could try corMhapSensitivity=high corMinCoverage=0 and see if that helps (a sketch of the adjusted command is below). The reads are also overall pretty short, mostly under 15 kb, which isn't ideal for a good assembly. How old are these reads? Have you tried converting them to HiFi data to see how much coverage you end up with after that? That would make the ~10 kb reads more useful for assembly if they were high accuracy.
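For reference, a minimal sketch of the earlier command with those two parameters added (the output directory name is illustrative; adjust paths to your setup):

canu useGrid=false \
 -p tf -d tf-pacbio-sensitive \
 genomeSize=164.4m \
 corMhapSensitivity=high \
 corMinCoverage=0 \
 -pacbio /dssg/home/acct-jiang.lu/jiang.lu/Tf/Genome_data/SRR23272336_1.fastq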
The data was uploaded to NCBI on 2023-01-30 (SRR23272336) and the preprint was posted to bioRxiv on 2023-06-22. According to their preprint:
Moreover, a total of 2.2 million reads larger than 500 bp were obtained by PacBio Sequel sequencing, with a coverage depth of approximately 79 X (Fig. S3). 149,029 reads (about 52% of the total) were larger than 5 kb in length, of which 81.57% had an average base length of 10 kb.
Any ideas for dealing with data of this quality? Their assembly pipeline was Falcon + Pilon + LACHESIS, but I think it is too old since LACHESIS is no longer developed.
Thanks for your time!
I am not sure this was properly submitted to NCBI. The original file name listed at https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&page_size=10&acc=SRR23272336&display=data-access is s3://sra-pub-src-18/SRR23272336/m54180_171129_064641.scraps.bam.1. A scraps.bam holds the failed reads from a run; the usable reads are normally in a file named subreads.bam (https://pacbio.gs.washington.edu/documents/Raw_Data_files.pdf). The read statistics also don't match what you have (coverage and number of reads are both different; you can compare them yourself, e.g. as sketched below). I'd suggest contacting the authors and confirming the NCBI submission is accurate.
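A minimal sketch of how the basic read statistics could be compared against the preprint's numbers, assuming sra-tools and seqkit are installed (neither ships with Canu, and the dumped FASTQ file name may differ from what is shown here):

# download the run and convert it to FASTQ (sra-tools)
prefetch SRR23272336
fasterq-dump SRR23272336

# report read count, total bases, N50, etc. to compare with the paper (seqkit)
seqkit stats -a SRR23272336_1.fastq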
Oh my god! I can't believe the raw data is wrong... Thank you for helping me get out of this dilemma.
I found another PacBio SRA run on NCBI, and here is its quality graph:
And the quality graph of SRR23272336 looks like this:
So it is indeed abnormal...
I am assembling a genome using Canu 2.2 with the following commands on a Slurm cluster:
The input PacBio FASTA file was corrected by LoRDEC. The low-coverage error has come up 3 times and I don't know what to do, so I am asking for help. The log is below:
I have tried adding corMhapSensitivity=normal, but nothing changed.