marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Still can not get continuous contigs with suggested parameters #738

Closed ml3958 closed 6 years ago

ml3958 commented 6 years ago

I have been trying to assemble a 2.5 Mb bacterial genome from high-coverage (1800X) PacBio data, but have had no luck. I tried a few of the suggested parameters, but none of them gave me continuous contigs.

Attempts:

  1. Setting corOutCoverage=100 - this gave me sufficient corrected reads (91X), but discontinuous contigs (67 sequences, total length 2696826 bp including 6 repeats of total length 198541 bp).
  2. Setting corMhapSensitivity=high corMinCoverage=0 - coverage is also OK (40X), but the contigs (total length 246496 bp) add up to only about 10% of the genome size (2.5 Mb).
  3. Setting corOutCoverage=500 ovlErrorRate=0.15 obtErrorRate=0.15 to smash haplotypes. This gave me a huge number of contigs (774 sequences, total length 28825721 bp including 180 repeats of total length 10087773 bp) and ran for over two days.
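As a quick sanity check on the totals in the three attempts, here is a minimal Python sketch that expresses each assembly's total contig length as a multiple of the expected genome size. The contig totals and the 2.5 Mb figure are copied from this post; the helper name `fold_of_genome` is made up for illustration, and treating 2.5 Mb as exact is an assumption. The last lines back out the genome size implied by the first report's own numbers (4606996028 bases reported as 1850.19 times coverage).

```python
# Totals copied from the three attempts described in this post.
GENOME_SIZE_BP = 2_500_000  # "2.5mb" as stated in the post; assumed exact here


def fold_of_genome(total_bp, genome_bp=GENOME_SIZE_BP):
    """Return an assembly's total length as a multiple of the genome size."""
    return total_bp / genome_bp


# Attempt label -> total contig length in bp, as reported in the post.
attempts = {
    "1 (corOutCoverage=100)": 2_696_826,
    "2 (corMhapSensitivity=high corMinCoverage=0)": 246_496,
    "3 (corOutCoverage=500 ovlErrorRate=0.15 obtErrorRate=0.15)": 28_825_721,
}

for label, contig_bp in attempts.items():
    print(f"attempt {label}: {fold_of_genome(contig_bp):.2f}x the genome size")

# Genome size implied by the first report: total bases / reported coverage.
implied_genome_bp = 4_606_996_028 / 1850.19
print(f"genome size implied by the report: {implied_genome_bp:,.0f} bp")
```

Under these assumptions, attempt 1 lands close to one genome's worth of sequence, attempt 2 at roughly a tenth of it (the "10%" mentioned above), and attempt 3 at more than eleven times the expected size.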

I am now rerunning Canu with just corOutCoverage=200. In the meantime, I am wondering whether my sample is contaminated. Any suggestions?

Thank you!!

The reports for attempts 1, 2, and 3 are below.

  1. 
    [CORRECTION/READS]
    --
    -- In gatekeeper store 'correction/hc1_hybrid_cov100.gkpStore':
    --   Found 299881 reads.
    --   Found 4606996028 bases (1850.19 times coverage).
    --
    --   Read length histogram (one '*' equals 297.84 reads):
    --        0    999      0 
    --     1000   1999  16667 *******************************************************
    --     2000   2999  14494 ************************************************
    --     3000   3999  20849 **********************************************************************
    --     4000   4999  11161 *************************************
    --     5000   5999  10694 ***********************************
    --     6000   6999  10615 ***********************************
    --     7000   7999  10528 ***********************************
    --     8000   8999  11964 ****************************************
    --     9000   9999  12962 *******************************************
    --    10000  10999  12543 ******************************************
    --    11000  11999  12155 ****************************************
    --    12000  12999  11503 **************************************
    --    13000  13999  10361 **********************************
    --    14000  14999   9780 ********************************
    --    15000  15999   9125 ******************************
    --    16000  16999   8695 *****************************
    --    17000  17999   8084 ***************************
    --    18000  18999   7631 *************************
    --    19000  19999   6904 ***********************
    --    20000  20999   6324 *********************
    --    21000  21999   5813 *******************
    --    22000  22999   5092 *****************
    --    23000  23999   4691 ***************
    --    24000  24999   4219 **************
    --    25000  25999   3860 ************
    --    26000  26999   3629 ************
    --    27000  27999   3273 **********
    --    28000  28999   3076 **********
    --    29000  29999   2942 *********
    --    30000  30999   2789 *********
    --    31000  31999   2694 *********
    --    32000  32999   2727 *********
    --    33000  33999   2764 *********
    --    34000  34999   2771 *********
    --    35000  35999   2846 *********
    --    36000  36999   2821 *********
    --    37000  37999   2600 ********
    --    38000  38999   2399 ********
    --    39000  39999   2174 *******
    --    40000  40999   1961 ******
    --    41000  41999   1770 *****
    --    42000  42999   1514 *****
    --    43000  43999   1329 ****
    --    44000  44999   1096 ***
    --    45000  45999    920 ***
    --    46000  46999    798 **
    --    47000  47999    714 **
    --    48000  48999    592 *
    --    49000  49999    493 *
    --    50000  50999    438 *
    --    51000  51999    376 *
    --    52000  52999    276 
    --    53000  53999    247 
    --    54000  54999    212 
    --    55000  55999    152 
    --    56000  56999    149 
    --    57000  57999    121 
    --    58000  58999    108 
    --    59000  59999     95 
    --    60000  60999     76 
    --    61000  61999     57 
    --    62000  62999     36 
    --    63000  63999     35 
    --    64000  64999     23 
    --    65000  65999     23 
    --    66000  66999     15 
    --    67000  67999     11 
    --    68000  68999      7 
    --    69000  69999      6 
    --    70000  70999      5 
    --    71000  71999      2 
    --    72000  72999      1 
    --    73000  73999      1 
    --    74000  74999      1 
    --    75000  75999      1 
    --    76000  76999      0 
    --    77000  77999      0 
    --    78000  78999      0 
    --    79000  79999      0 
    --    80000  80999      0 
    --    81000  81999      0 
    --    82000  82999      0 
    --    83000  83999      0 
    --    84000  84999      0 
    --    85000  85999      0 
    --    86000  86999      0 
    --    87000  87999      0 
    --    88000  88999      0 
    --    89000  89999      0 
    --    90000  90999      0 
    --    91000  91999      0 
    --    92000  92999      0 
    --    93000  93999      0 
    --    94000  94999      0 
    --    95000  95999      0 
    --    96000  96999      0 
    --    97000  97999      0 
    --    98000  98999      0 
    --    99000  99999      1

    [CORRECTION/MERS]
    --
    --  16-mers                   Fraction
    --    Occurrences   NumMers Unique  Total
    --       1-     1 590490024 0.3880 0.1283
    --       2-     2 384381114 0.6406 0.2953
    --       3-     4 344538198 0.7861 0.4397
    --       5-     7 132698608 0.9122 0.6215
    --       8-    11  39138405 0.9643 0.7368
    --      12-    16  14968864 0.9827 0.7986
    --      17-    22   6729681 0.9907 0.8374
    --      23-    29   3222241 0.9945 0.8625
    --      30-    37   1586497 0.9964 0.8789
    --      38-    46    778504 0.9973 0.8895
    --      47-    56    376394 0.9978 0.8960
    --      57-    67    189576 0.9980 0.8999
    --      68-    79    117437 0.9982 0.9023
    --      80-    92    131456 0.9982 0.9042
    --      93-   106    236023 0.9983 0.9068
    --     107-   121    411456 0.9985 0.9121
    --     122-   137    568627 0.9988 0.9227
    --     138-   154    570369 0.9991 0.9390
    --     155-   172    398398 0.9995 0.9569
    --     173-   191    193149 0.9998 0.9706
    --     192-   211     72915 0.9999 0.9778
    --     212-   232     28822 0.9999 0.9808
    --     233-   254     16500 0.9999 0.9822
    --     255-   277     12514 1.0000 0.9830
    --     278-   301      9921 1.0000 0.9838
    --     302-   326      7569 1.0000 0.9844
    --     327-   352      5747 1.0000 0.9849
    --     353-   379      4418 1.0000 0.9853
    --     380-   407      3747 1.0000 0.9857
    --     408-   436      3033 1.0000 0.9860
    --     437-   466      2442 1.0000 0.9862
    --     467-   497      1792 1.0000 0.9865
    --     498-   529      1390 1.0000 0.9867
    --     530-   562      1082 1.0000 0.9868
    --     563-   596       862 1.0000 0.9870
    --     597-   631       680 1.0000 0.9871
    --     632-   667       618 1.0000 0.9871
    --     668-   704       527 1.0000 0.9872
    --     705-   742       530 1.0000 0.9873
    --     743-   781       571 1.0000 0.9874
    --     782-   821       627 1.0000 0.9875
    --
    --     6187952 (max occurrences)
    --  4012007789 (total mers, non-unique)
    --   931423950 (distinct mers, non-unique)
    --   590490024 (unique mers)

    [CORRECTION/CORRECTIONS]
    --
    -- Reads to be corrected:
    --   7742 reads longer than 43630 bp
    --   280804813 bp
    -- Expected corrected reads:
    --   7742 reads
    --   249007801 bp
    --   26861 bp minimum length
    --   32163 bp mean length
    --   45987 bp n50 length

    [TRIMMING/READS]
    --
    -- In gatekeeper store 'trimming/hc1_hybrid_cov100.gkpStore':
    --   Found 7838 reads.
    --   Found 227633046 bases (91.41 times coverage).
    --
    --   Read length histogram (one '*' equals 10.9 reads):
    --        0    999      0
    --     1000   1999     92
    --     2000   2999     42
    --     3000   3999     31
    --     4000   4999     19
    --     5000   5999     12
    --     6000   6999      8
    --     7000   7999      4
    --     8000   8999      7
    --     9000   9999      5
    --    10000  10999      5
    --    11000  11999      7
    --    12000  12999      3
    --    13000  13999      5
    --    14000  14999      6
    --    15000  15999      7
    --    16000  16999     11
    --    17000  17999     14
    --    18000  18999     28
    --    19000  19999     20
    --    20000  20999     23
    --    21000  21999     52
    --    22000  22999     73
    --    23000  23999    117
    --    24000  24999    223
    --    25000  25999    457
    --    26000  26999    746
    --    27000  27999    722
    --    28000  28999    763
    --    29000  29999    746
    --    30000  30999    658
    --    31000  31999    680
    --    32000  32999    606
    --    33000  33999    581
    --    34000  34999    432
    --    35000  35999    294
    --    36000  36999    195
    --    37000  37999     67
    --    38000  38999     38
    --    39000  39999     16
    --    40000  40999      6
    --    41000  41999      4
    --    42000  42999      3
    --    43000  43999      0
    --    44000  44999      1
    --    45000  45999      2
    --    46000  46999      4
    --    47000  47999      2
    --    48000  48999      0
    --    49000  49999      0
    --    50000  50999      0
    --    51000  51999      1

    [TRIMMING/MERS]
    --
    --  22-mers                   Fraction
    --    Occurrences   NumMers Unique  Total
    --       1-     1  21250603 0.8027 0.0934
    --       2-     2   1605147 0.8633 0.1075
    --       3-     4    793634 0.8838 0.1147
    --       5-     7    257946 0.8983 0.1220
    --       8-    11     77877 0.9042 0.1266
    --      12-    16     23408 0.9063 0.1290
    --      17-    22     10086 0.9069 0.1302
    --      23-    29     28710 0.9073 0.1311
    --      30-    37     65284 0.9085 0.1349
    --      38-    46    143680 0.9113 0.1461
    --      47-    56    241228 0.9169 0.1747
    --      57-    67    405156 0.9265 0.2335
    --      68-    79    496940 0.9423 0.3499
    --      80-    92    447531 0.9609 0.5112
    --      93-   106    328006 0.9774 0.6773
    --     107-   121    176200 0.9893 0.8164
    --     122-   137     87187 0.9957 0.9004
    --     138-   154     22102 0.9988 0.9467
    --     155-   172      5221 0.9995 0.9593
    --     173-   191      3770 0.9997 0.9630
    --     192-   211      1365 0.9998 0.9658
    --     212-   232       198 0.9999 0.9669
    --     233-   254       100 0.9999 0.9671
    --     255-   277        81 0.9999 0.9672
    --     278-   301        52 0.9999 0.9673
    --     302-   326        50 0.9999 0.9674
    --     327-   352        42 0.9999 0.9675
    --     353-   379        96 0.9999 0.9675
    --     380-   407        46 0.9999 0.9677
    --     408-   436        96 0.9999 0.9678
    --     437-   466        27 0.9999 0.9679
    --     467-   497        19 0.9999 0.9680
    --     498-   529        19 0.9999 0.9680
    --     530-   562        30 0.9999 0.9681
    --     563-   596        34 0.9999 0.9681
    --     597-   631         8 0.9999 0.9682
    --     632-   667         4 0.9999 0.9683
    --     668-   704         4 0.9999 0.9683
    --     705-   742         4 0.9999 0.9683
    --     743-   781         5 0.9999 0.9683
    --     782-   821         1 0.9999 0.9683
    --
    --      874625 (max occurrences)
    --   206217845 (total mers, non-unique)
    --     5223583 (distinct mers, non-unique)
    --    21250603 (unique mers)

    [TRIMMING/TRIMMING]
    --  PARAMETERS:
    --     1000    (reads trimmed below this many bases are deleted)
    --   0.1440    (use overlaps at or below this fraction error)
    --        1    (break region if overlap is less than this long, for 'largest covered' algorithm)
    --        1    (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
    --
    --  INPUT READS:
    --     7838 reads    227633046 bases (reads processed)
    --        0 reads            0 bases (reads not processed, previously deleted)
    --        0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
    --
    --  OUTPUT READS:
    --     7165 reads    174112663 bases (trimmed reads output)
    --      661 reads     17126484 bases (reads with no change, kept as is)
    --       10 reads       208473 bases (reads with no overlaps, deleted)
    --        2 reads        24621 bases (reads with short trimmed length, deleted)
    --
    --  TRIMMING DETAILS:
    --     5356 reads     18353687 bases (bases trimmed from the 5' end of a read)
    --     6178 reads     17807118 bases (bases trimmed from the 3' end of a read)

    [TRIMMING/SPLITTING]
    --  PARAMETERS:
    --     1000    (reads trimmed below this many bases are deleted)
    --   0.1440    (use overlaps at or below this fraction error)
    --
    --  INPUT READS:
    --     7826 reads    227399952 bases (reads processed)
    --       12 reads       233094 bases (reads not processed, previously deleted)
    --        0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
    --
    --  PROCESSED:
    --        0 reads            0 bases (no overlaps)
    --        0 reads            0 bases (no coverage after adjusting for trimming done already)
    --        0 reads            0 bases (processed for chimera)
    --        0 reads            0 bases (processed for spur)
    --     7826 reads    227399952 bases (processed for subreads)
    --
    --  READS WITH SIGNALS:
    --        0 reads            0 signals (number of 5' spur signal)
    --        0 reads            0 signals (number of 3' spur signal)
    --        0 reads            0 signals (number of chimera signal)
    --     4405 reads         5109 signals (number of subread signal)
    --
    --  SIGNALS:
    --        0 reads            0 bases (size of 5' spur signal)
    --        0 reads            0 bases (size of 3' spur signal)
    --        0 reads            0 bases (size of chimera signal)
    --     5109 reads       419488 bases (size of subread signal)
    --
    --  TRIMMING:
    --     2258 reads     23632933 bases (trimmed from the 5' end of the read)
    --     2515 reads     26453072 bases (trimmed from the 3' end of the read)

    [UNITIGGING/READS]
    --
    -- In gatekeeper store 'unitigging/hc1_hybrid_cov100.gkpStore':
    --   Found 7826 reads.
    --   Found 141153142 bases (56.68 times coverage).
    --
    --   Read length histogram (one '*' equals 11.61 reads):
    --        0    999      0
    --     1000   1999    100
    --     2000   2999     42
    --     3000   3999     30
    --     4000   4999     33
    --     5000   5999     31
    --     6000   6999     26
    --     7000   7999     36
    --     8000   8999    441
    --     9000   9999    413
    --    10000  10999    387
    --    11000  11999    327
    --    12000  12999    122
    --    13000  13999    199
    --    14000  14999    331
    --    15000  15999    443
    --    16000  16999    763
    --    17000  17999    813
    --    18000  18999    529
    --    19000  19999    302
    --    20000  20999    246
    --    21000  21999    194
    --    22000  22999    127
    --    23000  23999    123
    --    24000  24999    184
    --    25000  25999    281
    --    26000  26999    240
    --    27000  27999    194
    --    28000  28999    134
    --    29000  29999    139
    --    30000  30999     93
    --    31000  31999    108
    --    32000  32999    104
    --    33000  33999     88
    --    34000  34999     72
    --    35000  35999     48
    --    36000  36999     34
    --    37000  37999     14
    --    38000  38999     13
    --    39000  39999      9
    --    40000  40999      4
    --    41000  41999      3
    --    42000  42999      1
    --    43000  43999      1
    --    44000  44999      0
    --    45000  45999      2
    --    46000  46999      1
    --    47000  47999      1

    [UNITIGGING/MERS]
    --
    --  22-mers                   Fraction
    --    Occurrences   NumMers Unique  Total
    --       1-     1  10093919 0.7205 0.0716
    --       2-     2    898914 0.7847 0.0843
    --       3-     4    406953 0.8050 0.0904
    --       5-     7    113768 0.8181 0.0961
    --       8-    11     30595 0.8227 0.0991
    --      12-    16     34857 0.8244 0.1008
    --      17-    22     61921 0.8271 0.1048
    --      23-    29    179269 0.8323 0.1157
    --      30-    37    368195 0.8461 0.1532
    --      38-    46    562333 0.8743 0.2506
    --      47-    56    542176 0.9143 0.4217
    --      57-    67    401443 0.9522 0.6185
    --      68-    79    209573 0.9793 0.7865
    --      80-    92     75165 0.9934 0.8897
    --      93-   106     17688 0.9980 0.9288
    --     107-   121      5882 0.9992 0.9405
    --     122-   137      2967 0.9996 0.9449
    --     138-   154       393 0.9998 0.9473
    --     155-   172       145 0.9998 0.9477
    --     173-   191        98 0.9998 0.9479
    --     192-   211        94 0.9998 0.9480
    --     212-   232        56 0.9998 0.9481
    --     233-   254        29 0.9998 0.9482
    --     255-   277        66 0.9998 0.9483
    --     278-   301        42 0.9998 0.9484
    --     302-   326        31 0.9998 0.9485
    --     327-   352        43 0.9998 0.9485
    --     353-   379        79 0.9998 0.9486
    --     380-   407        52 0.9998 0.9488
    --     408-   436        57 0.9998 0.9490
    --     437-   466        18 0.9998 0.9492
    --     467-   497        19 0.9998 0.9492
    --     498-   529        11 0.9998 0.9493
    --     530-   562        30 0.9998 0.9493
    --     563-   596        29 0.9998 0.9494
    --     597-   631        15 0.9998 0.9496
    --     632-   667         3 0.9998 0.9496
    --     668-   704         5 0.9998 0.9496
    --     705-   742         9 0.9998 0.9497
    --     743-   781        12 0.9998 0.9497
    --     782-   821         6 0.9998 0.9498
    --
    --      843946 (max occurrences)
    --   130894877 (total mers, non-unique)
    --     3915192 (distinct mers, non-unique)
    --    10093919 (unique mers)

    [UNITIGGING/OVERLAPS]
    --   category            reads     %          read length        feature size or coverage  analysis
    --   middle-missing        489   6.25    29070.37 +- 5639.63       3275.87 +- 3732.77      (bad trimming)
    --   middle-hump             0   0.00        0.00 +- 0.00             0.00 +- 0.00         (bad trimming)
    --   no-5-prime              1   0.01    39957.00 +- 0.00         16246.00 +- 0.00         (bad trimming)
    --   no-3-prime              1   0.01    27989.00 +- 0.00          1467.00 +- 0.00         (bad trimming)
    --
    --   low-coverage           11   0.14    18916.27 +- 9142.20          4.71 +- 2.11         (easy to assemble, potential for lower quality consensus)
    --   unique               2968  37.92    13454.04 +- 4834.91         23.83 +- 8.89         (easy to assemble, perfect, yay)
    --   repeat-cont           192   2.45    18707.28 +- 12170.97       248.06 +- 97.64        (potential for consensus errors, no impact on assembly)
    --   repeat-dove            25   0.32    37392.24 +- 2983.29        221.19 +- 80.86        (hard to assemble, likely won't assemble correctly or even at all)
    --
    --   span-repeat          3590  45.87    20157.09 +- 5795.09       7406.85 +- 5937.53      (read spans a large repeat, usually easy to assemble)
    --   uniq-repeat-cont      276   3.53    12146.35 +- 6426.16                               (should be uniquely placed, low potential for consensus errors, no impact on assembly)
    --   uniq-repeat-dove      252   3.22    24134.07 +- 7879.99                               (will end contigs, potential to misassemble)
    --   uniq-anchor            21   0.27    19304.67 +- 11838.94      4755.52 +- 3217.80      (repeat read, with unique section, probable bad read)

    [UNITIGGING/ADJUSTMENT]
    -- No report available.

    [UNITIGGING/CONTIGS]
    -- Found, in version 1, after unitig construction:
    --   contigs:      67 sequences, total length 2626599 bp (including 6 repeats of total length 197551 bp).
    --   bubbles:      0 sequences, total length 0 bp.
    --   unassembled:  5621 sequences, total length 109001061 bp.
    --
    -- Contig sizes based on genome size --
    --            NG (bp)  LG (contigs)    sum (bp)
    --     10       75162             3      266606
    --     20       64125             7      533093
    --     30       54400            11      770426
    --     40       49697            16     1024873
    --     50       45684            21     1262405
    --     60       41653            27     1522313
    --     70       35424            33     1749108
    --     80       31548            41     2015962
    --     90       27723            49     2252104
    --    100       19148            60     2505107

    [UNITIGGING/CONSENSUS]
    -- Found, in version 2, after consensus generation:
    --   contigs:      67 sequences, total length 2696826 bp (including 6 repeats of total length 198541 bp).
    --   bubbles:      0 sequences, total length 0 bp.
    --   unassembled:  5621 sequences, total length 109152481 bp.
    --
    -- Contig sizes based on genome size --
    --            NG (bp)  LG (contigs)    sum (bp)
    --     10       83110             3      286850
    --     20       70479             6      506173
    --     30       60435            10      763059
    --     40       53599            15     1038577
    --     50       46099            20     1279489
    --     60       42838            25     1501972
    --     70       36509            32     1775771
    --     80       33939            39     2020245
    --     90       27940            47     2258007
    --    100       22568            56     2490408

2. 
```
[CORRECTION/READS]
--
-- In gatekeeper store 'correction/hc1_pacbio_sensitive.gkpStore':
--   Found 268730 reads.
--   Found 4496312349 bases (1805.74 times coverage).
--
--   Read length histogram (one '*' equals 181.77 reads):
--        0    999      0 
--     1000   1999  10006 *******************************************************
--     2000   2999   8558 ***********************************************
--     3000   3999   8425 **********************************************
--     4000   4999   9131 **************************************************
--     5000   5999   9435 ***************************************************
--     6000   6999   9758 *****************************************************
--     7000   7999   9955 ******************************************************
--     8000   8999  11593 ***************************************************************
--     9000   9999  12724 **********************************************************************
--    10000  10999  12366 ********************************************************************
--    11000  11999  12019 ******************************************************************
--    12000  12999  11419 **************************************************************
--    13000  13999  10293 ********************************************************
--    14000  14999   9721 *****************************************************
--    15000  15999   9082 *************************************************
--    16000  16999   8642 ***********************************************
--    17000  17999   8058 ********************************************
--    18000  18999   7603 *****************************************
--    19000  19999   6879 *************************************
--    20000  20999   6306 **********************************
--    21000  21999   5792 *******************************
--    22000  22999   5077 ***************************
--    23000  23999   4677 *************************
--    24000  24999   4210 ***********************
--    25000  25999   3856 *********************
--    26000  26999   3625 *******************
--    27000  27999   3269 *****************
--    28000  28999   3073 ****************
--    29000  29999   2939 ****************
--    30000  30999   2787 ***************
--    31000  31999   2692 **************
--    32000  32999   2727 ***************
--    33000  33999   2764 ***************
--    34000  34999   2771 ***************
--    35000  35999   2845 ***************
--    36000  36999   2821 ***************
--    37000  37999   2600 **************
--    38000  38999   2398 *************
--    39000  39999   2174 ***********
--    40000  40999   1961 **********
--    41000  41999   1770 *********
--    42000  42999   1513 ********
--    43000  43999   1328 *******
--    44000  44999   1096 ******
--    45000  45999    920 *****
--    46000  46999    798 ****
--    47000  47999    714 ***
--    48000  48999    592 ***
--    49000  49999    493 **
--    50000  50999    438 **
--    51000  51999    376 **
--    52000  52999    276 *
--    53000  53999    247 *
--    54000  54999    212 *
--    55000  55999    152 
--    56000  56999    149 
--    57000  57999    121 
--    58000  58999    108 
--    59000  59999     95 
--    60000  60999     76 
--    61000  61999     57 
--    62000  62999     36 
--    63000  63999     35 
--    64000  64999     23 
--    65000  65999     23 
--    66000  66999     15 
--    67000  67999     11 
--    68000  68999      7 
--    69000  69999      6 
--    70000  70999      5 
--    71000  71999      2 
--    72000  72999      1 
--    73000  73999      1 
--    74000  74999      1 
--    75000  75999      1 
--    76000  76999      0 
--    77000  77999      0 
--    78000  78999      0 
--    79000  79999      0 
--    80000  80999      0 
--    81000  81999      0 
--    82000  82999      0 
--    83000  83999      0 
--    84000  84999      0 
--    85000  85999      0 
--    86000  86999      0 
--    87000  87999      0 
--    88000  88999      0 
--    89000  89999      0 
--    90000  90999      0 
--    91000  91999      0 
--    92000  92999      0 
--    93000  93999      0 
--    94000  94999      0 
--    95000  95999      0 
--    96000  96999      0 
--    97000  97999      0 
--    98000  98999      0 
--    99000  99999      1

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 594082432 *******************************************************************--> 0.3924 0.1322
--       2-     2 383772344 ********************************************************************** 0.6459 0.3031
--       3-     4 340411799 **************************************************************         0.7908 0.4496
--       5-     7 129208131 ***********************                                                0.9152 0.6323
--       8-    11  37410055 ******                                                                 0.9660 0.7466
--      12-    16  13968007 **                                                                     0.9836 0.8068
--      17-    22   6329618 *                                                                      0.9911 0.8437
--      23-    29   3098351                                                                        0.9947 0.8681
--      30-    37   1534876                                                                        0.9965 0.8843
--      38-    46    748727                                                                        0.9974 0.8947
--      47-    56    359020                                                                        0.9978 0.9012
--      57-    67    178536                                                                        0.9981 0.9050
--      68-    79    110558                                                                        0.9982 0.9073
--      80-    92    127503                                                                        0.9982 0.9091
--      93-   106    234077                                                                        0.9983 0.9117
--     107-   121    411794                                                                        0.9985 0.9171
--     122-   137    568842                                                                        0.9988 0.9280
--     138-   154    567957                                                                        0.9992 0.9447
--     155-   172    392933                                                                        0.9995 0.9629
--     173-   191    188465                                                                        0.9998 0.9767
--     192-   211     70309                                                                        0.9999 0.9840
--     212-   232     27625                                                                        0.9999 0.9870
--     233-   254     15821                                                                        1.0000 0.9883
--     255-   277     11747                                                                        1.0000 0.9891
--     278-   301      8990                                                                        1.0000 0.9898
--     302-   326      6600                                                                        1.0000 0.9904
--     327-   352      4922                                                                        1.0000 0.9908
--     353-   379      3846                                                                        1.0000 0.9912
--     380-   407      3402                                                                        1.0000 0.9915
--     408-   436      2734                                                                        1.0000 0.9918
--     437-   466      2168                                                                        1.0000 0.9921
--     467-   497      1596                                                                        1.0000 0.9923
--     498-   529      1202                                                                        1.0000 0.9924
--     530-   562       913                                                                        1.0000 0.9926
--     563-   596       701                                                                        1.0000 0.9927
--     597-   631       555                                                                        1.0000 0.9928
--     632-   667       476                                                                        1.0000 0.9929
--     668-   704       420                                                                        1.0000 0.9929
--     705-   742       448                                                                        1.0000 0.9930
--     743-   781       474                                                                        1.0000 0.9931
--     782-   821       527                                                                        1.0000 0.9931
--
--     6187952 (max occurrences)
--  3898198967 (total mers, non-unique)
--   919794736 (distinct mers, non-unique)
--   594082432 (unique mers)

[CORRECTION/CORRECTIONS]
--
-- Reads to be corrected:
--   2016 reads longer than 51287 bp
--   101621194 bp
-- Expected corrected reads:
--   2016 reads
--   99623876 bp
--   46660 bp minimum length
--   49417 bp mean length
--   58619 bp n50 length

[TRIMMING/READS]
--
-- In gatekeeper store 'trimming/hc1_pacbio_sensitive.gkpStore':
--   Found 2008 reads.
--   Found 100389644 bases (40.31 times coverage).
--
--   Read length histogram (one '*' equals 4.88 reads):
--        0    999      0 
--     1000   1999      0 
--     2000   2999      0 
--     3000   3999      0 
--     4000   4999      0 
--     5000   5999      0 
--     6000   6999      0 
--     7000   7999      0 
--     8000   8999      0 
--     9000   9999      0 
--    10000  10999      0 
--    11000  11999      0 
--    12000  12999      0 
--    13000  13999      0 
--    14000  14999      0 
--    15000  15999      0 
--    16000  16999      0 
--    17000  17999      0 
--    18000  18999      0 
--    19000  19999      0 
--    20000  20999      0 
--    21000  21999      0 
--    22000  22999      0 
--    23000  23999      0 
--    24000  24999      0 
--    25000  25999      0 
--    26000  26999      0 
--    27000  27999      3 
--    28000  28999      1 
--    29000  29999      1 
--    30000  30999      1 
--    31000  31999      1 
--    32000  32999      1 
--    33000  33999      0 
--    34000  34999      1 
--    35000  35999      0 
--    36000  36999      0 
--    37000  37999      0 
--    38000  38999      1 
--    39000  39999      0 
--    40000  40999      0 
--    41000  41999      0 
--    42000  42999      0 
--    43000  43999      0 
--    44000  44999      6 *
--    45000  45999     51 **********
--    46000  46999    230 ***********************************************
--    47000  47999    342 **********************************************************************
--    48000  48999    302 *************************************************************
--    49000  49999    243 *************************************************
--    50000  50999    214 *******************************************
--    51000  51999    168 **********************************
--    52000  52999    114 ***********************
--    53000  53999     84 *****************
--    54000  54999     68 *************
--    55000  55999     46 *********
--    56000  56999     45 *********
--    57000  57999     23 ****
--    58000  58999     15 ***
--    59000  59999     13 **
--    60000  60999     13 **
--    61000  61999      9 *
--    62000  62999      2 
--    63000  63999      3 
--    64000  64999      0 
--    65000  65999      4 
--    66000  66999      0 
--    67000  67999      0 
--    68000  68999      0 
--    69000  69999      0 
--    70000  70999      1 
--    71000  71999      0 
--    72000  72999      0 
--    73000  73999      0 
--    74000  74999      1 
--    75000  75999      0 
--    76000  76999      0 
--    77000  77999      0 
--    78000  78999      0 
--    79000  79999      0 
--    80000  80999      0 
--    81000  81999      0 
--    82000  82999      0 
--    83000  83999      0 
--    84000  84999      0 
--    85000  85999      0 
--    86000  86999      0 
--    87000  87999      0 
--    88000  88999      0 
--    89000  89999      0 
--    90000  90999      0 
--    91000  91999      0 
--    92000  92999      0 
--    93000  93999      0 
--    94000  94999      0 
--    95000  95999      0 
--    96000  96999      0 
--    97000  97999      0 
--    98000  98999      0 
--    99000  99999      1

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1  87093486 *******************************************************************--> 0.9712 0.8679
--       2-     2    659337 ***********************************************************            0.9786 0.8811
--       3-     4    773587 *********************************************************************  0.9831 0.8933
--       5-     7    777620 ********************************************************************** 0.9907 0.9235
--       8-    11    318350 ****************************                                           0.9974 0.9642
--      12-    16     44801 ****                                                                   0.9996 0.9841
--      17-    22      2935                                                                        0.9999 0.9875
--      23-    29      1438                                                                        0.9999 0.9880
--      30-    37       467                                                                        1.0000 0.9883
--      38-    46       341                                                                        1.0000 0.9884
--      47-    56       212                                                                        1.0000 0.9886
--      57-    67       155                                                                        1.0000 0.9887
--      68-    79        62                                                                        1.0000 0.9888
--      80-    92        55                                                                        1.0000 0.9888
--      93-   106        53                                                                        1.0000 0.9888
--     107-   121        65                                                                        1.0000 0.9889
--     122-   137        33                                                                        1.0000 0.9890
--     138-   154        23                                                                        1.0000 0.9890
--     155-   172        37                                                                        1.0000 0.9890
--     173-   191        54                                                                        1.0000 0.9891
--     192-   211       131                                                                        1.0000 0.9892
--     212-   232       191                                                                        1.0000 0.9895
--     233-   254       378                                                                        1.0000 0.9899
--     255-   277       617                                                                        1.0000 0.9909
--     278-   301       521                                                                        1.0000 0.9925
--     302-   326       142                                                                        0.0000 0.9940
--     327-   352        11                                                                        1.0000 0.9944
--     353-   379        12                                                                        0.0000 0.9944
--     380-   407         4                                                                        0.0000 0.9945
--     408-   436         5                                                                        0.0000 0.9945
--     437-   466         8                                                                        0.0000 0.9945
--     467-   497        12                                                                        0.0000 0.9945
--     498-   529         1                                                                        0.0000 0.9946
--     530-   562         0                                                                        0.0000 0.0000
--     563-   596         0                                                                        0.0000 0.0000
--     597-   631         0                                                                        0.0000 0.0000
--     632-   667         0                                                                        0.0000 0.0000
--     668-   704         0                                                                        0.0000 0.0000
--     705-   742         2                                                                        0.0000 0.9946
--     743-   781        13                                                                        0.0000 0.9946
--     782-   821        18                                                                        0.0000 0.9947
--
--      202101 (max occurrences)
--    13253990 (total mers, non-unique)
--     2581771 (distinct mers, non-unique)
--    87093486 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0450    (use overlaps at or below this fraction error)
--        1    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        1    (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--    2008 reads    100389644 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--    1191 reads      8992221 bases (trimmed reads output)
--       0 reads            0 bases (reads with no change, kept as is)
--     733 reads     37773659 bases (reads with no overlaps, deleted)
--      84 reads      4216310 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--    1189 reads     40631043 bases (bases trimmed from the 5' end of a read)
--    1188 reads      8776411 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0450    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--    1191 reads     58399675 bases (reads processed)
--     817 reads     41989969 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--       9 reads       443001 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--    1182 reads     57956674 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--       0 reads            0 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--       0 reads            0 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--       0 reads            0 bases (trimmed from the 5' end of the read)
--       0 reads            0 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In gatekeeper store 'unitigging/hc1_pacbio_sensitive.gkpStore':
--   Found 1191 reads.
--   Found 8992221 bases (3.61 times coverage).
--
--   Read length histogram (one '*' equals 3.8 reads):
--        0    999      0 
--     1000   1999     75 *******************
--     2000   2999     77 ********************
--     3000   3999     80 *********************
--     4000   4999     68 *****************
--     5000   5999     81 *********************
--     6000   6999     90 ***********************
--     7000   7999    137 ************************************
--     8000   8999    266 **********************************************************************
--     9000   9999    161 ******************************************
--    10000  10999     95 *************************
--    11000  11999     25 ******
--    12000  12999      2 
--    13000  13999      1 
--    14000  14999      0 
--    15000  15999      1 
--    16000  16999      1 
--    17000  17999      1 
--    18000  18999      0 
--    19000  19999      0 
--    20000  20999      0 
--    21000  21999      2 
--    22000  22999      0 
--    23000  23999      3 
--    24000  24999      1 
--    25000  25999      1 
--    26000  26999      1 
--    27000  27999      3 
--    28000  28999      2 
--    29000  29999      2 
--    30000  30999      2 
--    31000  31999      2 
--    32000  32999      5 *
--    33000  33999      3 
--    34000  34999      1 
--    35000  35999      0 
--    36000  36999      0 
--    37000  37999      0 
--    38000  38999      1 
--    39000  39999      0 
--    40000  40999      0 
--    41000  41999      0 
--    42000  42999      0 
--    43000  43999      1

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1   1544406 *******************************************************************--> 0.4571 0.1722
--       2-     2    555836 *********************************************                          0.6216 0.2962
--       3-     4    854810 ********************************************************************** 0.7645 0.4577
--       5-     7    390426 *******************************                                        0.9373 0.7417
--       8-    11     27717 **                                                                     0.9960 0.8846
--      12-    16      1839                                                                        0.9985 0.8938
--      17-    22       550                                                                        0.9990 0.8965
--      23-    29       219                                                                        0.9991 0.8973
--      30-    37       191                                                                        0.9992 0.8980
--      38-    46       140                                                                        0.9992 0.8986
--      47-    56        89                                                                        0.9993 0.8992
--      57-    67        25                                                                        0.9993 0.8997
--      68-    79        28                                                                        0.9993 0.8999
--      80-    92        39                                                                        0.9993 0.9001
--      93-   106        58                                                                        0.9993 0.9005
--     107-   121        39                                                                        0.9993 0.9011
--     122-   137        23                                                                        0.9994 0.9016
--     138-   154        38                                                                        0.9994 0.9020
--     155-   172        91                                                                        0.9994 0.9026
--     173-   191       127                                                                        0.9994 0.9044
--     192-   211       360                                                                        0.9994 0.9071
--     212-   232       795                                                                        0.9996 0.9159
--     233-   254       564                                                                        0.9998 0.9353
--     255-   277        27                                                                        1.0000 0.9502
--     278-   301         8                                                                        1.0000 0.9506
--     302-   326        17                                                                        1.0000 0.9509
--     327-   352         1                                                                        1.0000 0.9515
--     353-   379         0                                                                        0.0000 0.0000
--     380-   407         0                                                                        0.0000 0.0000
--     408-   436         0                                                                        0.0000 0.0000
--     437-   466         0                                                                        0.0000 0.0000
--     467-   497        38                                                                        1.0000 0.9516
--     498-   529         6                                                                        1.0000 0.9536
--     530-   562         0                                                                        0.0000 0.0000
--     563-   596         0                                                                        0.0000 0.0000
--     597-   631         0                                                                        0.0000 0.0000
--     632-   667         0                                                                        0.0000 0.0000
--     668-   704         0                                                                        0.0000 0.0000
--     705-   742         6                                                                        1.0000 0.9540
--     743-   781        16                                                                        1.0000 0.9545
--     782-   821         0                                                                        0.0000 0.0000
--
--      183362 (max occurrences)
--     7422804 (total mers, non-unique)
--     1834169 (distinct mers, non-unique)
--     1544406 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing          7    0.59    18254.86 +- 11356.40      5351.71 +- 6507.86    (bad trimming)
--   middle-hump            18    1.51     6376.72 +- 2385.86        942.72 +- 861.48     (bad trimming)
--   no-5-prime             83    6.97     8575.18 +- 4361.05        455.48 +- 1944.13    (bad trimming)
--   no-3-prime             72    6.05     8759.76 +- 4063.66        482.62 +- 1569.11    (bad trimming)
--   
--   low-coverage          913   76.66     7034.87 +- 3825.97          3.51 +- 1.58       (easy to assemble, potential for lower quality consensus)
--   unique                 23    1.93    11475.52 +- 11090.22        23.06 +- 9.34       (easy to assemble, perfect, yay)
--   repeat-cont             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (potential for consensus errors, no impact on assembly)
--   repeat-dove             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat            52    4.37     9840.46 +- 3559.10       1498.81 +- 1728.31    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont        4    0.34    10502.00 +- 7747.99                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove        6    0.50    23562.50 +- 10872.09                            (will end contigs, potential to misassemble)
--   uniq-anchor             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      9 sequences, total length 247178 bp (including 0 repeats of total length 0 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  542 sequences, total length 4690249 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      9 sequences, total length 246496 bp (including 0 repeats of total length 0 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  542 sequences, total length 4685951 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--
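
As a quick sanity check on the report above, the numbers can be cross-checked with a few lines of Python. This is only a sketch using figures copied from the `[TRIMMING/TRIMMING]` and `[UNITIGGING/READS]` sections; the function names are made up for illustration and are not part of Canu. It back-calculates the genome size implied by the reported coverage and shows how much of the data entering trimming survived.

```python
# Figures copied from the [TRIMMING/TRIMMING] and [UNITIGGING/READS] sections above.
reads_in, bases_in = 2008, 100_389_644    # "reads processed"
reads_out, bases_out = 1191, 8_992_221    # "trimmed reads output"
reported_coverage = 3.61                  # "3.61 times coverage"


def implied_genome_size(bases: int, coverage: float) -> float:
    """Genome size implied by a base count and the coverage reported for it."""
    return bases / coverage


def retained_fraction(kept: int, total: int) -> float:
    """Fraction of the input that is still present after a step."""
    return kept / total


genome_size = implied_genome_size(bases_out, reported_coverage)
read_fraction = retained_fraction(reads_out, reads_in)
base_fraction = retained_fraction(bases_out, bases_in)

print(f"implied genome size: {genome_size:,.0f} bp")
print(f"reads kept after trimming: {read_fraction:.1%}")
print(f"bases kept after trimming: {base_fraction:.1%}")
```

With these inputs the implied genome size comes out near 2.49 Mb, and only about 9% of the input bases (roughly 59% of the reads) remain after trimming, which fits the 3.61x coverage reported for unitigging.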
  3. 
    [CORRECTION/READS]
    --
    -- In gatekeeper store 'correction/hc1_pacbio_smash.gkpStore':
    --   Found 268730 reads.
    --   Found 4496312349 bases (1805.74 times coverage).
    --
    --   Read length histogram (one '*' equals 181.77 reads):
    --        0    999      0 
    --     1000   1999  10006 *******************************************************
    --     2000   2999   8558 ***********************************************
    --     3000   3999   8425 **********************************************
    --     4000   4999   9131 **************************************************
    --     5000   5999   9435 ***************************************************
    --     6000   6999   9758 *****************************************************
    --     7000   7999   9955 ******************************************************
    --     8000   8999  11593 ***************************************************************
    --     9000   9999  12724 **********************************************************************
    --    10000  10999  12366 ********************************************************************
    --    11000  11999  12019 ******************************************************************
    --    12000  12999  11419 **************************************************************
    --    13000  13999  10293 ********************************************************
    --    14000  14999   9721 *****************************************************
    --    15000  15999   9082 *************************************************
    --    16000  16999   8642 ***********************************************
    --    17000  17999   8058 ********************************************
    --    18000  18999   7603 *****************************************
    --    19000  19999   6879 *************************************
    --    20000  20999   6306 **********************************
    --    21000  21999   5792 *******************************
    --    22000  22999   5077 ***************************
    --    23000  23999   4677 *************************
    --    24000  24999   4210 ***********************
    --    25000  25999   3856 *********************
    --    26000  26999   3625 *******************
    --    27000  27999   3269 *****************
    --    28000  28999   3073 ****************
    --    29000  29999   2939 ****************
    --    30000  30999   2787 ***************
    --    31000  31999   2692 **************
    --    32000  32999   2727 ***************
    --    33000  33999   2764 ***************
    --    34000  34999   2771 ***************
    --    35000  35999   2845 ***************
    --    36000  36999   2821 ***************
    --    37000  37999   2600 **************
    --    38000  38999   2398 *************
    --    39000  39999   2174 ***********
    --    40000  40999   1961 **********
    --    41000  41999   1770 *********
    --    42000  42999   1513 ********
    --    43000  43999   1328 *******
    --    44000  44999   1096 ******
    --    45000  45999    920 *****
    --    46000  46999    798 ****
    --    47000  47999    714 ***
    --    48000  48999    592 ***
    --    49000  49999    493 **
    --    50000  50999    438 **
    --    51000  51999    376 **
    --    52000  52999    276 *
    --    53000  53999    247 *
    --    54000  54999    212 *
    --    55000  55999    152 
    --    56000  56999    149 
    --    57000  57999    121 
    --    58000  58999    108 
    --    59000  59999     95 
    --    60000  60999     76 
    --    61000  61999     57 
    --    62000  62999     36 
    --    63000  63999     35 
    --    64000  64999     23 
    --    65000  65999     23 
    --    66000  66999     15 
    --    67000  67999     11 
    --    68000  68999      7 
    --    69000  69999      6 
    --    70000  70999      5 
    --    71000  71999      2 
    --    72000  72999      1 
    --    73000  73999      1 
    --    74000  74999      1 
    --    75000  75999      1 
    --    76000  76999      0 
    --    77000  77999      0 
    --    78000  78999      0 
    --    79000  79999      0 
    --    80000  80999      0 
    --    81000  81999      0 
    --    82000  82999      0 
    --    83000  83999      0 
    --    84000  84999      0 
    --    85000  85999      0 
    --    86000  86999      0 
    --    87000  87999      0 
    --    88000  88999      0 
    --    89000  89999      0 
    --    90000  90999      0 
    --    91000  91999      0 
    --    92000  92999      0 
    --    93000  93999      0 
    --    94000  94999      0 
    --    95000  95999      0 
    --    96000  96999      0 
    --    97000  97999      0 
    --    98000  98999      0 
    --    99000  99999      1

    [CORRECTION/MERS]
    --
    --  16-mers                   Fraction
    --    Occurrences   NumMers Unique Total
    --       1-     1 594082432 0.3924 0.1322
    --       2-     2 383772344 0.6459 0.3031
    --       3-     4 340411799 0.7908 0.4496
    --       5-     7 129208131 0.9152 0.6323
    --       8-    11  37410055 0.9660 0.7466
    --      12-    16  13968007 0.9836 0.8068
    --      17-    22   6329618 0.9911 0.8437
    --      23-    29   3098351 0.9947 0.8681
    --      30-    37   1534876 0.9965 0.8843
    --      38-    46    748727 0.9974 0.8947
    --      47-    56    359020 0.9978 0.9012
    --      57-    67    178536 0.9981 0.9050
    --      68-    79    110558 0.9982 0.9073
    --      80-    92    127503 0.9982 0.9091
    --      93-   106    234077 0.9983 0.9117
    --     107-   121    411794 0.9985 0.9171
    --     122-   137    568842 0.9988 0.9280
    --     138-   154    567957 0.9992 0.9447
    --     155-   172    392933 0.9995 0.9629
    --     173-   191    188465 0.9998 0.9767
    --     192-   211     70309 0.9999 0.9840
    --     212-   232     27625 0.9999 0.9870
    --     233-   254     15821 1.0000 0.9883
    --     255-   277     11747 1.0000 0.9891
    --     278-   301      8990 1.0000 0.9898
    --     302-   326      6600 1.0000 0.9904
    --     327-   352      4922 1.0000 0.9908
    --     353-   379      3846 1.0000 0.9912
    --     380-   407      3402 1.0000 0.9915
    --     408-   436      2734 1.0000 0.9918
    --     437-   466      2168 1.0000 0.9921
    --     467-   497      1596 1.0000 0.9923
    --     498-   529      1202 1.0000 0.9924
    --     530-   562       913 1.0000 0.9926
    --     563-   596       701 1.0000 0.9927
    --     597-   631       555 1.0000 0.9928
    --     632-   667       476 1.0000 0.9929
    --     668-   704       420 1.0000 0.9929
    --     705-   742       448 1.0000 0.9930
    --     743-   781       474 1.0000 0.9931
    --     782-   821       527 1.0000 0.9931
    --
    --     6187952 (max occurrences)
    --  3898198967 (total mers, non-unique)
    --   919794736 (distinct mers, non-unique)
    --   594082432 (unique mers)

    [CORRECTION/CORRECTIONS]
    --
    -- Reads to be corrected:
    --   57007 reads longer than 30840 bp
    --   1601121265 bp
    -- Expected corrected reads:
    --   57007 reads
    --   1245006954 bp
    --   11879 bp minimum length
    --   21840 bp mean length
    --   46654 bp n50 length

    [TRIMMING/READS]
    --
    -- In gatekeeper store 'trimming/hc1_pacbio_smash.gkpStore':
    --   Found 62585 reads.
    --   Found 953598509 bases (382.97 times coverage).
    --
    --   Read length histogram (one '*' equals 57.91 reads):
    --        0    999      0
    --     1000   1999   1669
    --     2000   2999   1678
    --     3000   3999   1674
    --     4000   4999   1672
    --     5000   5999   1767
    --     6000   6999   1854
    --     7000   7999   2040
    --     8000   8999   2306
    --     9000   9999   2470
    --    10000  10999   2888
    --    11000  11999   4054
    --    12000  12999   3517
    --    13000  13999   3225
    --    14000  14999   3161
    --    15000  15999   2779
    --    16000  16999   2657
    --    17000  17999   2396
    --    18000  18999   2124
    --    19000  19999   1716
    --    20000  20999   1678
    --    21000  21999   1664
    --    22000  22999   1667
    --    23000  23999   1573
    --    24000  24999   1480
    --    25000  25999   1412
    --    26000  26999   1256
    --    27000  27999   1052
    --    28000  28999    902
    --    29000  29999    852
    --    30000  30999    718
    --    31000  31999    683
    --    32000  32999    643
    --    33000  33999    501
    --    34000  34999    367
    --    35000  35999    241
    --    36000  36999    133
    --    37000  37999     52
    --    38000  38999     26
    --    39000  39999     17
    --    40000  40999      8
    --    41000  41999      4
    --    42000  42999      1
    --    43000  43999      0
    --    44000  44999      1
    --    45000  45999      0
    --    46000  46999      5
    --    47000  47999      2

    [TRIMMING/MERS]
    --
    --  22-mers                   Fraction
    --    Occurrences   NumMers Unique Total
    --       1-     1  55231268 0.8492 0.0580
    --       2-     2   3685405 0.9059 0.0657
    --       3-     4   2117182 0.9274 0.0701
    --       5-     7    882823 0.9449 0.0754
    --       8-    11    365315 0.9540 0.0796
    --      12-    16    161397 0.9583 0.0826
    --      17-    22     73227 0.9604 0.0846
    --      23-    29     34139 0.9613 0.0859
    --      30-    37     16278 0.9618 0.0868
    --      38-    46      8318 0.9620 0.0873
    --      47-    56      4738 0.9621 0.0876
    --      57-    67      2955 0.9622 0.0879
    --      68-    79      1805 0.9622 0.0881
    --      80-    92      1434 0.9623 0.0882
    --      93-   106       964 0.9623 0.0883
    --     107-   121       926 0.9623 0.0884
    --     122-   137       712 0.9623 0.0885
    --     138-   154      2170 0.9623 0.0886
    --     155-   172      6610 0.9624 0.0890
    --     173-   191      8184 0.9625 0.0902
    --     192-   211     13762 0.9626 0.0918
    --     212-   232     30966 0.9628 0.0947
    --     233-   254     76172 0.9633 0.1024
    --     255-   277    139178 0.9645 0.1226
    --     278-   301    224652 0.9667 0.1628
    --     302-   326    350000 0.9702 0.2328
    --     327-   352    448360 0.9757 0.3503
    --     353-   379    465122 0.9826 0.5111
    --     380-   407    361448 0.9897 0.6896
    --     408-   436    215609 0.9952 0.8367
    --     437-   466     76080 0.9984 0.9294
    --     467-   497      8483 0.9995 0.9635
    --     498-   529       457 0.9996 0.9673
    --     530-   562       716 0.9997 0.9675
    --     563-   596      2053 0.9997 0.9680
    --     597-   631      4324 0.9997 0.9692
    --     632-   667      4047 0.9998 0.9721
    --     668-   704      2073 0.9998 0.9748
    --     705-   742      2174 0.9999 0.9763
    --     743-   781      3307 0.9999 0.9780
    --     782-   821      1127 0.9999 0.9806
    --
    --     5458608 (max occurrences)
    --   897052951 (total mers, non-unique)
    --     9807648 (distinct mers, non-unique)
    --    55231268 (unique mers)

    [TRIMMING/TRIMMING]
    --  PARAMETERS:
    --  ----------
    --     1000    (reads trimmed below this many bases are deleted)
    --   0.1500    (use overlaps at or below this fraction error)
    --        1    (break region if overlap is less than this long, for 'largest covered' algorithm)
    --        1    (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
    --
    --  INPUT READS:
    --  -----------
    --   62585 reads    953598509 bases (reads processed)
    --       0 reads            0 bases (reads not processed, previously deleted)
    --       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
    --
    --  OUTPUT READS:
    --  ------------
    --   54254 reads    797345148 bases (trimmed reads output)
    --    8195 reads    116372849 bases (reads with no change, kept as is)
    --      89 reads       462733 bases (reads with no overlaps, deleted)
    --      47 reads        79304 bases (reads with short trimmed length, deleted)
    --
    --  TRIMMING DETAILS:
    --  ----------------
    --   38763 reads     23269896 bases (bases trimmed from the 5' end of a read)
    --   42441 reads     16068579 bases (bases trimmed from the 3' end of a read)

    [TRIMMING/SPLITTING]
    --  PARAMETERS:
    --  ----------
    --     1000    (reads trimmed below this many bases are deleted)
    --   0.1500    (use overlaps at or below this fraction error)
    --  INPUT READS:
    --  -----------
    --   62449 reads    953056472 bases (reads processed)
    --     136 reads       542037 bases (reads not processed, previously deleted)
    --       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
    --
    --  PROCESSED:
    --  --------
    --       0 reads            0 bases (no overlaps)
    --       0 reads            0 bases (no coverage after adjusting for trimming done already)
    --       0 reads            0 bases (processed for chimera)
    --       0 reads            0 bases (processed for spur)
    --   62449 reads    953056472 bases (processed for subreads)
    --
    --  READS WITH SIGNALS:
    --  ------------------
    --       0 reads            0 signals (number of 5' spur signal)
    --       0 reads            0 signals (number of 3' spur signal)
    --       0 reads            0 signals (number of chimera signal)
    --   19659 reads        21848 signals (number of subread signal)
    --
    --  SIGNALS:
    --  -------
    --       0 reads            0 bases (size of 5' spur signal)
    --       0 reads            0 bases (size of 3' spur signal)
    --       0 reads            0 bases (size of chimera signal)
    --   21848 reads      6289853 bases (size of subread signal)
    --
    --  TRIMMING:
    --  --------
    --    8938 reads     53461841 bases (trimmed from the 5' end of the read)
    --   12082 reads     64183872 bases (trimmed from the 3' end of the read)

    [UNITIGGING/READS]
    --
    -- In gatekeeper store 'unitigging/hc1_pacbio_smash.gkpStore':
    --   Found 62449 reads.
    --   Found 796072284 bases (319.7 times coverage).
    --
    --   Read length histogram (one '*' equals 81.4 reads):
    --        0    999      0
    --     1000   1999   1775
    --     2000   2999   1881
    --     3000   3999   2033
    --     4000   4999   2412
    --     5000   5999   2855
    --     6000   6999   3166
    --     7000   7999   3519
    --     8000   8999   5698
    --     9000   9999   3949
    --    10000  10999   3475
    --    11000  11999   3025
    --    12000  12999   2020
    --    13000  13999   2243
    --    14000  14999   2434
    --    15000  15999   2546
    --    16000  16999   2939
    --    17000  17999   2778
    --    18000  18999   2051
    --    19000  19999   1491
    --    20000  20999   1391
    --    21000  21999   1278
    --    22000  22999   1054
    --    23000  23999    941
    --    24000  24999    900
    --    25000  25999    911
    --    26000  26999    733
    --    27000  27999    595
    --    28000  28999    464
    --    29000  29999    437
    --    30000  30999    342
    --    31000  31999    319
    --    32000  32999    258
    --    33000  33999    206
    --    34000  34999    132
    --    35000  35999     90
    --    36000  36999     46
    --    37000  37999     19
    --    38000  38999     14
    --    39000  39999     12
    --    40000  40999      6
    --    41000  41999      5
    --    42000  42999      1
    --    43000  43999      1
    --    44000  44999      0
    --    45000  45999      2
    --    46000  46999      1
    --    47000  47999      1

    [UNITIGGING/MERS]
    --
    --  22-mers                   Fraction
    --    Occurrences   NumMers Unique Total
    --       1-     1  30278566 0.7935 0.0381
    --       2-     2   2785278 0.8665 0.0451
    --       3-     4   1568706 0.8940 0.0491
    --       5-     7    627265 0.9156 0.0536
    --       8-    11    248535 0.9265 0.0571
    --      12-    16    106525 0.9314 0.0595
    --      17-    22     46308 0.9337 0.0611
    --      23-    29     20091 0.9347 0.0621
    --      30-    37     10535 0.9352 0.0627
    --      38-    46      5740 0.9354 0.0631
    --      47-    56      3196 0.9356 0.0634
    --      57-    67      1654 0.9357 0.0636
    --      68-    79      1311 0.9357 0.0637
    --      80-    92      1275 0.9357 0.0638
    --      93-   106      1533 0.9358 0.0640
    --     107-   121      6774 0.9358 0.0642
    --     122-   137     10652 0.9360 0.0652
    --     138-   154     10920 0.9363 0.0670
    --     155-   172     23410 0.9366 0.0690
    --     173-   191     56187 0.9372 0.0742
    --     192-   211     91811 0.9387 0.0875
    --     212-   232    150802 0.9412 0.1118
    --     233-   254    227477 0.9452 0.1550
    --     255-   277    325041 0.9512 0.2258
    --     278-   301    383536 0.9599 0.3367
    --     302-   326    395184 0.9700 0.4773
    --     327-   352    337569 0.9803 0.6329
    --     353-   379    243903 0.9890 0.7757
    --     380-   407    121281 0.9953 0.8859
    --     408-   436     39520 0.9984 0.9439
    --     437-   466      3670 0.9993 0.9636
    --     467-   497      1727 0.9994 0.9655
    --     498-   529      2184 0.9995 0.9665
    --     530-   562      4971 0.9995 0.9680
    --     563-   596      2437 0.9997 0.9714
    --     597-   631      1985 0.9997 0.9731
    --     632-   667      2579 0.9998 0.9747
    --     668-   704      2665 0.9998 0.9768
    --     705-   742       650 0.9999 0.9791
    --     743-   781       182 0.9999 0.9796
    --     782-   821       118 0.9999 0.9798
    --
    --     4625539 (max occurrences)
    --   764482287 (total mers, non-unique)
    --     7877581 (distinct mers, non-unique)
    --    30278566 (unique mers)

    [UNITIGGING/OVERLAPS]
    --   category            reads     %          read length        feature size or coverage  analysis
    --   ----------------  -------  -------  ----------------------  ------------------------  --------------------
    --   middle-missing         46    0.07    29493.83 +- 7307.73       1491.52 +- 1824.50    (bad trimming)
    --   middle-hump             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (bad trimming)
    --   no-5-prime              1    0.00    41272.00 +- 0.00         17545.00 +- 0.00       (bad trimming)
    --   no-3-prime              2    0.00    18903.00 +- 7237.95      14211.00 +- 9907.98    (bad trimming)
    --
    --   low-coverage            1    0.00    41272.00 +- 0.00             6.16 +- 1.70       (easy to assemble, potential for lower quality consensus)
    --   unique                  8    0.01     2984.62 +- 1054.95         30.17 +- 9.86       (easy to assemble, perfect, yay)
    --   repeat-cont         48546   77.74    10090.60 +- 5030.80        173.61 +- 79.98      (potential for consensus errors, no impact on assembly)
    --   repeat-dove          2835    4.54    19077.12 +- 3669.65        123.21 +- 51.63      (hard to assemble, likely won't assemble correctly or even at all)
    --
    --   span-repeat            31    0.05    21499.13 +- 11354.69     12121.97 +- 7868.00    (read spans a large repeat, usually easy to assemble)
    --   uniq-repeat-cont      277    0.44    15073.83 +- 7756.64                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
    --   uniq-repeat-dove      788    1.26    23463.74 +- 6199.11                             (will end contigs, potential to misassemble)
    --   uniq-anchor          9914   15.88    22927.00 +- 6416.16       9132.49 +- 6470.79    (repeat read, with unique section, probable bad read)

    [UNITIGGING/ADJUSTMENT]
    -- No report available.

    [UNITIGGING/CONTIGS]
    -- Found, in version 1, after unitig construction:
    --   contigs:      774 sequences, total length 28825721 bp (including 180 repeats of total length 10087773 bp).
    --   bubbles:      0 sequences, total length 0 bp.
    --   unassembled:  27967 sequences, total length 467652040 bp.
    --
    -- Contig sizes based on genome size --
    --            NG (bp)  LG (contigs)    sum (bp)
    --         ----------  ------------  ----------
    --     10      154794             2      338981
    --     20      148422             4      637556
    --     30      145749             5      783305
    --     40      138518             7     1062580
    --     50      130357             9     1328482
    --     60      129479            11     1588102
    --     70      126396            13     1841286
    --     80      124727            15     2090918
    --     90      122469            17     2336194
    --    100      113677            19     2564092
    --    110      108082            21     2783983
    --    120      104973            23     2994243
    --    130      101000            26     3301631
    --    140      100034            28     3502455
    --    150       97748            31     3798249
    --    160       94684            33     3990519
    --    170       92653            36     4269074
    --    180       90201            39     4542453
    --    190       88536            42     4810047
    --    200       87275            44     4985097
    --    210       86877            47     5245906
    --    220       85192            50     5501815
    --    230       82650            53     5750599
    --    240       80918            56     5995944
    --    250       79379            59     6234829
    --    260       77458            63     6546969
    --    270       74500            66     6772693
    --    280       73254            69     6994323
    --    290       71418            73     7283288
    --    300       70495            76     7495508
    --    310       69738            80     7776030
    --    320       68511            83     7982794
    --    330       67627            87     8254692
    --    340       65652            91     8521122
    --    350       64665            94     8716320
    --    360       64072            98     8973481
    --    370       62565           102     9226959
    --    380       61543           106     9474586
    --    390       60067           110     9717579
    --    400       59586           115    10016434
    --    410       57157           119    10250579
    --    420       56041           123    10476983
    --    430       55078           128    10754430
    --    440       53690           132    10970991
    --    450       52979           137    11237718
    --    460       52262           142    11500811
    --    470       51627           146    11707805
    --    480       50988           151    11963946
    --    490       50277           156    12216841
    --    500       49237           161    12464805
    --    510       48306           166    12707304
    --    520       47671           172    12995191
    --    530       46893           177    13231443
    --    540       46407           182    13464206
    --    550       45829           188    13740551
    --    560       45420           193    13968772
    --    570       44801           198    14193995
    --    580       44345           204    14461249
    --    590       43386           210    14723495
    --    600       41842           216    14977929
    --    610       41405           222    15227512
    --    620       40677           228    15472925
    --    630       40161           234    15714745
    --    640       39543           240    15953710
    --    650       38616           246    16187173
    --    660       37911           253    16454724
    --    670       37122           260    16717158
    --    680       36850           266    16939049
    --    690       35966           273    17193483
    --    700       35505           280    17443357
    --    710       34914           287    17689227
    --    720       34494           294    17932188
    --    730       33858           302    18204752
    --    740       33547           309    18440489
    --    750       33049           317    18706818
    --    760       32763           324    18936825
    --    770       32468           332    19197593
    --    780       32108           339    19423522
    --    790       31523           347    19677608
    --    800       31078           355    19927805
    --    810       30665           363    20174634
    --    820       30213           371    20418218
    --    830       29594           380    20686317
    --    840       29183           388    20921491
    --    850       28756           397    21182694
    --    860       28141           406    21437913
    --    870       27777           415    21689494
    --    880       27250           424    21936525
    --    890       26935           433    22180422
    --    900       26516           442    22420465
    --    910       26063           452    22683561
    --    920       25788           461    22916707
    --    930       25301           471    23172223
    --    940       24787           481    23423021
    --    950       24486           491    23669106
    --    960       24030           501    23911583
    --    970       23406           512    24173091
    --    980       22623           522    24403028
    --    990       22053           534    24670704
    --    000       21815           545    24911957
    --    010       21516           556    25150516
    --    020       20860           568    25404468
    --    030       20227           580    25650906
    --    040       19854           593    25911110
    --    050       19684           605    26148260
    --    060       19463           618    26402836
    --    070       18856           631    26651276
    --    080       18384           644    26893024
    --    090       18232           658    27149058
    --    100       17459           672    27397834
    --    110       17102           686    27639666
    --    120       16584           701    27892555
    --    130       16070           717    28152959
    --    140       15404           732    28389103
    --    150       11744           750    28639894

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      774 sequences, total length 28496592 bp (including 180 repeats of total length 9776968 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  27967 sequences, total length 467802133 bp.

-- Contig sizes based on genome size --
--         NG (bp)  LG (contigs)    sum (bp)
--    10    168250             2      364776
--    20    159014             3      523790
--    30    148469             5      826603
--    40    141945             7     1116021
--    50    141613             8     1257634
--    60    138364            10     1536890
--    70    129790            12     1802246
--    80    122975            14     2050864
--    90    119871            16     2293387
--   100    117421            18     2529906
--   110    114422            20     2758949
--   120    108874            23     3092649
--   130    106612            25     3306802
--   140    104252            27     3516981
--   150     96660            30     3817342
--   160     95103            32     4008943
--   170     94597            35     4292877
--   180     92707            38     4572494
--   190     90405            40     4754955
--   200     89227            43     5023285
--   210     87188            46     5285676
--   220     85548            49     5543836
--   230     81605            52     5792880
--   240     79794            55     6034323
--   250     78005            58     6271317
--   260     76956            61     6503085
--   270     76390            64     6733181
--   280     74465            68     7035950
--   290     72103            71     7254907
--   300     70339            75     7539006
--   310     68860            78     7746148
--   320     68174            82     8019873
--   330     66658            85     8220335
--   340     65629            89     8484218
--   350     63065            93     8739663
--   360     62757            97     8991038
--   370     61424           101     9238412
--   380     59421           105     9477853
--   390     58960           109     9714284
--   400     57806           114    10004773
--   410     56523           118    10231870
--   420     55112           123    10510643
--   430     54702           127    10729884
--   440     53711           132    10999058
--   450     53146           136    11212430
--   460     52797           141    11477159
--   470     52017           146    11738469
--   480     51786           151    11998103
--   490     51289           155    12203868
--   500     50374           160    12458204
--   510     48871           165    12706892
--   520     47939           170    12948401
--   530     46774           176    13232492
--   540     46291           181    13464642
--   550     45400           187    13738715
--   560     44629           192    13963332
--   570     44051           198    14229562
--   580     43441           203    14448302
--   590     42348           209    14704435
--   600     41952           215    14956968
--   610     40730           221    15204065
--   620     40214           227    15446638
--   630     39983           233    15687168
--   640     39468           240    15965159
--   650     39209           246    16201079
--   660     38447           253    16471122
--   670     37697           259    16698522
--   680     36530           266    16956893
--   690     35997           273    17210759
--   700     35376           280    17460816
--   710     34717           287    17705125
--   720     34249           294    17946699
--   730     33785           301    18184038
--   740     32839           309    18450707
--   750     32456           316    18679070
--   760     31990           324    18937486
--   770     31151           332    19189970
--   780     30739           340    19437541
--   790     30320           348    19681742
--   800     29779           356    19922876
--   810     29507           365    20189728
--   820     29310           373    20424711
--   830     28716           382    20685837
--   840     28212           391    20942015
--   850     27822           399    21166462
--   860     27272           408    21414867
--   870     27069           418    21686349
--   880     26450           427    21926218
--   890     26217           436    22163109
--   900     25775           446    22422589
--   910     25275           456    22678502
--   920     24957           466    22930256
--   930     24377           476    23176038
--   940     23806           486    23416931
--   950     22837           497    23673753
--   960     22101           508    23920590
--   970     21792           519    24162262
--   980     21175           531    24419312
--   990     20691           543    24670202
--   000     20225           555    24915702
--   010     19860           567    25155368
--   020     19691           580    25412156
--   030     19379           593    25665995
--   040     18825           606    25913517
--   050     18436           619    26155142
--   060     18192           633    26411142
--   070     17708           646    26644021
--   080     17307           661    26905976
--   090     16961           675    27145248
--   100     16539           690    27397136
--   110     16150           705    27642026
--   120     15628           721    27896549
--   130     14768           737    28141500
--   140     10555           757    28391723

skoren commented 6 years ago

Your second attempt did not work: it ended up with only 3x coverage. That implies the longest reads aren't very high quality, which is not necessarily surprising in itself. The k-mer distribution from the corrected reads is consistent with a 2-3 Mb genome. However, the 22 Mb assembly you get when you increase the coverage, and the failure to assemble at 90x coverage, are suspicious.

Are you able to share the data? See the FAQ for instructions on sending it to us. Otherwise, you could try running mash screen (https://github.com/marbl/Mash) on it, as well as running GenomeScope (http://qb.cshl.edu/genomescope/) on the unitigging/0-*/*.histogram outputs. Have you tried mapping the data to a close reference to estimate identity and coverage? That should provide more information on what's going on with the data.
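For intuition on how a k-mer histogram points at a genome size, here is a minimal Python sketch. It is not what GenomeScope does (GenomeScope fits a statistical model to the profile); it is only the common back-of-the-envelope estimate of total non-error k-mers divided by the depth of the main peak. The function name and the toy histogram are hypothetical and are not taken from this dataset.

```python
def estimate_genome_size(histogram):
    """Rough genome size from a {depth: number_of_distinct_kmers} histogram."""
    depths = sorted(histogram)
    # Treat everything before the first rise in counts as sequencing-error k-mers.
    trough = depths[-1]
    for current, following in zip(depths, depths[1:]):
        if histogram[current] < histogram[following]:
            trough = current
            break
    genomic = [d for d in depths if d >= trough]
    # The main peak is the most populated depth past the error trough.
    peak = max(genomic, key=lambda d: histogram[d])
    # Total k-mer occurrences past the trough, divided by the peak depth.
    total_kmers = sum(d * histogram[d] for d in genomic)
    return round(total_kmers / peak)


# Toy data only: an error spike at low depth and a peak at depth 50.
toy = {1: 5000, 2: 800, 3: 100, 4: 50, 48: 200, 49: 400, 50: 800, 51: 400, 52: 200}
print(estimate_genome_size(toy))
```

If an estimate like this is far from the expected genome size, that hints at contamination or a mixed sample and is worth following up with the tools mentioned above.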

ml3958 commented 6 years ago

Thanks for the reply. I will try to send the data.

Meanwhile, I ran GenomeScope on the first two columns of my unitigging/0-*/*.histogram file (generated by the first attempt, which set corOutCoverage=100 and got 90X coverage). The result, with default GenomeScope settings, is here: http://qb.cshl.edu/genomescope/analysis.php?code=qHGceJKZyhvPegWXXvWt
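Extracting the first two columns can be sketched in Python as below. The sketch assumes the histogram file is whitespace-separated with the occurrence count in column 1 and the number of k-mers in column 2, followed by extra columns; that layout is an assumption and should be checked against the actual file. The function name and sample lines are hypothetical.

```python
def first_two_columns(lines):
    """Keep only 'occurrences number_of_kmers' from each data line."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, headers, and anything that is not two leading integers.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        rows.append(f"{fields[0]} {fields[1]}")
    return rows


# Hypothetical sample lines, not copied from a real histogram file.
sample = ["1 1000 0.50 0.10", "2 200 0.60 0.14", "", "# comment line"]
for row in first_two_columns(sample):
    print(row)
```

To use it on a real file, pass an open file handle as `lines` and write the returned rows, one per line, to the file you upload.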

ml3958 commented 6 years ago

I cannot upload the data to the FTP server. I connected to the FTP successfully, but when I try to cd to the incoming/sergek folder, it seems that the folder does not exist.

Can you please double-check? Thanks.

skoren commented 6 years ago

Yep, the FTP is working properly:

% ftp ftp.cbcb.umd.edu
Trying 128.8.132.69...
Connected to ftp.cbcb.umd.edu (128.8.132.69).
220-
220-Welcome to the CBCB FTP Server
220-
220-Please visit, http://www.cbcb.umd.edu
220-for more information.
220-
220 
Name (ftp.cbcb.umd.edu:skoren): anonymous
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd incoming/sergek
250 Directory successfully changed.
ftp> put test
local: test remote: test
227 Entering Passive Mode (128,8,132,69,31,96).
150 Ok to send data.
226 Transfer complete.
ftp> ls
227 Entering Passive Mode (128,8,132,69,31,88).
150 Here comes the directory listing.
226 Transfer done (but failed to open directory).

You can't ls/read the directory but you can run put.
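The same write-only upload can be scripted with Python's standard `ftplib`. This is a minimal sketch that assumes anonymous login is accepted, as in the session above; the helper names and the local file name are hypothetical. It changes directory and stores the file without ever listing the directory, since listing is not permitted there.

```python
import ftplib


def upload_without_listing(ftp, directory, local_path, remote_name):
    """Change into a write-only directory and upload one file in binary mode."""
    ftp.cwd(directory)
    with open(local_path, "rb") as handle:
        ftp.storbinary(f"STOR {remote_name}", handle)


def upload_reads(local_path="reads.fastq.gz"):
    """Example call; the local file name is a placeholder."""
    with ftplib.FTP("ftp.cbcb.umd.edu") as ftp:
        ftp.login()  # anonymous login by default
        upload_without_listing(ftp, "incoming/sergek", local_path, local_path)
```

Because the directory cannot be read back, the only confirmation of success is the absence of an `ftplib` error from the transfer.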

ml3958 commented 6 years ago

Hi Sergey, thanks! I successfully put the data on the github page, named SRR6331514.fastq.gz. But I think I found the problem: when I downloaded the data from the FTP, the process was interrupted, so only 90% of the data was actually downloaded.

Now I got a much more continuous assembly!

[Image: screenshot, 2017-12-23 2:10 pm]

Now on to the next step: polishing it. Thank you!

skoren commented 6 years ago

Ah, I didn't realize this is a public dataset, so I don't need the FTP upload. The raw data on the SRA (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR6331514) has the following contents:

m160920_005603_42154_c101069572550000001823244402101752_s1_p0.1.bax.h5
m160920_005603_42154_c101069572550000001823244402101752_s1_p0.2.bax.h5
m160920_005603_42154_c101069572550000001823244402101752_s1_p0.3.bax.h5
m160921_141610_42154_c101069832550000001823244402101707_s1_p0.1.bax.h5
m160921_141610_42154_c101069832550000001823244402101707_s1_p0.2.bax.h5
m160921_141610_42154_c101069832550000001823244402101707_s1_p0.3.bax.h5

Each cell is composed of 3 bax.h5 files and an index bas.h5 file. Here, we have two cells mixed together. That would explain the very high coverage and large number of reads for an RS, which this appears to be. I can't imagine a single genome would be sequenced on more than one cell given the per-cell coverage here, so it is possible this is two different organisms in one SRA sample. That might explain the assembly issues. I launched an assembly using each cell separately and will let you know when I have results.
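To see the two cells in the file listing, the names can be grouped by everything before the `.N.bax.h5` suffix. A minimal Python sketch, assuming that prefix identifies one cell (the function name is hypothetical):

```python
import re
from collections import defaultdict

# Everything before ".<part>.bax.h5" is taken as the per-cell movie name.
BAX_PATTERN = re.compile(r"^(?P<movie>.+)\.(?P<part>[123])\.bax\.h5$")


def group_by_cell(filenames):
    """Map each movie name to the bax.h5 files that belong to it."""
    cells = defaultdict(list)
    for name in filenames:
        match = BAX_PATTERN.match(name)
        if match:
            cells[match.group("movie")].append(name)
    return dict(cells)


files = [
    "m160920_005603_42154_c101069572550000001823244402101752_s1_p0.1.bax.h5",
    "m160920_005603_42154_c101069572550000001823244402101752_s1_p0.2.bax.h5",
    "m160920_005603_42154_c101069572550000001823244402101752_s1_p0.3.bax.h5",
    "m160921_141610_42154_c101069832550000001823244402101707_s1_p0.1.bax.h5",
    "m160921_141610_42154_c101069832550000001823244402101707_s1_p0.2.bax.h5",
    "m160921_141610_42154_c101069832550000001823244402101707_s1_p0.3.bax.h5",
]
cells = group_by_cell(files)
for movie, parts in cells.items():
    print(movie, len(parts))
```

Each resulting group can then be extracted and assembled on its own.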

As general advice, I always prefer to download the raw data from the SRA and convert it to fastq rather than relying on the fastq download, because I've had issues in the past with the fastq in SRA not matching what I get by dumping the raw files. In this case, dumping each cell separately (keeping reads >= 500 bp and quality 0.75), I ended up with about 1 Gbp of sequence and 170k reads per cell, with an average read length of 6 kbp and about 350X coverage of a 3 Mb genome. These stats are in line with an RSII cell. The fastq you uploaded from the SRA has 4.8 Gbp and 330k reads, with an average read length of 14.7 kb and about 1600X coverage of a 3 Mb genome. Even combining the two cells, I don't come close to those stats. That fastq has no minimum read length, yet its longest reads are also much longer than in my extraction, so I'd guess no quality filter was set either.
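The comparison above is simple arithmetic: coverage is total bases divided by genome size, and mean read length is total bases divided by read count. A small Python sketch using the rounded figures quoted in this comment (so the results differ slightly from the quoted 350X and 14.7 kb, which come from unrounded totals):

```python
def coverage(total_bases, genome_size):
    """Fold coverage: sequenced bases per base of genome."""
    return total_bases / genome_size


def mean_read_length(total_bases, read_count):
    """Average read length in bases."""
    return total_bases / read_count


GENOME_SIZE = 3_000_000  # the 3 Mb genome size used in the comment

# One cell dumped from the raw files: about 1 Gbp in 170k reads.
per_cell = coverage(1_000_000_000, GENOME_SIZE)
per_cell_length = mean_read_length(1_000_000_000, 170_000)

# The fastq downloaded from the SRA: 4.8 Gbp in 330k reads.
sra_fastq = coverage(4_800_000_000, GENOME_SIZE)
sra_length = mean_read_length(4_800_000_000, 330_000)

print(f"per cell:  {per_cell:.0f}x, mean read {per_cell_length:.0f} bp")
print(f"SRA fastq: {sra_fastq:.0f}x, mean read {sra_length:.0f} bp")
```

Two cells at roughly 333x each give under 700x, well short of the 1600x in the downloaded fastq, which is the mismatch described above.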

skoren commented 6 years ago

As expected, the assemblies of the combined cells and of each cell individually all completed, and each produced a single circular contig of 2.5 Mb. The individual cell assemblies were almost identical, so this is the same genome sequenced to extremely high coverage. Here is the Bandage plot for the combined assembly:

[Image: Bandage plot of the combined assembly, screenshot 2017-12-23 4:35 pm]

The short sequence is the PacBio control sequence. Happy to share either single-cell assembly or the assembly of both cells combined. Here is the assembly report for the combined assembly:

[CORRECTION/READS]
--
-- In gatekeeper store './asm.gkpStore':
--   Found 318231 reads.
--   Found 2115476345 bases (705.15 times coverage).
--
--   Read length histogram (one '*' equals 528.14 reads):
--        0    999      0 
--     1000   1999  28753 ******************************************************
--     2000   2999  32394 *************************************************************
--     3000   3999  28830 ******************************************************
--     4000   4999  29109 *******************************************************
--     5000   5999  27310 ***************************************************
--     6000   6999  24941 ***********************************************
--     7000   7999  23603 ********************************************
--     8000   8999  35121 ******************************************************************
--     9000   9999  36970 **********************************************************************
--    10000  10999  21456 ****************************************
--    11000  11999  14074 **************************
--    12000  12999   5919 ***********
--    13000  13999   2553 ****
--    14000  14999   1668 ***
--    15000  15999   1224 **
--    16000  16999   1016 *
--    17000  17999    843 *
--    18000  18999    592 *
--    19000  19999    442 
--    20000  20999    361 
--    21000  21999    277 
--    22000  22999    217 
--    23000  23999    133 
--    24000  24999    112 
--    25000  25999     74 
--    26000  26999     49 
--    27000  27999     41 
--    28000  28999     44 
--    29000  29999     30 
--    30000  30999     20 
--    31000  31999     12 
--    32000  32999     10 
--    33000  33999      9 
--    34000  34999      7 
--    35000  35999      5 
--    36000  36999      3 
--    37000  37999      1 
--    38000  38999      5 
--    39000  39999      2 
--    40000  40999      1

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 479286332 *******************************************************************--> 0.5940 0.2271
--       2-     2 163039707 ********************************************************************** 0.7961 0.3816
--       3-     4  96157434 *****************************************                              0.8768 0.4741
--       5-     7  35128270 ***************                                                        0.9365 0.5736
--       8-    11  15260987 ******                                                                 0.9654 0.6483
--      12-    16   7779987 ***                                                                    0.9804 0.7070
--      17-    22   3950821 *                                                                      0.9885 0.7524
--      23-    29   2020160                                                                        0.9927 0.7851
--      30-    37    987264                                                                        0.9949 0.8077
--      38-    46    460018                                                                        0.9960 0.8219
--      47-    56    201389                                                                        0.9966 0.8303
--      57-    67     90278                                                                        0.9968 0.8348
--      68-    79     62275                                                                        0.9969 0.8372
--      80-    92    108264                                                                        0.9970 0.8395
--      93-   106    245246                                                                        0.9971 0.8442
--     107-   121    445389                                                                        0.9974 0.8566
--     122-   137    581498                                                                        0.9980 0.8815
--     138-   154    526363                                                                        0.9987 0.9175
--     155-   172    320747                                                                        0.9994 0.9533
--     173-   191    131465                                                                        0.9997 0.9771
--     192-   211     43003                                                                        0.9999 0.9877
--     212-   232     16234                                                                        0.9999 0.9915
--     233-   254      9989                                                                        1.0000 0.9932
--     255-   277      7522                                                                        1.0000 0.9943
--     278-   301      5586                                                                        1.0000 0.9953
--     302-   326      3642                                                                        1.0000 0.9960
--     327-   352      1969                                                                        1.0000 0.9965
--     353-   379       970                                                                        1.0000 0.9969
--     380-   407       607                                                                        1.0000 0.9970
--     408-   436       463                                                                        1.0000 0.9971
--     437-   466       337                                                                        1.0000 0.9972
--     467-   497       235                                                                        1.0000 0.9973
--     498-   529       220                                                                        1.0000 0.9973
--     530-   562       131                                                                        1.0000 0.9974
--     563-   596       125                                                                        1.0000 0.9974
--     597-   631       141                                                                        1.0000 0.9975
--     632-   667       131                                                                        1.0000 0.9975
--     668-   704       163                                                                        1.0000 0.9975
--     705-   742       159                                                                        1.0000 0.9976
--     743-   781       142                                                                        1.0000 0.9977
--     782-   821       213                                                                        1.0000 0.9977
--
--      825624 (max occurrences)
--  1631416548 (total mers, non-unique)
--   327591833 (distinct mers, non-unique)
--   479286332 (unique mers)

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads             249087         69144
--   Number of Bases         1710355675     334805902
--   Coverage                   570.119       111.602
--   Median                        6952          4081
--   Mean                          6866          4842
--   N50                           8873          7820
--   Minimum                       1000             0
--   Maximum                      40147         32203
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads             183746           9486          9486          38172         38172
--   Number of Bases         1448344402      123115677     120004745      176499721     126411875
--   Coverage                   482.781         41.039        40.002         58.833        42.137
--   Median                        8462          11926         11823           4358          2927
--   Mean                          7882          12978         12650           4623          3311
--   N50                           9180          12026         11891           5497          4445
--   Minimum                       1000          11276         11275           1001           501
--   Maximum                      40147          40147         40134          27227         11259
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads             270573        270573
--   Number of Bases         1745546179     927247764
--   Coverage                   581.849       309.083
--   Median                        6635           221
--   Mean                          6451          3426
--   N50                           8790          8918
--   Minimum                          0             0
--   Maximum                      36176         11274
--   
--   Maximum Memory           942604106

[TRIMMING/READS]
--
-- In gatekeeper store './asm.gkpStore':
--   Found 47576 reads.
--   Found 224759402 bases (74.91 times coverage).
--
--   Read length histogram (one '*' equals 142.95 reads):
--        0    999   4330 ******************************
--     1000   1999  10007 **********************************************************************
--     2000   2999   6393 ********************************************
--     3000   3999   5729 ****************************************
--     4000   4999   4948 **********************************
--     5000   5999   3791 **************************
--     6000   6999   2239 ***************
--     7000   7999    833 *****
--     8000   8999    385 **
--     9000   9999    385 **
--    10000  10999   2714 ******************
--    11000  11999   4857 *********************************
--    12000  12999    264 *
--    13000  13999     83 
--    14000  14999     79 
--    15000  15999     96 
--    16000  16999    126 
--    17000  17999    107 
--    18000  18999     71 
--    19000  19999     56 
--    20000  20999     41 
--    21000  21999     25 
--    22000  22999     11 
--    23000  23999      5 
--    24000  24999      1

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1  17815705 *******************************************************************--> 0.7426 0.0796
--       2-     2   2102826 ********************************************************************** 0.8303 0.0984
--       3-     4   1101765 ************************************                                   0.8615 0.1084
--       5-     7    366393 ************                                                           0.8841 0.1190
--       8-    11    106475 ***                                                                    0.8933 0.1256
--      12-    16     30450 *                                                                      0.8964 0.1289
--      17-    22      8417                                                                        0.8973 0.1304
--      23-    29      4520                                                                        0.8976 0.1310
--      30-    37     17760                                                                        0.8978 0.1316
--      38-    46     74433 **                                                                     0.8987 0.1350
--      47-    56    222943 *******                                                                0.9022 0.1513
--      57-    67    501607 ****************                                                       0.9123 0.2092
--      68-    79    627801 ********************                                                   0.9340 0.3565
--      80-    92    530937 *****************                                                      0.9600 0.5637
--      93-   106    286412 *********                                                              0.9814 0.7618
--     107-   121    112303 ***                                                                    0.9925 0.8809
--     122-   137     45638 *                                                                      0.9969 0.9344
--     138-   154     17735                                                                        0.9987 0.9592
--     155-   172      6172                                                                        0.9994 0.9698
--     173-   191      3312                                                                        0.9996 0.9742
--     192-   211      1396                                                                        0.9998 0.9767
--     212-   232      1884                                                                        0.9998 0.9780
--     233-   254       175                                                                        0.9999 0.9798
--     255-   277        74                                                                        0.9999 0.9800
--     278-   301        55                                                                        0.9999 0.9801
--     302-   326        86                                                                        0.9999 0.9801
--     327-   352        57                                                                        0.9999 0.9803
--     353-   379        46                                                                        0.9999 0.9803
--     380-   407        51                                                                        0.9999 0.9804
--     408-   436        27                                                                        0.9999 0.9805
--     437-   466        25                                                                        0.9999 0.9806
--     467-   497         6                                                                        0.9999 0.9806
--     498-   529        23                                                                        0.9999 0.9806
--     530-   562        26                                                                        0.9999 0.9807
--     563-   596        27                                                                        0.9999 0.9807
--     597-   631        16                                                                        0.9999 0.9808
--     632-   667         7                                                                        0.9999 0.9808
--     668-   704         3                                                                        0.9999 0.9809
--     705-   742        12                                                                        0.9999 0.9809
--     743-   781         2                                                                        0.9999 0.9809
--     782-   821         3                                                                        0.9999 0.9809
--
--      117322 (max occurrences)
--   205944908 (total mers, non-unique)
--     6173892 (distinct mers, non-unique)
--    17815705 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0450    (use overlaps at or below this fraction error)
--        1    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        1    (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--  318231 reads    224759402 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--   31881 reads    165018864 bases (trimmed reads output)
--   10683 reads     47638374 bases (reads with no change, kept as is)
--  271926 reads       973148 bases (reads with no overlaps, deleted)
--    3741 reads      3161434 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--   21961 reads      4327605 bases (bases trimmed from the 5' end of a read)
--   21528 reads      3639977 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0450    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--   42564 reads    220624820 bases (reads processed)
--  275667 reads      4134582 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--       0 reads            0 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--   42564 reads    220624820 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--     195 reads          195 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--     195 reads        85910 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--      83 reads       656196 bases (trimmed from the 5' end of the read)
--     112 reads       704033 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In gatekeeper store './asm.gkpStore':
--   Found 42556 reads.
--   Found 211290684 bases (70.43 times coverage).
--
--   Read length histogram (one '*' equals 139.67 reads):
--        0    999      0 
--     1000   1999   9777 **********************************************************************
--     2000   2999   6371 *********************************************
--     3000   3999   5707 ****************************************
--     4000   4999   4922 ***********************************
--     5000   5999   3746 **************************
--     6000   6999   2213 ***************
--     7000   7999    880 ******
--     8000   8999    732 *****
--     9000   9999    603 ****
--    10000  10999   2799 ********************
--    11000  11999   4642 *********************************
--    12000  12999    163 *
--    13000  13999      1

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1  14160028 *******************************************************************--> 0.7124 0.0673
--       2-     2   1848956 ********************************************************************** 0.8055 0.0849
--       3-     4    970035 ************************************                                   0.8386 0.0943
--       5-     7    317568 ************                                                           0.8625 0.1041
--       8-    11     90747 ***                                                                    0.8721 0.1101
--      12-    16     25144                                                                        0.8752 0.1131
--      17-    22      7108                                                                        0.8762 0.1144
--      23-    29      7213                                                                        0.8765 0.1150
--      30-    37     31711 *                                                                      0.8769 0.1161
--      38-    46    107391 ****                                                                   0.8788 0.1224
--      47-    56    288498 **********                                                             0.8848 0.1473
--      57-    67    568259 *********************                                                  0.9005 0.2260
--      68-    79    600209 **********************                                                 0.9296 0.3995
--      80-    92    461777 *****************                                                      0.9594 0.6086
--      93-   106    233136 ********                                                               0.9816 0.7897
--     107-   121     91361 ***                                                                    0.9925 0.8924
--     122-   137     38726 *                                                                      0.9968 0.9389
--     138-   154     13993                                                                        0.9987 0.9615
--     155-   172      5705                                                                        0.9993 0.9704
--     173-   191      2660                                                                        0.9996 0.9747
--     192-   211      1530                                                                        0.9997 0.9768
--     212-   232      1437                                                                        0.9998 0.9784
--     233-   254        84                                                                        0.9999 0.9798
--     255-   277        63                                                                        0.9999 0.9799
--     278-   301        63                                                                        0.9999 0.9800
--     302-   326        89                                                                        0.9999 0.9800
--     327-   352        32                                                                        0.9999 0.9802
--     353-   379        66                                                                        0.9999 0.9802
--     380-   407        49                                                                        0.9999 0.9803
--     408-   436        31                                                                        0.9999 0.9804
--     437-   466         9                                                                        0.9999 0.9805
--     467-   497        11                                                                        0.9999 0.9805
--     498-   529        34                                                                        0.9999 0.9805
--     530-   562        11                                                                        0.9999 0.9806
--     563-   596         4                                                                        0.9999 0.9806
--     597-   631         0                                                                        0.0000 0.0000
--     632-   667         2                                                                        0.9999 0.9807
--     668-   704         6                                                                        0.9999 0.9807
--     705-   742         5                                                                        0.9999 0.9807
--     743-   781         4                                                                        0.9999 0.9807
--     782-   821         4                                                                        0.9999 0.9807
--
--       50203 (max occurrences)
--   196236980 (total mers, non-unique)
--     5715698 (distinct mers, non-unique)
--    14160028 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing         26    0.06     8271.54 +- 2579.25        422.54 +- 923.55     (bad trimming)
--   middle-hump             2    0.00     1975.50 +- 3.54             0.00 +- 0.00       (bad trimming)
--   no-5-prime              6    0.01     2845.00 +- 2344.22        947.50 +- 2033.27    (bad trimming)
--   no-3-prime              5    0.01     1432.80 +- 439.93         278.20 +- 230.95     (bad trimming)
--   
--   low-coverage           37    0.09     2139.19 +- 1134.27         10.85 +- 6.62       (easy to assemble, potential for lower quality consensus)
--   unique              30960   72.75     4767.19 +- 3321.57         76.60 +- 16.21      (easy to assemble, perfect, yay)
--   repeat-cont          2800    6.58     2042.03 +- 664.30        1020.81 +- 586.01     (potential for consensus errors, no impact on assembly)
--   repeat-dove             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat          6009   14.12     6910.93 +- 3358.38       2084.38 +- 1844.16    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont     2314    5.44     5211.05 +- 2700.87                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove      341    0.80    11057.65 +- 1266.38                             (will end contigs, potential to misassemble)
--   uniq-anchor            32    0.08     7993.94 +- 3724.38       3956.31 +- 3319.07    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      2 sequences, total length 2496655 bp (including 0 repeats of total length 0 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  10957 sequences, total length 28698963 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     2494605             1     2494605
--     20     2494605             1     2494605
--     30     2494605             1     2494605
--     40     2494605             1     2494605
--     50     2494605             1     2494605
--     60     2494605             1     2494605
--     70     2494605             1     2494605
--     80     2494605             1     2494605
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      2 sequences, total length 2481524 bp (including 0 repeats of total length 0 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  10957 sequences, total length 28698963 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     2479478             1     2479478
--     20     2479478             1     2479478
--     30     2479478             1     2479478
--     40     2479478             1     2479478
--     50     2479478             1     2479478
--     60     2479478             1     2479478
--     70     2479478             1     2479478
--     80     2479478             1     2479478
--

ml3958 commented 6 years ago

Thank you so much!! This is very helpful. You're definitely right: I used fastq-dump to download the .fastq files directly and did not do any QC. I was looking for a tool for PacBio data comparable to FastQC for Illumina data, but had no luck.

Could you please share the results from the combined cells' data? Many thanks.

skoren commented 6 years ago

You don't need QC or a FastQC-like tool, just the SMRT Portal or SMRT Link commands to extract the FASTQ files; see issue #34 for an example. The problem is that fastq-dump reports everything in the run, including noise where no real data was sequenced and reads that run through the adapter. These can probably still be assembled with tuned parameters, but the PacBio SMRT Link software filters out the noise automatically. I was able to get close to the expected read set with this fastq-dump command:

fastq-dump --qual-filter-1 -W --readids --read-filter pass --dumpbase -M 500 --gzip --split-spot --skip-technical SRR6331514

PacBio's official advice also seems to be to download the raw files rather than the FASTQ: https://github.com/pb-jlandolin/PacbioToSRA/issues/2. You'll need the h5 files and the SMRT Link software anyway if you want to run Arrow consensus polishing.
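
As a quick sanity check on whichever read set comes out of the extraction, the read count, total bases and coverage that Canu prints in its [CORRECTION/READS] report can be approximated directly from the FASTQ. The following is only a minimal sketch, assuming a plain four-line-per-record FASTQ; `fastq_stats` and `open_fastq` are hypothetical helper names, not part of Canu or the SRA Toolkit, and the genome size is whatever you pass to Canu as genomeSize.

```python
import gzip
import io


def fastq_stats(handle, genome_size):
    """Return (reads, bases, coverage) for a four-line-per-record FASTQ stream."""
    reads = 0
    bases = 0
    for i, line in enumerate(handle):
        # The sequence is the second line of every four-line record.
        if i % 4 == 1:
            reads += 1
            bases += len(line.strip())
    return reads, bases, bases / genome_size


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Tiny in-memory example so the sketch runs without any data file.
    demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
    reads, bases, coverage = fastq_stats(demo, genome_size=6)
    print(f"Found {reads} reads, {bases} bases ({coverage:.2f} times coverage)")
```

For a real file, something like `fastq_stats(open_fastq("reads.fastq.gz"), 2_500_000)` would apply, where the file name is a placeholder. A read count or coverage far above what the SMRT Link extraction yields suggests that noise and adapter read-through are still present.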

asm.contigs.fasta.gz

rbartelme commented 6 years ago

I am having a similar issue with an extremely high-coverage bacterial genome dataset that combines two RSII SMRT cells' worth of data. I get 3 contigs at the end of the assembly. The smallest must be the PacBio reference mentioned previously, but I find it odd that I cannot close the genome. The bacterium is relatively AT-rich and the coverage is obviously extremely high, so I adjusted the following parameters accordingly: --correctedErrorRate=0.035 --corMaxEvidenceErate=0.15
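
For context on what very high coverage means here: by default Canu only corrects the longest reads up to roughly corOutCoverage (40x unless changed), which is why the report below goes from about 719x of raw reads to about 38x after correction, with a length cutoff near 24 kb. The idea can be illustrated with a simplified sketch. This is not Canu's actual implementation (Canu works with estimated corrected lengths), and `longest_read_cutoff` is a hypothetical name used only for illustration.

```python
def longest_read_cutoff(read_lengths, genome_size, target_coverage):
    """Pick the longest reads until their summed length reaches the target.

    Returns (number_of_reads_kept, shortest_kept_length, total_bases_kept).
    """
    target_bases = genome_size * target_coverage
    kept = 0
    total = 0
    shortest = 0
    for length in sorted(read_lengths, reverse=True):
        # Stop as soon as the reads kept so far already cover the target.
        if total >= target_bases:
            break
        kept += 1
        total += length
        shortest = length
    return kept, shortest, total


if __name__ == "__main__":
    # Toy numbers, not taken from the report: a 100 bp "genome" at a 2x target.
    lengths = [90, 80, 70, 60, 50, 40, 30, 20, 10]
    print(longest_read_cutoff(lengths, genome_size=100, target_coverage=2))
```

With hundreds-fold raw coverage, such a selection keeps only the longest few percent of reads, so all shorter reads are left out of correction.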

The report is as follows:

[CORRECTION/READS]
--
-- In gatekeeper store 'correction/Fc_MSFC4.gkpStore':
--   Found 237020 reads.
--   Found 2373922116 bases (719.37 times coverage).
--
--   Read length histogram (one '*' equals 254.35 reads):
--        0    999      0 
--     1000   1999  13318 ****************************************************
--     2000   2999  13164 ***************************************************
--     3000   3999  13213 ***************************************************
--     4000   4999  13640 *****************************************************
--     5000   5999  13961 ******************************************************
--     6000   6999  13981 ******************************************************
--     7000   7999  13809 ******************************************************
--     8000   8999  13626 *****************************************************
--     9000   9999  14376 ********************************************************
--    10000  10999  15979 **************************************************************
--    11000  11999  17805 **********************************************************************
--    12000  12999  16696 *****************************************************************
--    13000  13999  13374 ****************************************************
--    14000  14999  10522 *****************************************
--    15000  15999   8063 *******************************
--    16000  16999   6345 ************************
--    17000  17999   5006 *******************
--    18000  18999   3983 ***************
--    19000  19999   3150 ************
--    20000  20999   2582 **********
--    21000  21999   1996 *******
--    22000  22999   1559 ******
--    23000  23999   1284 *****
--    24000  24999   1041 ****
--    25000  25999    803 ***
--    26000  26999    718 **
--    27000  27999    541 **
--    28000  28999    426 *
--    29000  29999    355 *
--    30000  30999    246 
--    31000  31999    266 *
--    32000  32999    208 
--    33000  33999    167 
--    34000  34999    139 
--    35000  35999    105 
--    36000  36999     85 
--    37000  37999     77 
--    38000  38999     69 
--    39000  39999     61 
--    40000  40999     49 
--    41000  41999     35 
--    42000  42999     30 
--    43000  43999     25 
--    44000  44999     22 
--    45000  45999     22 
--    46000  46999     16 
--    47000  47999     12 
--    48000  48999     12 
--    49000  49999      9 
--    50000  50999      9 
--    51000  51999     10 
--    52000  52999      5 
--    53000  53999      3 
--    54000  54999      4 
--    55000  55999      5 
--    56000  56999      2 
--    57000  57999      3 
--    58000  58999      1 
--    59000  59999      0 
--    60000  60999      2 
--    61000  61999      2 
--    62000  62999      1 
--    63000  63999      1 
--    64000  64999      0 
--    65000  65999      0 
--    66000  66999      0 
--    67000  67999      0 
--    68000  68999      0 
--    69000  69999      0 
--    70000  70999      0 
--    71000  71999      0 
--    72000  72999      0 
--    73000  73999      1

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 361226113 *******************************************************************--> 0.5316 0.1524
--       2-     2 126309308 ********************************************************************** 0.7174 0.2590
--       3-     4  92795577 ***************************************************                    0.8048 0.3341
--       5-     7  46205992 *************************                                              0.8851 0.4351
--       8-    11  23168514 ************                                                           0.9337 0.5299
--      12-    16  12069732 ******                                                                 0.9610 0.6099
--      17-    22   6425902 ***                                                                    0.9760 0.6732
--      23-    29   3446795 *                                                                      0.9843 0.7210
--      30-    37   1894115 *                                                                      0.9888 0.7557
--      38-    46   1117348                                                                        0.9914 0.7806
--      47-    56    785180                                                                        0.9929 0.7995
--      57-    67    709910                                                                        0.9941 0.8163
--      68-    79    708910                                                                        0.9951 0.8349
--      80-    92    709798                                                                        0.9961 0.8571
--      93-   106    654418                                                                        0.9972 0.8829
--     107-   121    486100                                                                        0.9981 0.9101
--     122-   137    290064                                                                        0.9988 0.9328
--     138-   154    163415                                                                        0.9992 0.9480
--     155-   172    100086                                                                        0.9994 0.9577
--     173-   191     67381                                                                        0.9996 0.9645
--     192-   211     47294                                                                        0.9997 0.9695
--     212-   232     34281                                                                        0.9998 0.9735
--     233-   254     24914                                                                        0.9998 0.9766
--     255-   277     18763                                                                        0.9998 0.9792
--     278-   301     14228                                                                        0.9999 0.9812
--     302-   326     11324                                                                        0.9999 0.9830
--     327-   352      9174                                                                        0.9999 0.9845
--     353-   379      7542                                                                        0.9999 0.9858
--     380-   407      6125                                                                        0.9999 0.9869
--     408-   436      5279                                                                        0.9999 0.9879
--     437-   466      4399                                                                        0.9999 0.9889
--     467-   497      3826                                                                        1.0000 0.9897
--     498-   529      3479                                                                        1.0000 0.9905
--     530-   562      3147                                                                        1.0000 0.9912
--     563-   596      3039                                                                        1.0000 0.9919
--     597-   631      2697                                                                        1.0000 0.9927
--     632-   667      2329                                                                        1.0000 0.9934
--     668-   704      1835                                                                        1.0000 0.9940
--     705-   742      1501                                                                        1.0000 0.9945
--     743-   781      1260                                                                        1.0000 0.9950
--     782-   821      1051                                                                        1.0000 0.9954
--
--      803871 (max occurrences)
--  2009140703 (total mers, non-unique)
--   318322643 (distinct mers, non-unique)
--   361226113 (unique mers)

[CORRECTION/CORRECTIONS]
--
-- Reads to be corrected:
--   7002 reads longer than 24051 bp
--   138140871 bp
-- Expected corrected reads:
--   7002 reads
--   132000308 bp
--   15581 bp minimum length
--   18852 bp mean length
--   33768 bp n50 length

[TRIMMING/READS]
--
-- In gatekeeper store 'trimming/Fc_MSFC4.gkpStore':
--   Found 7084 reads.
--   Found 126409899 bases (38.3 times coverage).
--
--   Read length histogram (one '*' equals 24.01 reads):
--        0    999      0 
--     1000   1999     27 *
--     2000   2999     16 
--     3000   3999     13 
--     4000   4999     17 
--     5000   5999     12 
--     6000   6999     13 
--     7000   7999     11 
--     8000   8999     11 
--     9000   9999     10 
--    10000  10999     11 
--    11000  11999     26 *
--    12000  12999     29 *
--    13000  13999     45 *
--    14000  14999    212 ********
--    15000  15999   1681 **********************************************************************
--    16000  16999   1385 *********************************************************
--    17000  17999    898 *************************************
--    18000  18999    770 ********************************
--    19000  19999    495 ********************
--    20000  20999    376 ***************
--    21000  21999    265 ***********
--    22000  22999    192 *******
--    23000  23999    158 ******
--    24000  24999     97 ****
--    25000  25999     78 ***
--    26000  26999     54 **
--    27000  27999     39 *
--    28000  28999     29 *
--    29000  29999     28 *
--    30000  30999     22 
--    31000  31999     16 
--    32000  32999     11 
--    33000  33999     10 
--    34000  34999      4 
--    35000  35999      6 
--    36000  36999      4 
--    37000  37999      4 
--    38000  38999      2 
--    39000  39999      1 
--    40000  40999      0 
--    41000  41999      1 
--    42000  42999      1 
--    43000  43999      1 
--    44000  44999      2 
--    45000  45999      1

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1   3970878 *******************************************************************--> 0.5250 0.0314
--       2-     2    300550 *************************                                              0.5648 0.0362
--       3-     4    109215 *********                                                              0.5753 0.0381
--       5-     7     25959 **                                                                     0.5810 0.0396
--       8-    11     33567 **                                                                     0.5832 0.0405
--      12-    16     75113 ******                                                                 0.5886 0.0440
--      17-    22    241383 ********************                                                   0.6002 0.0546
--      23-    29    568182 ***********************************************                        0.6369 0.1006
--      30-    37    837010 ********************************************************************** 0.7172 0.2316
--      38-    46    829562 *********************************************************************  0.8267 0.4574
--      47-    56    381827 *******************************                                        0.9336 0.7311
--      57-    67    106774 ********                                                               0.9773 0.8656
--      68-    79     35793 **                                                                     0.9897 0.9110
--      80-    92      7906                                                                        0.9940 0.9299
--      93-   106      3581                                                                        0.9949 0.9344
--     107-   121      7457                                                                        0.9953 0.9373
--     122-   137      3675                                                                        0.9964 0.9446
--     138-   154      2271                                                                        0.9968 0.9477
--     155-   172      3484                                                                        0.9971 0.9505
--     173-   191      2591                                                                        0.9976 0.9550
--     192-   211      2296                                                                        0.9979 0.9588
--     212-   232      1039                                                                        0.9982 0.9621
--     233-   254      1191                                                                        0.9983 0.9640
--     255-   277      1037                                                                        0.9985 0.9663
--     278-   301      1414                                                                        0.9986 0.9685
--     302-   326      4807                                                                        0.9988 0.9717
--     327-   352      1252                                                                        0.9995 0.9843
--     353-   379      1451                                                                        0.9996 0.9873
--     380-   407       393                                                                        0.9998 0.9913
--     408-   436       240                                                                        0.9998 0.9925
--     437-   466        51                                                                        0.9999 0.9933
--     467-   497        90                                                                        0.9999 0.9935
--     498-   529        71                                                                        0.9999 0.9938
--     530-   562        54                                                                        0.9999 0.9941
--     563-   596       185                                                                        0.9999 0.9943
--     597-   631        60                                                                        0.9999 0.9952
--     632-   667        79                                                                        0.9999 0.9955
--     668-   704       103                                                                        0.9999 0.9959
--     705-   742        29                                                                        1.0000 0.9964
--     743-   781         8                                                                        1.0000 0.9966
--     782-   821         4                                                                        1.0000 0.9967
--
--       56304 (max occurrences)
--   122290257 (total mers, non-unique)
--     3592020 (distinct mers, non-unique)
--     3970878 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0350    (use overlaps at or below this fraction error)
--        1    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        1    (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--    7084 reads    126409899 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--    6550 reads    115265524 bases (trimmed reads output)
--     512 reads      8975930 bases (reads with no change, kept as is)
--      18 reads        73419 bases (reads with no overlaps, deleted)
--       4 reads         5424 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--    4966 reads      1170217 bases (bases trimmed from the 5' end of a read)
--    5436 reads       919385 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0350    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--    7062 reads    126331056 bases (reads processed)
--      22 reads        78843 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--       0 reads            0 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--    7062 reads    126331056 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--       6 reads            6 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--       6 reads         1864 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--       3 reads        25137 bases (trimmed from the 5' end of the read)
--       3 reads        23053 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In gatekeeper store 'unitigging/Fc_MSFC4.gkpStore':
--   Found 7062 reads.
--   Found 124193264 bases (37.63 times coverage).
--
--   Read length histogram (one '*' equals 23.95 reads):
--        0    999      0 
--     1000   1999     21 
--     2000   2999     15 
--     3000   3999     11 
--     4000   4999     15 
--     5000   5999     10 
--     6000   6999      9 
--     7000   7999     13 
--     8000   8999     27 *
--     9000   9999     43 *
--    10000  10999     39 *
--    11000  11999     58 **
--    12000  12999     64 **
--    13000  13999     88 ***
--    14000  14999    274 ***********
--    15000  15999   1677 **********************************************************************
--    16000  16999   1351 ********************************************************
--    17000  17999    864 ************************************
--    18000  18999    716 *****************************
--    19000  19999    469 *******************
--    20000  20999    358 **************
--    21000  21999    240 **********
--    22000  22999    172 *******
--    23000  23999    154 ******
--    24000  24999     84 ***
--    25000  25999     75 ***
--    26000  26999     47 *
--    27000  27999     36 *
--    28000  28999     26 *
--    29000  29999     31 *
--    30000  30999     16 
--    31000  31999     15 
--    32000  32999     10 
--    33000  33999      8 
--    34000  34999      4 
--    35000  35999      7 
--    36000  36999      3 
--    37000  37999      5 
--    38000  38999      1 
--    39000  39999      0 
--    40000  40999      0 
--    41000  41999      1 
--    42000  42999      1 
--    43000  43999      1 
--    44000  44999      2 
--    45000  45999      1

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1   3171948 *******************************************************************--> 0.4714 0.0256
--       2-     2    277944 ***********************                                                0.5127 0.0301
--       3-     4    101105 ********                                                               0.5236 0.0318
--       5-     7     23724 *                                                                      0.5295 0.0332
--       8-    11     34847 **                                                                     0.5319 0.0341
--      12-    16     78306 ******                                                                 0.5382 0.0378
--      17-    22    246084 ********************                                                   0.5517 0.0490
--      23-    29    589075 *************************************************                      0.5938 0.0968
--      30-    37    838615 ********************************************************************** 0.6873 0.2349
--      38-    46    828917 *********************************************************************  0.8101 0.4642
--      47-    56    361600 ******************************                                         0.9297 0.7409
--      57-    67     97194 ********                                                               0.9762 0.8701
--      68-    79     34549 **                                                                     0.9888 0.9124
--      80-    92      6602                                                                        0.9934 0.9307
--      93-   106      3871                                                                        0.9942 0.9346
--     107-   121      8360                                                                        0.9948 0.9377
--     122-   137      2452                                                                        0.9961 0.9458
--     138-   154      2575                                                                        0.9964 0.9479
--     155-   172      3545                                                                        0.9968 0.9512
--     173-   191      2505                                                                        0.9973 0.9557
--     192-   211      1959                                                                        0.9977 0.9596
--     212-   232      1243                                                                        0.9980 0.9624
--     233-   254      1124                                                                        0.9982 0.9647
--     255-   277      1196                                                                        0.9983 0.9669
--     278-   301      1106                                                                        0.9985 0.9696
--     302-   326      5084                                                                        0.9987 0.9720
--     327-   352      1181                                                                        0.9994 0.9853
--     353-   379      1203                                                                        0.9996 0.9884
--     380-   407       401                                                                        0.9998 0.9917
--     408-   436       199                                                                        0.9998 0.9931
--     437-   466        61                                                                        0.9999 0.9937
--     467-   497       103                                                                        0.9999 0.9939
--     498-   529        34                                                                        0.9999 0.9943
--     530-   562       223                                                                        0.9999 0.9944
--     563-   596         3                                                                        0.9999 0.9954
--     597-   631        66                                                                        0.9999 0.9954
--     632-   667       140                                                                        0.9999 0.9958
--     668-   704        62                                                                        0.9999 0.9965
--     705-   742         3                                                                        1.0000 0.9968
--     743-   781         9                                                                        1.0000 0.9969
--     782-   821         3                                                                        1.0000 0.9969
--
--       29139 (max occurrences)
--   120873014 (total mers, non-unique)
--     3557539 (distinct mers, non-unique)
--     3171948 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing          4    0.06    10693.75 +- 10181.02      1078.50 +- 1086.73    (bad trimming)
--   middle-hump             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (bad trimming)
--   no-5-prime              1    0.01     5086.00 +- 0.00          1682.00 +- 0.00       (bad trimming)
--   no-3-prime              0    0.00        0.00 +- 0.00             0.00 +- 0.00       (bad trimming)
--   
--   low-coverage            3    0.04     2963.33 +- 1357.28          3.31 +- 1.93       (easy to assemble, potential for lower quality consensus)
--   unique               4581   64.87    17538.28 +- 3851.70         34.01 +- 7.79       (easy to assemble, perfect, yay)
--   repeat-cont           138    1.95    15523.88 +- 3433.92         67.99 +- 9.62       (potential for consensus errors, no impact on assembly)
--   repeat-dove             8    0.11    21575.62 +- 940.59          67.49 +- 7.30       (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat           898   12.72    17989.95 +- 3737.95       5671.21 +- 5685.10    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont     1166   16.51    16898.89 +- 2729.78                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove      163    2.31    23052.36 +- 4658.13                             (will end contigs, potential to misassemble)
--   uniq-anchor           100    1.42    18621.54 +- 4340.77       7597.53 +- 4925.88    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      3 sequences, total length 3455553 bp (including 1 repeats of total length 24019 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  335 sequences, total length 5014160 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     2712622             1     2712622
--     20     2712622             1     2712622
--     30     2712622             1     2712622
--     40     2712622             1     2712622
--     50     2712622             1     2712622
--     60     2712622             1     2712622
--     70     2712622             1     2712622
--     80     2712622             1     2712622
--     90      718912             2     3431534
--    100      718912             2     3431534
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      3 sequences, total length 3449272 bp (including 1 repeats of total length 23378 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  335 sequences, total length 5014154 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     2709159             1     2709159
--     20     2709159             1     2709159
--     30     2709159             1     2709159
--     40     2709159             1     2709159
--     50     2709159             1     2709159
--     60     2709159             1     2709159
--     70     2709159             1     2709159
--     80     2709159             1     2709159
--     90      716735             2     3425894
--    100      716735             2     3425894
--

rbartelme commented 6 years ago

Any advice as to parameters to change in the assembly?

skoren commented 6 years ago

The previous issue was due to poor input data: the SRA dump command was emitting reads that the sequencer's default processing would not have output. No parameter changes were necessary.

The assembly depends on the repeats present in the genome. From your assembly output, it seems there is a large (>22kbp) repeat that is most likely not spanned. The PacBio control sequence is only 2kb, whereas your shortest contig, which is a repeat, is >20kb. You have few reads longer than this, so unless the repeat copies have diverged enough, the repeat won't be resolved no matter how much sequencing coverage you add; you would need longer reads. The GFA output should give more information on the ambiguity caused by the repeat. You could try reducing the error rate used for unitigging to see if that resolves the repeat (see the heterozygous genome parameters on the wiki), and if you share your data we could try it locally, but it is possible the repeat is too large to be resolved by your data.
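
The spanning argument above can be checked against a list of read lengths. This is a minimal sketch, not something Canu provides: the function name, the `anchor` margin of unique sequence required on each flank, and the sample lengths are all illustrative assumptions.

```python
def count_spanning_reads(read_lengths, repeat_len, anchor=1000):
    """Count reads long enough to cross a repeat with unique sequence on both sides.

    anchor is a hypothetical margin (bp) of unique sequence required on each flank.
    """
    min_len = repeat_len + 2 * anchor
    return sum(1 for length in read_lengths if length >= min_len)


# Illustrative read lengths only, not taken from the dataset in this thread.
lengths = [5086, 15524, 17538, 21576, 23052, 26500]
spanning = count_spanning_reads(lengths, repeat_len=22000)
print(spanning)
```

Length is a necessary condition, not a sufficient one: a long read only helps if it happens to start in unique sequence on one side of the repeat and end in unique sequence on the other.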

ml3958 commented 6 years ago

Just out of curiosity -- how did you find out that a large repeat exists, and how long it is? I do see some reads in the span-repeat and uniq-repeat categories, but I am not sure exactly how to interpret them.

--   span-repeat           898   12.72    17989.95 +- 3737.95       5671.21 +- 5685.10    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont     1166   16.51    16898.89 +- 2729.78                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)

This is also a general problem when I use Canu -- I am not able to fully interpret the report, and I am not sure which parts of it I should primarily look at (I usually focus on the X times coverage after each step and the [UNITIGGING/OVERLAPS] stats). What do you recommend looking at in the report as a general check for data/assembly evaluation, and are there good reference papers?

Sorry for interrupting this thread, I am new to assembly. Any information is appreciated.

skoren commented 6 years ago

The report is primarily so we can diagnose issues with users' assemblies. Typically, check that the corrected/trimmed coverage is sufficient for assembly (e.g. 30x or higher) and that there is a peak in the k-mer counts post-correction at close to that coverage. For the repeat, the repeat stats:

--   repeat-cont           138    1.95    15523.88 +- 3433.92         67.99 +- 9.62       (potential for consensus errors, no impact on assembly)

are reads which are contained in a repeat. They are 15kb+. There is also:

--   uniq-repeat-cont     1166   16.51    16898.89 +- 2729.78                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)

which are reads where one end is unique and one end is repetitive (e.g. a junction read) but also contained in another longer sequence. There is also the unitigging output:

--   contigs:      3 sequences, total length 3455553 bp (including 1 repeats of total length 24019 bp).

so there is a repeat contig (by coverage) of >20kb, and the overlap stats above indicate the presence of a large repeat, which is why I would guess this genome has a repeat >22kbp.
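
The two numbers used in this reasoning can be pulled out of the report text directly. The sketch below assumes the exact line wording quoted in this thread (other Canu versions may format these lines differently); the function names are hypothetical helpers, not part of Canu.

```python
import re

# Pattern for the 'contigs:' summary line as it appears in this thread.
CONTIG_LINE = re.compile(
    r"contigs:\s+(\d+) sequences, total length (\d+) bp "
    r"\(including (\d+) repeats of total length (\d+) bp\)"
)


def parse_contig_summary(line):
    """Return counts and lengths from a 'contigs:' report line, or None."""
    match = CONTIG_LINE.search(line)
    if match is None:
        return None
    n_contigs, total_bp, n_repeats, repeat_bp = map(int, match.groups())
    return {
        "contigs": n_contigs,
        "total_bp": total_bp,
        "repeats": n_repeats,
        "repeat_bp": repeat_bp,
    }


def parse_overlap_category(line):
    """Return (category, reads, mean read length) from a [UNITIGGING/OVERLAPS] row."""
    fields = line.lstrip("- ").split()
    return fields[0], int(fields[1]), float(fields[3])


summary = parse_contig_summary(
    "--   contigs:      3 sequences, total length 3455553 bp "
    "(including 1 repeats of total length 24019 bp)."
)
category, reads, mean_len = parse_overlap_category(
    "--   repeat-cont           138    1.95    15523.88 +- 3433.92"
    "         67.99 +- 9.62       (potential for consensus errors, no impact on assembly)"
)
print(summary["repeat_bp"], category, reads, mean_len)
```

With one repeat contig of 24019 bp and 138 repeat-contained reads averaging about 15.5kb, the report is consistent with a repeat of more than 20kb, which is the estimate given above.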

rbartelme commented 6 years ago

I just BLASTed the 22k contig, and it comes back matching another strain of the same organism. It is very likely that there are repetitive segments in the genome, since this particular genus of bacteria is known for secreting a lot of proteins and often has repeated secretory system elements throughout the genome. Additionally, these organisms often contain multiple rRNA copies.

rbartelme commented 6 years ago

Is there an empirical way to determine how to adjust the utgErrorRate for this particular dataset?

ml3958 commented 6 years ago

@skoren This is really helpful. Thank you!