marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

RSII parameters for bacterial genomes #1566

chadsmith123 closed this issue 4 years ago

chadsmith123 commented 4 years ago

I have RS II data and was wondering whether there are parameter recommendations for bacterial genomes. I see recommendations for Sequel V2 in the FAQ but not for RS II.

I ask because I ran Canu 1.9 with the default parameters on a 1.8 Mb genome with 956x coverage:

canu -p h11 -d canu1.9 genomeSize=1.8m -pacbio-raw ~/pacbio/h11/m170721_234925_42146_c101206462550000001823287110171766_s1_p0.*subreads.fastq

The result is a single contig, but after annotation many genes are split into multiple segments. For example, gene1_a, gene1_b, and gene1_c overlap in one region of the genome as three fragments of about 500 bp each, rather than the single 1500 bp gene1 found in their relatives. I assume this is due to indels introduced during the assembly process.
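The split-gene symptom described above can be checked systematically. Below is a minimal sketch, assuming the annotation has already been parsed into `(name, start, end)` tuples and that fragments of one gene share a base name with a trailing `_a`, `_b`, `_c` suffix as in the example. The function name `find_split_genes` and the coordinates are hypothetical and only illustrate the pattern; they are not taken from the real annotation.

```python
import re
from collections import defaultdict


def find_split_genes(features):
    """Group features whose names differ only by a trailing single-letter suffix.

    `features` is an iterable of (name, start, end) tuples (assumed format).
    Returns {base_name: [fragments sorted by start]} for bases with 2+ fragments.
    """
    groups = defaultdict(list)
    for name, start, end in features:
        match = re.fullmatch(r"(.+)_([a-z])", name)
        if match:
            groups[match.group(1)].append((name, start, end))
    return {
        base: sorted(parts, key=lambda part: part[1])
        for base, parts in groups.items()
        if len(parts) > 1
    }


# Hypothetical coordinates: three overlapping fragments plus one intact gene.
features = [
    ("gene1_a", 100, 600),
    ("gene1_b", 590, 1090),
    ("gene1_c", 1080, 1600),
    ("gene2", 2000, 3500),
]

split = find_split_genes(features)
for base, parts in split.items():
    # Span from the first fragment's start to the last fragment's end.
    span = parts[-1][2] - parts[0][1]
    print(f"{base}: {len(parts)} fragments spanning {span} bp")
```

A large number of such groups, each spanning roughly the length of the intact gene in a relative, is consistent with frameshifts from residual indels rather than with real gene fission.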

Attached is the log. Thanks!

[CORRECTION/READS]
--
-- In sequence store './h11.seqStore':
--   Found 200332 reads.
--   Found 1721337725 bases (956.29 times coverage).
--
--   Read length histogram (one '*' equals 313.35 reads):
--     1000   1999  12663 ****************************************
--     2000   2999  13122 *****************************************
--     3000   3999  13275 ******************************************
--     4000   4999  13249 ******************************************
--     5000   5999  13089 *****************************************
--     6000   6999  13884 ********************************************
--     7000   7999  18421 **********************************************************
--     8000   8999  21935 **********************************************************************
--     9000   9999  19266 *************************************************************
--    10000  10999  14690 **********************************************
--    11000  11999  10725 **********************************
--    12000  12999   7642 ************************
--    13000  13999   5757 ******************
--    14000  14999   4393 **************
--    15000  15999   3202 **********
--    16000  16999   2571 ********
--    17000  17999   2029 ******
--    18000  18999   1683 *****
--    19000  19999   1356 ****
--    20000  20999   1095 ***
--    21000  21999    932 **
--    22000  22999    761 **
--    23000  23999    699 **
--    24000  24999    558 *
--    25000  25999    449 *
--    26000  26999    442 *
--    27000  27999    332 *
--    28000  28999    316 *
--    29000  29999    255 
--    30000  30999    208 
--    31000  31999    185 
--    32000  32999    136 
--    33000  33999    153 
--    34000  34999    106 
--    35000  35999    112 
--    36000  36999    100 
--    37000  37999     75 
--    38000  38999     66 
--    39000  39999     76 
--    40000  40999     37 
--    41000  41999     37 
--    42000  42999     28 
--    43000  43999     31 
--    44000  44999     26 
--    45000  45999     24 
--    46000  46999     24 
--    47000  47999     16 
--    48000  48999     17 
--    49000  49999     13 
--    50000  50999     11 
--    51000  51999      6 
--    52000  52999     13 
--    53000  53999      6 
--    54000  54999      6 
--    55000  55999      5 
--    56000  56999      4 
--    57000  57999      7 
--    58000  58999      0 
--    59000  59999      4 
--    60000  60999      2 
--    61000  61999      2 
--    62000  62999      1 
--    63000  63999      1 
--    64000  64999      1 
--    65000  65999      1 
--    66000  66999      0 
--    67000  67999      0 
--    68000  68999      0 
--    69000  69999      0 
--    70000  70999      0 
--    71000  71999      0 
--    72000  72999      1

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2 199847435 ********************************************************************** 0.5816 0.3492
--       3-     4 101850123 ***********************************                                    0.7910 0.5379
--       5-     7  27255974 *********                                                              0.9201 0.7056
--       8-    11   7981310 **                                                                     0.9664 0.7987
--      12-    16   2986490 *                                                                      0.9831 0.8496
--      17-    22   1217263                                                                        0.9901 0.8802
--      23-    29    540222                                                                        0.9930 0.8980
--      30-    37    415431                                                                        0.9945 0.9093
--      38-    46    491275                                                                        0.9957 0.9218
--      47-    56    477523                                                                        0.9971 0.9403
--      57-    67    321983                                                                        0.9984 0.9614
--      68-    79    145127                                                                        0.9993 0.9778
--      80-    92     51023                                                                        0.9997 0.9864
--      93-   106     18244                                                                        0.9998 0.9899
--     107-   121      7982                                                                        0.9999 0.9913
--     122-   137      4523                                                                        0.9999 0.9921
--     138-   154      3225                                                                        0.9999 0.9926
--     155-   172      2735                                                                        0.9999 0.9930
--     173-   191      2342                                                                        0.9999 0.9934
--     192-   211      2012                                                                        1.0000 0.9937
--     212-   232      1961                                                                        1.0000 0.9941
--     233-   254      1719                                                                        1.0000 0.9945
--     255-   277      1470                                                                        1.0000 0.9949
--     278-   301      1099                                                                        1.0000 0.9952
--     302-   326       804                                                                        1.0000 0.9955
--     327-   352       802                                                                        1.0000 0.9957
--     353-   379       714                                                                        1.0000 0.9959
--     380-   407       804                                                                        1.0000 0.9961
--     408-   436       684                                                                        1.0000 0.9964
--     437-   466       622                                                                        1.0000 0.9967
--     467-   497       486                                                                        1.0000 0.9969
--     498-   529       458                                                                        1.0000 0.9971
--     530-   562       356                                                                        1.0000 0.9973
--     563-   596       253                                                                        1.0000 0.9975
--     597-   631       212                                                                        1.0000 0.9976
--     632-   667       208                                                                        1.0000 0.9977
--     668-   704       199                                                                        1.0000 0.9979
--     705-   742       222                                                                        1.0000 0.9980
--     743-   781       224                                                                        1.0000 0.9981
--     782-   821       229                                                                        1.0000 0.9983
--
--           0 (max occurrences)
--  1144441503 (total mers, non-unique)
--   343636682 (distinct mers, non-unique)
--           0 (unique mers)

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads             106066         94266
--   Number of Bases          843263222     104742682
--   Coverage                   468.480        58.190
--   Median                        8001             0
--   Mean                          7950          1111
--   N50                           9461         10166
--   Minimum                       1000             0
--   Maximum                      57281         50663
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads              70422           6124          6124           3903          3903
--   Number of Bases          588781355       77187731      72007991       28180282      11114669
--   Coverage                   327.101         42.882        40.004         15.656         6.175
--   Median                        8334          11701         11195           7192          2222
--   Mean                          8360          12604         11758           7220          2847
--   N50                           9530          12068         11375           9399          3533
--   Minimum                       1000          10034         10033           1026          1001
--   Maximum                      50663          48140         30740          34634         10013
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads             190305        190305
--   Number of Bases          842637891     252258600
--   Coverage                   468.132       140.144
--   Median                        2872             0
--   Mean                          4427          1325
--   N50                           9209          7454
--   Minimum                          0             0
--   Maximum                      57281         31440
--   
--   Maximum Memory          1038916790

[TRIMMING/READS]
--
-- In sequence store './h11.seqStore':
--   Found 9467 reads.
--   Found 77875098 bases (43.26 times coverage).
--
--   Read length histogram (one '*' equals 32.21 reads):
--        0    999    336 **********
--     1000   1999   1303 ****************************************
--     2000   2999    712 **********************
--     3000   3999    415 ************
--     4000   4999    280 ********
--     5000   5999    169 *****
--     6000   6999    144 ****
--     7000   7999    138 ****
--     8000   8999    107 ***
--     9000   9999    529 ****************
--    10000  10999   2255 **********************************************************************
--    11000  11999   1450 *********************************************
--    12000  12999    777 ************************
--    13000  13999    395 ************
--    14000  14999    226 *******
--    15000  15999    104 ***
--    16000  16999     51 *
--    17000  17999     37 *
--    18000  18999     15 
--    19000  19999     10 
--    20000  20999      8 
--    21000  21999      1 
--    22000  22999      0 
--    23000  23999      1 
--    24000  24999      2 
--    25000  25999      2

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2   1252980 ********************************************************************** 0.2941 0.0366
--       3-     4    763916 ******************************************                             0.4124 0.0587
--       5-     7    334860 ******************                                                     0.5105 0.0855
--       8-    11    157444 ********                                                               0.5648 0.1084
--      12-    16    131774 *******                                                                0.5952 0.1279
--      17-    22    225906 ************                                                           0.6269 0.1577
--      23-    29    382428 *********************                                                  0.6844 0.2319
--      30-    37    457272 *************************                                              0.7769 0.3881
--      38-    46    335109 ******************                                                     0.8816 0.6118
--      47-    56    155022 ********                                                               0.9545 0.8048
--      57-    67     40934 **                                                                     0.9868 0.9084
--      68-    79      6838                                                                        0.9951 0.9399
--      80-    92      1029                                                                        0.9965 0.9462
--      93-   106       691                                                                        0.9967 0.9473
--     107-   121       646                                                                        0.9968 0.9483
--     122-   137       750                                                                        0.9970 0.9494
--     138-   154      1267                                                                        0.9972 0.9508
--     155-   172      1897                                                                        0.9975 0.9537
--     173-   191      2463                                                                        0.9979 0.9583
--     192-   211      1025                                                                        0.9985 0.9647
--     212-   232       685                                                                        0.9987 0.9676
--     233-   254       763                                                                        0.9989 0.9699
--     255-   277       275                                                                        0.9991 0.9725
--     278-   301       247                                                                        0.9991 0.9736
--     302-   326       401                                                                        0.9992 0.9746
--     327-   352       375                                                                        0.9993 0.9765
--     353-   379       536                                                                        0.9994 0.9784
--     380-   407       287                                                                        0.9995 0.9812
--     408-   436        96                                                                        0.9996 0.9829
--     437-   466       164                                                                        0.9996 0.9834
--     467-   497       205                                                                        0.9996 0.9845
--     498-   529       202                                                                        0.9997 0.9860
--     530-   562       238                                                                        0.9997 0.9875
--     563-   596       277                                                                        0.9998 0.9894
--     597-   631       303                                                                        0.9998 0.9918
--     632-   667       244                                                                        0.9999 0.9945
--     668-   704        60                                                                        1.0000 0.9968
--     705-   742         0                                                                        0.0000 0.0000
--     743-   781        20                                                                        1.0000 0.9973
--     782-   821        15                                                                        1.0000 0.9975
--
--           0 (max occurrences)
--    68403431 (total mers, non-unique)
--     4259709 (distinct mers, non-unique)
--           0 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0450    (use overlaps at or below this fraction error)
--      500    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        2    (break region if overlap coverage is less than this many reads, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--  200332 reads     77875098 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--    7915 reads     69422574 bases (trimmed reads output)
--     285 reads      2418528 bases (reads with no change, kept as is)
--  191342 reads       604793 bases (reads with no overlaps, deleted)
--     790 reads      1384193 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--    6686 reads      1989717 bases (bases trimmed from the 5' end of a read)
--    7039 reads      2055293 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0450    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--    8200 reads     75886112 bases (reads processed)
--  192132 reads      1988986 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--       0 reads            0 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--    8200 reads     75886112 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--       4 reads            4 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--       4 reads         1652 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--       1 reads         5862 bases (trimmed from the 5' end of the read)
--       3 reads        23303 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In sequence store './h11.seqStore':
--   Found 8200 reads.
--   Found 71811937 bases (39.89 times coverage).
--
--   Read length histogram (one '*' equals 30.2 reads):
--     1000   1999   1101 ************************************
--     2000   2999    432 **************
--     3000   3999    250 ********
--     4000   4999    186 ******
--     5000   5999    148 ****
--     6000   6999    165 *****
--     7000   7999    174 *****
--     8000   8999    216 *******
--     9000   9999    841 ***************************
--    10000  10999   2114 **********************************************************************
--    11000  11999   1241 *****************************************
--    12000  12999    644 *********************
--    13000  13999    335 ***********
--    14000  14999    189 ******
--    15000  15999     83 **
--    16000  16999     36 *
--    17000  17999     21 
--    18000  18999      9 
--    19000  19999      7 
--    20000  20999      6 
--    21000  21999      1 
--    22000  22999      0 
--    23000  23999      0 
--    24000  24999      1

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2   1051975 ********************************************************************** 0.2714 0.0325
--       3-     4    651859 *******************************************                            0.3814 0.0522
--       5-     7    295067 *******************                                                    0.4753 0.0768
--       8-    11    147658 *********                                                              0.5283 0.0983
--      12-    16    142183 *********                                                              0.5609 0.1186
--      17-    22    251843 ****************                                                       0.5993 0.1533
--      23-    29    400139 **************************                                             0.6687 0.2392
--      30-    37    447804 *****************************                                          0.7746 0.4108
--      38-    46    299959 *******************                                                    0.8859 0.6387
--      47-    56    134072 ********                                                               0.9571 0.8197
--      57-    67     33556 **                                                                     0.9877 0.9137
--      68-    79      5052                                                                        0.9951 0.9408
--      80-    92       765                                                                        0.9962 0.9455
--      93-   106       710                                                                        0.9964 0.9465
--     107-   121       629                                                                        0.9966 0.9476
--     122-   137       737                                                                        0.9967 0.9487
--     138-   154      1459                                                                        0.9969 0.9502
--     155-   172      2037                                                                        0.9973 0.9537
--     173-   191      2460                                                                        0.9978 0.9590
--     192-   211       832                                                                        0.9985 0.9657
--     212-   232       778                                                                        0.9987 0.9681
--     233-   254       604                                                                        0.9989 0.9708
--     255-   277       177                                                                        0.9990 0.9729
--     278-   301       270                                                                        0.9991 0.9737
--     302-   326       449                                                                        0.9991 0.9749
--     327-   352       423                                                                        0.9992 0.9771
--     353-   379       502                                                                        0.9994 0.9794
--     380-   407       195                                                                        0.9995 0.9821
--     408-   436       109                                                                        0.9995 0.9832
--     437-   466       180                                                                        0.9996 0.9840
--     467-   497       209                                                                        0.9996 0.9853
--     498-   529       225                                                                        0.9997 0.9868
--     530-   562       190                                                                        0.9997 0.9887
--     563-   596       304                                                                        0.9998 0.9902
--     597-   631       350                                                                        0.9998 0.9930
--     632-   667       155                                                                        0.9999 0.9963
--     668-   704        13                                                                        1.0000 0.9978
--     705-   742         5                                                                        1.0000 0.9979
--     743-   781        14                                                                        1.0000 0.9980
--     782-   821        10                                                                        1.0000 0.9981
--
--           0 (max occurrences)
--    64799066 (total mers, non-unique)
--     3876007 (distinct mers, non-unique)
--           0 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing          6    0.07     8916.33 +- 3115.96       3330.00 +- 2730.10    (bad trimming)
--   middle-hump             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (bad trimming)
--   no-5-prime              2    0.02     2055.50 +- 577.71         632.50 +- 894.49     (bad trimming)
--   no-3-prime              2    0.02     4013.00 +- 2507.40       1375.50 +- 1406.44    (bad trimming)
--   
--   low-coverage          267    3.26     2118.90 +- 1317.93          8.40 +- 2.43       (easy to assemble, potential for lower quality consensus)
--   unique               6698   81.68     8857.83 +- 4021.46         38.30 +- 10.22      (easy to assemble, perfect, yay)
--   repeat-cont            20    0.24     2616.35 +- 2842.25         93.87 +- 32.09      (potential for consensus errors, no impact on assembly)
--   repeat-dove             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat           614    7.49    10686.25 +- 3054.44       3264.26 +- 2991.87    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont      506    6.17     8274.66 +- 3742.40                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove       78    0.95    12625.74 +- 2104.32                             (will end contigs, potential to misassemble)
--   uniq-anchor             6    0.07    10714.67 +- 2649.33       5555.17 +- 1826.06    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      1 sequences, total length 1777077 bp (including 0 repeats of total length 0 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  101 sequences, total length 454368 bp.
--
-- Contig sizes based on genome size 1.8mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     1777077             1     1777077
--     20     1777077             1     1777077
--     30     1777077             1     1777077
--     40     1777077             1     1777077
--     50     1777077             1     1777077
--     60     1777077             1     1777077
--     70     1777077             1     1777077
--     80     1777077             1     1777077
--     90     1777077             1     1777077
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      1 sequences, total length 1790289 bp (including 0 repeats of total length 0 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  101 sequences, total length 454368 bp.
--
-- Contig sizes based on genome size 1.8mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     1790289             1     1790289
--     20     1790289             1     1790289
--     30     1790289             1     1790289
--     40     1790289             1     1790289
--     50     1790289             1     1790289
--     60     1790289             1     1790289
--     70     1790289             1     1790289
--     80     1790289             1     1790289
--     90     1790289             1     1790289
--

skoren commented 4 years ago

The default parameters work fine. The indels should be corrected by polishing the final assembly with Arrow (or Quiver, depending on how old your RSII data is).
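The coverage figures in the log above are consistent with the defaults coping with the 956x input: each value is just the base count divided by the `genomeSize=1.8m` given on the command line, and only about 40x of corrected reads reaches trimming and unitigging. A small sketch of that arithmetic, with the base counts copied from the report sections (the dictionary and function names are illustrative, not part of Canu):

```python
GENOME_SIZE = 1_800_000  # genomeSize=1.8m from the canu command line

# Base counts copied from the "Found N bases" lines in the report.
stage_bases = {
    "CORRECTION/READS": 1_721_337_725,
    "TRIMMING/READS": 77_875_098,
    "UNITIGGING/READS": 71_811_937,
}


def coverage(bases, genome_size=GENOME_SIZE):
    """Return fold coverage: total bases divided by the estimated genome size."""
    return bases / genome_size


coverages = {stage: coverage(bases) for stage, bases in stage_bases.items()}
for stage, cov in coverages.items():
    print(f"{stage}: {cov:.2f}x")
```

The printed values can differ from the log in the last decimal place, since the report may truncate rather than round.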