marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Getting a much improved assembly #1058

Closed harsh-shukla closed 6 years ago

harsh-shukla commented 6 years ago

Hi, first of all, thank you for the lovely tool. I am trying to assemble a mammalian genome (~2.4 Gb haploid genome size). I believe the individual (and the species) we are assembling is not very heterozygous (< 0.5%).

We generated ~33X of PacBio Sequel data (read N50: 10.3 kb), and I did a run with more or less default parameters. Since it is Sequel data, the only things I changed (as discussed in the FAQ) were corMhapSensitivity=normal and correctedErrorRate=0.085 (since my depth is a little low, I increased the latter slightly).

After the error correction stage, about half of my coverage was gone: I was left with only 15.6X, and after trimming only 15.31X remained. Even though I knew this would give me a fragmented assembly that might not even cover the entire genome, I decided to go ahead with it.
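To make the coverage loss concrete, here is a small sketch that recomputes fold coverage from the base counts quoted in the Canu final report further down in this post. The genome size of 2.4 Gbp is an assumption taken from the "~2.4g" figure above; the stage labels are descriptive names chosen here, not Canu terminology.

```python
# Coverage bookkeeping using base counts quoted in the Canu report in this post.
GENOME_SIZE = 2.4e9  # assumption: "~2.4g" from the post, read as 2.4 Gbp


def coverage(bases, genome_size=GENOME_SIZE):
    """Return fold coverage for a given number of bases."""
    return bases / genome_size


# Base counts copied from the report; the labels are informal.
stages = {
    "raw reads": 78_182_336_134,
    "selected for correction (raw)": 62_607_818_394,
    "expected after correction": 43_909_855_773,
    "after correction (trimming input)": 37_458_078_739,
    "after trimming (unitigging input)": 36_766_887_993,
}

raw = stages["raw reads"]
for name, bases in stages.items():
    print(f"{name:36s} {coverage(bases):6.2f}X  {bases / raw:6.1%} of raw bases")
```

Under that assumption, the numbers suggest that a bit under 6X of raw reads were never selected for correction, and that most of the remaining loss happened while correcting the reads that were selected.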

As expected, the NG50 was quite low (~217 kb), but we got the expected assembly size (~2.3 Gb).

[UNITIGGING/CONSENSUS]
Found, in version 2, after consensus generation:
contigs:      20214 sequences, total length 2308455763 bp (including 248 repeats of total length 3549312 bp).
bubbles:      0 sequences, total length 0 bp.
unassembled:  987420 sequences, total length 4356892888 bp.

 Contig sizes based on genome size --
          NG (bp)  LG (contigs)    sum (bp)
         ----------  ------------  ----------
     10      609186           295   240441813
     20      443198           763   480148192
     30      347562          1378   720342095
     40      276655          2154   960068665
     50      216708          3132  1200152206
     60      170121          4380  1440149466
     70      125879          6020  1680067723
     80       85401          8327  1920081328
     90       43007         12201  2160000770
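For readers unfamiliar with the NG/LG columns, the following is a minimal sketch of how such values are usually defined: contigs are sorted from longest to shortest and summed until the running total reaches a given percentage of the genome size (not of the assembly size). The function name and the toy contig lengths are illustrative only, and this is not Canu's own code. In the table above, the "sum" in the NG50 row (1200152206) just exceeds 50% of a 2.4 Gbp genome, which is consistent with this definition.

```python
def ng_stats(contig_lengths, genome_size, percent):
    """Return (NGx, LGx) for the given percent, or None if the contigs
    never add up to that fraction of the genome size."""
    target = genome_size * percent / 100
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return None


# Toy data, not taken from the assembly in this post.
toy_contigs = [50, 40, 30, 20, 10]
toy_genome_size = 200

for pct in (10, 50, 90):
    print(pct, ng_stats(toy_contigs, toy_genome_size, pct))
```

In the toy example the contigs total only 150 bp against a 200 bp genome, so NG90 is undefined, which mirrors how an incomplete assembly can run out of rows at the high end of such a table.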

Running BUSCO on it confirmed my hunch that the assembly is fragmented and does not cover the entire genome: ~13% of BUSCOs are fragmented and ~10% are missing.

BUSCO version is: 3.0.2 
The lineage dataset is: mammalia_odb9 (Creation date: 2016-02-13, number of species: 50, number of BUSCOs: 4104)

BUSCO was run in mode: genome

    C:75.9%[S:75.6%,D:0.3%],F:13.6%,M:10.5%,n:4104

    3117    Complete BUSCOs (C)
    3104    Complete and single-copy BUSCOs (S)
    13  Complete and duplicated BUSCOs (D)
    560 Fragmented BUSCOs (F)
    427 Missing BUSCOs (M)
    4104    Total BUSCO groups searched

So what could I do to make my assembly a little better? Since my initial coverage was slightly above 30X, corMinCoverage=4 was chosen by default. Should I change it to corMinCoverage=0 and run it again? Is there anything else I can change?

Also, in reference to an earlier post (#848): when I look at unitigging/4-/001*thr000.num000.log, the error rate seems to be quite high, and I can't wrap my head around that part. Should I also increase correctedErrorRate from 0.085 to 0.105?

001thr000.num000.log file

INITIAL EDGES
-------- ----------------------------------------
 4632507 reads are contained
 6251026 reads have no best edges (singleton)
   48856 reads have only one best edge (spur) 
            40141 are mutual best
  557816 reads have two best edges 
             4795 have one mutual best edge
           550326 have two mutual best edges

ERROR RATES (7799977 samples)
-----------
mean   0.03118934 stddev 0.01249177 -> 0.10613994 fraction error =  10.613994% error
median 0.02960000 mad    0.00750000 -> 0.09631700 fraction error =   9.631700% error

EDGE FILTERING
-------- ------------------------------------------
 6254276 reads have a suspicious overlap pattern
       0 reads had edges filtered
                0 had one
                0 had two
    6609 reads have length incompatible edges
             5974 have one
              635 have two

FINAL EDGES
-------- ----------------------------------------
 4632507 reads are contained
 6264420 reads have no best edges (singleton)
   72828 reads have only one best edge (spur) 
            32241 are mutual best
  520450 reads have two best edges 
             1655 have one mutual best edge
           514987 have two mutual best edges
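One way to read the ERROR RATES lines in the log above: the value after each arrow is numerically consistent with a cutoff of the form "centre plus six deviations", not with the typical overlap error itself, which is about 3%. The sketch below checks that arithmetic. The multiplier of 6 and the 1.4826 MAD scaling factor are assumptions inferred from the logged numbers, not taken from Canu's documentation.

```python
# Check whether the values after "->" match "centre + k * spread".
MAD_TO_SIGMA = 1.4826  # usual factor that scales a MAD to a stddev for normal data
DEVIATIONS = 6         # assumption: inferred from the logged numbers

# Values copied from the ERROR RATES section of the log.
mean, stddev = 0.03118934, 0.01249177
median, mad = 0.02960000, 0.00750000

mean_cutoff = mean + DEVIATIONS * stddev
median_cutoff = median + DEVIATIONS * MAD_TO_SIGMA * mad

print(f"mean-based cutoff   {mean_cutoff:.6f}  (log: 0.10613994)")
print(f"median-based cutoff {median_cutoff:.6f}  (log: 0.09631700)")
```

If that reading is right, the 9.6% and 10.6% figures are thresholds derived from the overlap error distribution, while the mean and median (roughly 3%) describe the overlaps themselves.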

I am also attaching my GenomeScope output from the unitigging step and the final report generated by Canu.

GenomeScope version 1.0
k = 22

property                      min               max               
Heterozygosity                2.42257%          2.78733%          
Genome Haploid Length         1,657,078,947 bp  1,689,966,374 bp  
Genome Repeat Length          184,720,225 bp    188,386,298 bp    
Genome Unique Length          1,472,358,722 bp  1,501,580,076 bp  
Model Fit                     89.0119%          89.9231%          
Read Error Rate               1.31823%          1.31823%  

Canu Final Report

[CORRECTION/READS]
--
-- In gatekeeper store './Mammalian.gkpStore':
--   Found 11490205 reads.
--   Found 78182336134 bases (32.57 times coverage).
--
--   Read length histogram (one '*' equals 27920.97 reads):
--        0    999      0 
--     1000   1999 1954468 **********************************************************************
--     2000   2999 1742867 **************************************************************
--     3000   3999 1124831 ****************************************
--     4000   4999 943905 *********************************
--     5000   5999 796367 ****************************
--     6000   6999 683089 ************************
--     7000   7999 592761 *********************
--     8000   8999 514032 ******************
--     9000   9999 447533 ****************
--    10000  10999 390795 *************
--    11000  11999 342092 ************
--    12000  12999 304283 **********
--    13000  13999 280427 **********
--    14000  14999 260830 *********
--    15000  15999 222907 *******
--    16000  16999 180278 ******
--    17000  17999 142807 *****
--    18000  18999 113807 ****
--    19000  19999  90561 ***
--    20000  20999  72916 **
--    21000  21999  57992 **
--    22000  22999  45916 *
--    23000  23999  36962 *
--    24000  24999  29471 *
--    25000  25999  23644 
--    26000  26999  19142 
--    27000  27999  15273 
--    28000  28999  12253 
--    29000  29999   9603 
--    30000  30999   8006 
--    31000  31999   6266 
--    32000  32999   4961 
--    33000  33999   4009 
--    34000  34999   3172 
--    35000  35999   2457 
--    36000  36999   2093 
--    37000  37999   1609 
--    38000  38999   1151 
--    39000  39999   1008 
--    40000  40999    748 
--    41000  41999    546 
--    42000  42999    491 
--    43000  43999    375 
--    44000  44999    303 
--    45000  45999    237 
--    46000  46999    204 
--    47000  47999    153 
--    48000  48999    119 
--    49000  49999     90 
--    50000  50999     76 
--    51000  51999     65 
--    52000  52999     36 
--    53000  53999     47 
--    54000  54999     39 
--    55000  55999     21 
--    56000  56999     16 
--    57000  57999     17 
--    58000  58999     15 
--    59000  59999      9 
--    60000  60999      9 
--    61000  61999      7 
--    62000  62999      5 
--    63000  63999      4 
--    64000  64999      6 
--    65000  65999      0 
--    66000  66999      2 
--    67000  67999      2 
--    68000  68999      0 
--    69000  69999      4 
--    70000  70999      2 
--    71000  71999      1 
--    72000  72999      3 
--    73000  73999      1 
--    74000  74999      1 
--    75000  75999      0 
--    76000  76999      3 
--    77000  77999      0 
--    78000  78999      1 
--    79000  79999      0 
--    80000  80999      1 
--    81000  81999      0 
--    82000  82999      2

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1  18717930 ****                                                                   0.0087 0.0002
--       2-     2  35006604 ********                                                               0.0251 0.0011
--       3-     4 110744625 ***************************                                            0.0484 0.0031
--       5-     7 211775545 ****************************************************                   0.1084 0.0105
--       8-    11 279430070 ********************************************************************   0.2095 0.0300
--      12-    16 284235143 ********************************************************************** 0.3358 0.0662
--      17-    22 242137163 ***********************************************************            0.4608 0.1172
--      23-    29 199663035 *************************************************                      0.5672 0.1766
--      30-    37 169596963 *****************************************                              0.6563 0.2423
--      38-    46 142408818 ***********************************                                    0.7327 0.3143
--      47-    56 114018159 ****************************                                           0.7970 0.3897
--      57-    67  86731089 *********************                                                  0.8484 0.4634
--      68-    79  63641811 ***************                                                        0.8875 0.5307
--      80-    92  45944253 ***********                                                            0.9162 0.5891
--      93-   106  33110321 ********                                                               0.9370 0.6385
--     107-   121  24008875 *****                                                                  0.9520 0.6798
--     122-   137  17574997 ****                                                                   0.9630 0.7142
--     138-   154  13027826 ***                                                                    0.9710 0.7428
--     155-   172   9791825 **                                                                     0.9769 0.7668
--     173-   191   7471723 *                                                                      0.9814 0.7870
--     192-   211   5788827 *                                                                      0.9849 0.8042
--     212-   232   4535908 *                                                                      0.9875 0.8189
--     233-   254   3589193                                                                        0.9896 0.8317
--     255-   277   2869251                                                                        0.9913 0.8428
--     278-   301   2307350                                                                        0.9926 0.8525
--     302-   326   1870718                                                                        0.9937 0.8610
--     327-   352   1531909                                                                        0.9945 0.8684
--     353-   379   1267164                                                                        0.9952 0.8750
--     380-   407   1059297                                                                        0.9958 0.8810
--     408-   436    895836                                                                        0.9963 0.8863
--     437-   466    764283                                                                        0.9967 0.8911
--     467-   497    657849                                                                        0.9971 0.8955
--     498-   529    570953                                                                        0.9974 0.8995
--     530-   562    499040                                                                        0.9976 0.9033
--     563-   596    438403                                                                        0.9979 0.9068
--     597-   631    385738                                                                        0.9981 0.9100
--     632-   667    340814                                                                        0.9983 0.9130
--     668-   704    300442                                                                        0.9984 0.9158
--     705-   742    264660                                                                        0.9986 0.9185
--     743-   781    233762                                                                        0.9987 0.9209
--     782-   821    207489                                                                        0.9988 0.9232
--
--    41247321 (max occurrences)
-- 77991265129 (total mers, non-unique)
--  2123085280 (distinct mers, non-unique)
--    18717930 (unique mers)

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads           11165384        324821
--   Number of Bases        74641116180    2178889482
--   Coverage                    31.100         0.908
--   Median                        4820          6462
--   Mean                          6685          6707
--   N50                          10330         11529
--   Minimum                       1000             0
--   Maximum                      82685         61710
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads           11351051        8976404       8976404              0             0
--   Number of Bases        76592898941    62607818394   43909855773              0             0
--   Coverage                    31.914         26.087        18.296          0.000         0.000
--   Median                        4912           5093          2782              0             0
--   Mean                          6747           6974          4891              0             0
--   N50                          10376          10797         10433              0             0
--   Minimum                       1000           1000             1              0             0
--   Maximum                      82685          82685         82041              0             0
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads            2513801       2513801
--   Number of Bases        14212187268             0
--   Coverage                     5.922         0.000
--   Median                        4108             0
--   Mean                          5653             0
--   N50                           8763             0
--   Minimum                          0             0
--   Maximum                      78618             0
--   
--   Maximum Memory          1386018818

[TRIMMING/READS]
--
-- In gatekeeper store './Mammalian.gkpStore':
--   Found 5492024 reads.
--   Found 37458078739 bases (15.6 times coverage).
--
--   Read length histogram (one '*' equals 12648.17 reads):
--        0    999 217101 *****************
--     1000   1999 885372 **********************************************************************
--     2000   2999 501891 ***************************************
--     3000   3999 486774 **************************************
--     4000   4999 449578 ***********************************
--     5000   5999 406280 ********************************
--     6000   6999 363942 ****************************
--     7000   7999 323379 *************************
--     8000   8999 284732 **********************
--     9000   9999 248655 *******************
--    10000  10999 216541 *****************
--    11000  11999 191429 ***************
--    12000  12999 170980 *************
--    13000  13999 158528 ************
--    14000  14999 138101 **********
--    15000  15999 109891 ********
--    16000  16999  82925 ******
--    17000  17999  62028 ****
--    18000  18999  46656 ***
--    19000  19999  35890 **
--    20000  20999  27294 **
--    21000  21999  20624 *
--    22000  22999  15565 *
--    23000  23999  11792 
--    24000  24999   8989 
--    25000  25999   6791 
--    26000  26999   5201 
--    27000  27999   3721 
--    28000  28999   2953 
--    29000  29999   2281 
--    30000  30999   1630 
--    31000  31999   1165 
--    32000  32999    906 
--    33000  33999    669 
--    34000  34999    488 
--    35000  35999    381 
--    36000  36999    257 
--    37000  37999    166 
--    38000  38999    116 
--    39000  39999    119 
--    40000  40999     60 
--    41000  41999     52 
--    42000  42999     41 
--    43000  43999     32 
--    44000  44999     20 
--    45000  45999      9 
--    46000  46999      3 
--    47000  47999      4 
--    48000  48999     10 
--    49000  49999      5 
--    50000  50999      2 
--    51000  51999      2 
--    52000  52999      1 
--    53000  53999      1 
--    54000  54999      0 
--    55000  55999      1

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 6048307719 *******************************************************************--> 0.6464 0.1620
--       2-     2 865661883 ********************************************************************** 0.7389 0.2083
--       3-     4 577466933 **********************************************                         0.7771 0.2371
--       5-     7 458958088 *************************************                                  0.8187 0.2833
--       8-    11 514318325 *****************************************                              0.8641 0.3625
--      12-    16 484542452 ***************************************                                0.9168 0.5006
--      17-    22 278746479 **********************                                                 0.9636 0.6739
--      23-    29  86959878 *******                                                                0.9886 0.7986
--      30-    37  19359021 *                                                                      0.9959 0.8467
--      38-    46   6977815                                                                        0.9976 0.8612
--      47-    56   3875140                                                                        0.9983 0.8685
--      57-    67   2462108                                                                        0.9987 0.8736
--      68-    79   1703927                                                                        0.9990 0.8775
--      80-    92   1278071                                                                        0.9991 0.8808
--      93-   106    990406                                                                        0.9993 0.8837
--     107-   121    776149                                                                        0.9994 0.8863
--     122-   137    620928                                                                        0.9995 0.8886
--     138-   154    502554                                                                        0.9995 0.8907
--     155-   172    416246                                                                        0.9996 0.8927
--     173-   191    355840                                                                        0.9996 0.8945
--     192-   211    307361                                                                        0.9997 0.8962
--     212-   232    262227                                                                        0.9997 0.8978
--     233-   254    226976                                                                        0.9997 0.8994
--     255-   277    204317                                                                        0.9997 0.9009
--     278-   301    192155                                                                        0.9998 0.9023
--     302-   326    190709                                                                        0.9998 0.9038
--     327-   352    181444                                                                        0.9998 0.9054
--     353-   379    162365                                                                        0.9998 0.9071
--     380-   407    145644                                                                        0.9998 0.9086
--     408-   436    123363                                                                        0.9999 0.9102
--     437-   466    101838                                                                        0.9999 0.9115
--     467-   497     82373                                                                        0.9999 0.9128
--     498-   529     67927                                                                        0.9999 0.9138
--     530-   562     58265                                                                        0.9999 0.9148
--     563-   596     50511                                                                        0.9999 0.9156
--     597-   631     46089                                                                        0.9999 0.9164
--     632-   667     42550                                                                        0.9999 0.9171
--     668-   704     39382                                                                        0.9999 0.9179
--     705-   742     36392                                                                        0.9999 0.9186
--     743-   781     34240                                                                        0.9999 0.9193
--     782-   821     31614                                                                        0.9999 0.9200
--
--     5149667 (max occurrences)
-- 31294449070 (total mers, non-unique)
--  3309130311 (distinct mers, non-unique)
--  6048307719 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0850    (use overlaps at or below this fraction error)
--        1    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        1    (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--  11490205 reads  37458078739 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--  3996265 reads  30665075792 bases (trimmed reads output)
--  1247441 reads   6124292353 bases (reads with no change, kept as is)
--  6075094 reads     59462210 bases (reads with no overlaps, deleted)
--  171405 reads    139777766 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--  2875390 reads    264226875 bases (bases trimmed from the 5' end of a read)
--  2776968 reads    205243743 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0850    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--  5243706 reads  37258838763 bases (reads processed)
--  6246499 reads    199239976 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--     893 reads      2176941 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--  5242813 reads  37256661822 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--    6722 reads         6785 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--    6785 reads      1985233 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--    3408 reads     11564919 bases (trimmed from the 5' end of the read)
--    3314 reads     10857870 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In gatekeeper store './Mammalian.gkpStore':
--   Found 5243642 reads.
--   Found 36766887993 bases (15.31 times coverage).
--
--   Read length histogram (one '*' equals 12513.48 reads):
--        0    999      0 
--     1000   1999 875944 **********************************************************************
--     2000   2999 509833 ****************************************
--     3000   3999 491877 ***************************************
--     4000   4999 452389 ************************************
--     5000   5999 407844 ********************************
--     6000   6999 363931 *****************************
--     7000   7999 321972 *************************
--     8000   8999 282300 **********************
--     9000   9999 246099 *******************
--    10000  10999 213852 *****************
--    11000  11999 188337 ***************
--    12000  12999 168079 *************
--    13000  13999 155137 ************
--    14000  14999 134242 **********
--    15000  15999 106435 ********
--    16000  16999  79975 ******
--    17000  17999  59896 ****
--    18000  18999  44925 ***
--    19000  19999  34324 **
--    20000  20999  26173 **
--    21000  21999  19784 *
--    22000  22999  14889 *
--    23000  23999  11205 
--    24000  24999   8613 
--    25000  25999   6454 
--    26000  26999   4937 
--    27000  27999   3499 
--    28000  28999   2821 
--    29000  29999   2150 
--    30000  30999   1500 
--    31000  31999   1087 
--    32000  32999    847 
--    33000  33999    638 
--    34000  34999    469 
--    35000  35999    348 
--    36000  36999    235 
--    37000  37999    168 
--    38000  38999    100 
--    39000  39999    105 
--    40000  40999     59 
--    41000  41999     50 
--    42000  42999     36 
--    43000  43999     32 
--    44000  44999     18 
--    45000  45999      8 
--    46000  46999      3 
--    47000  47999      3 
--    48000  48999     10 
--    49000  49999      5 
--    50000  50999      2 
--    51000  51999      1 
--    52000  52999      0 
--    53000  53999      1 
--    54000  54999      0 
--    55000  55999      1

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 5739074035 *******************************************************************--> 0.6368 0.1566
--       2-     2 847948386 ********************************************************************** 0.7309 0.2028
--       3-     4 570356470 ***********************************************                        0.7700 0.2317
--       5-     7 457571479 *************************************                                  0.8129 0.2785
--       8-    11 513715362 ******************************************                             0.8599 0.3590
--      12-    16 482241975 ***************************************                                0.9146 0.4995
--      17-    22 275367226 **********************                                                 0.9629 0.6749
--      23-    29  84924746 *******                                                                0.9884 0.8002
--      30-    37  18766823 *                                                                      0.9959 0.8480
--      38-    46   6802808                                                                        0.9976 0.8623
--      47-    56   3791867                                                                        0.9983 0.8695
--      57-    67   2409995                                                                        0.9987 0.8746
--      68-    79   1670333                                                                        0.9990 0.8785
--      80-    92   1256046                                                                        0.9991 0.8818
--      93-   106    970522                                                                        0.9993 0.8846
--     107-   121    760706                                                                        0.9994 0.8872
--     122-   137    608804                                                                        0.9995 0.8896
--     138-   154    493284                                                                        0.9995 0.8917
--     155-   172    407801                                                                        0.9996 0.8936
--     173-   191    350347                                                                        0.9996 0.8954
--     192-   211    301713                                                                        0.9997 0.8972
--     212-   232    257882                                                                        0.9997 0.8988
--     233-   254    223350                                                                        0.9997 0.9004
--     255-   277    201261                                                                        0.9997 0.9018
--     278-   301    190014                                                                        0.9998 0.9033
--     302-   326    188963                                                                        0.9998 0.9048
--     327-   352    179367                                                                        0.9998 0.9064
--     353-   379    160100                                                                        0.9998 0.9081
--     380-   407    142875                                                                        0.9998 0.9097
--     408-   436    120988                                                                        0.9999 0.9112
--     437-   466     98914                                                                        0.9999 0.9126
--     467-   497     80090                                                                        0.9999 0.9138
--     498-   529     66072                                                                        0.9999 0.9148
--     530-   562     56631                                                                        0.9999 0.9158
--     563-   596     49626                                                                        0.9999 0.9166
--     597-   631     45062                                                                        0.9999 0.9174
--     632-   667     41836                                                                        0.9999 0.9181
--     668-   704     38392                                                                        0.9999 0.9189
--     705-   742     36125                                                                        0.9999 0.9196
--     743-   781     33444                                                                        0.9999 0.9203
--     782-   821     31140                                                                        0.9999 0.9210
--
--     5091318 (max occurrences)
-- 30917697476 (total mers, non-unique)
--  3273518281 (distinct mers, non-unique)
--  5739074035 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing        473    0.01     8676.24 +- 5299.39        535.62 +- 635.25     (bad trimming)
--   middle-hump          1752    0.03     4686.74 +- 3094.16        230.69 +- 514.44     (bad trimming)
--   no-5-prime          18604    0.35     9160.95 +- 4922.89         84.43 +- 288.66     (bad trimming)
--   no-3-prime          18110    0.35     9503.58 +- 4972.00         88.98 +- 307.76     (bad trimming)
--   
--   low-coverage        23325    0.44     2464.78 +- 1366.76          2.69 +- 1.03       (easy to assemble, potential for lower quality consensus)
--   unique            2761921   52.67     6473.35 +- 4408.88         14.87 +- 4.10       (easy to assemble, perfect, yay)
--   repeat-cont        637695   12.16     3052.35 +- 3107.19       1466.29 +- 2039.10    (potential for consensus errors, no impact on assembly)
--   repeat-dove          1550    0.03    20276.80 +- 7541.50        238.15 +- 477.17     (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat        897781   17.12    10287.97 +- 5682.30       3245.81 +- 3556.35    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont   710941   13.56     7024.82 +- 4176.08                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove   138009    2.63    14153.11 +- 5585.81                             (will end contigs, potential to misassemble)
--   uniq-anchor         32579    0.62     9586.92 +- 5157.32       3592.31 +- 4178.55    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      20214 sequences, total length 2301687314 bp (including 248 repeats of total length 3564519 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  987420 sequences, total length 4356905762 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10      606671           296   240210710
--     20      440591           767   480274843
--     30      345701          1385   720343186
--     40      274816          2166   960178770
--     50      215287          3150  1200088130
--     60      168691          4407  1440043627
--     70      124441          6063  1680073530
--     80       84161          8398  1920068841
--     90       41557         12358  2160031286
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      20214 sequences, total length 2308455763 bp (including 248 repeats of total length 3549312 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  987420 sequences, total length 4356892888 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10      609186           295   240441813
--     20      443198           763   480148192
--     30      347562          1378   720342095
--     40      276655          2154   960068665
--     50      216708          3132  1200152206
--     60      170121          4380  1440149466
--     70      125879          6020  1680067723
--     80       85401          8327  1920081328
--     90       43007         12201  2160000770
--
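For anyone reading the "Contig sizes based on genome size" tables: the NG and LG columns are consistent with the usual definition, where contigs are taken longest first until their summed length reaches the given percentage of the genome size (here the ~2.4 Gbp estimate, not the assembly size). A minimal Python sketch of that calculation; the function name and the toy contig lengths are illustrative and are not taken from Canu:

```python
def ng_stats(contig_lengths, genome_size, percent):
    """Return (NG length, LG contig count, running sum) for the given percent.

    Contigs are taken longest first until their summed length reaches
    percent% of genome_size. Returns None if the assembly is too small
    to reach that fraction of the genome.
    """
    target = genome_size * percent / 100
    total = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        total += length
        if total >= target:
            return length, count, total
    return None


# Toy example with made-up contig lengths and a 100 bp "genome".
toy = [40, 25, 15, 10, 5]
print(ng_stats(toy, 100, 50))   # (25, 2, 65)
print(ng_stats(toy, 100, 90))   # (10, 4, 90)
print(ng_stats(toy, 100, 100))  # None: the contigs sum to only 95 bp
```

In the consensus table above, the NG50 row has a running sum of 1200152206 bp, which is the first point at or above half of 2.4 Gbp.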

Canu version: Canu 1.7.1
OS: CentOS 6.5 (SGE)

P.S.: To make matters more complex, I now also have ~7.1X of nanopore data and need to run a hybrid assembly. Any suggestions on that?

Thanking You,

Regards, Harsh

skoren commented 6 years ago

30x is on the low end of coverage, so for a mammal you'll likely be limited to 1-2 Mb NG50s (e.g. humans) unless you have very long (100 kb+) reads. In your case, setting corMhapSensitivity=normal is actually lower sensitivity than the default for 30x, so that hurt the correction. I'll clarify the FAQ to note that you only need to set that parameter if you have more than 50x. There are a couple of options you can adjust. I'd suggest trying: corMinCoverage=0 corMhapSensitivity=high correctedErrorRate=0.105. If you want to add nanopore data, specify it as pacbio data (since the majority of your data is pacbio) and increase correctedErrorRate to 0.12.

harsh-shukla commented 6 years ago

Hi Sergey,

Thank you so much for the quick reply. We are planning to further scaffold the draft using Hi-C (or Bionano), and I need an NG50 of at least 1 Mb for that to work. As long as I get a minimum NG50 of 1 Mb, I am more or less OK.

So finally I have ~33X of PacBio and ~7X of nanopore (R9 1D). The nanopore data is not very great: its read length distribution is equivalent to that of the PacBio data (actually a little worse), and I don't know what it will do to my assembly. When you say that I should give the nanopore data as pacbio data, do you mean that while running the correction I should specify it like this:

-pacbio-raw <PACBIO_DATA> & <NANOPORE_DATA> (combined files together)

If yes, do I have to change rawErrorRate to somewhere between 0.3 and 0.5, or leave it at 0.3?

Regards, Harsh

skoren commented 6 years ago

Yes, that is what I meant. Leave rawErrorRate as it is; you should have enough pacbio data to get a good consensus with the default of 0.3.
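To make the advice in this exchange concrete, here is a rough Python sketch that assembles the corresponding canu command line: both read sets go after a single -pacbio-raw, rawErrorRate is left at its default, and correctedErrorRate is 0.105 for PacBio only or 0.12 once nanopore reads are added. The prefix, output directory and file names are hypothetical placeholders, and the helper function is not part of Canu:

```python
import shlex


def build_canu_command(prefix, out_dir, genome_size, pacbio_files, nanopore_files=()):
    """Build a canu argument list using the options suggested in this thread.

    Nanopore reads, if any, are passed together with the PacBio reads
    after -pacbio-raw. rawErrorRate is deliberately not set.
    """
    corrected_error_rate = "0.12" if nanopore_files else "0.105"
    argv = [
        "canu",
        "-p", prefix,
        "-d", out_dir,
        f"genomeSize={genome_size}",
        "corMinCoverage=0",
        "corMhapSensitivity=high",
        f"correctedErrorRate={corrected_error_rate}",
        "-pacbio-raw",
    ]
    argv.extend(pacbio_files)
    argv.extend(nanopore_files)
    return argv


# Hypothetical names; substitute your own prefix, directory and read files.
cmd = build_canu_command(
    "mammal", "mammal-hybrid", "2.4g",
    pacbio_files=["pacbio.fastq.gz"],
    nanopore_files=["nanopore.fastq.gz"],
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The sketch only prints the command; running it is left to the cluster setup already in use.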

harsh-shukla commented 6 years ago

Hey Sergey,

Thank you so much for the suggestions. I'll try running the Pacbio-only assembly and hybrid assembly with modified parameters.

I'll post here as soon as I have the stats of the new runs.

With Regards, Harsh

harsh-shukla commented 6 years ago

Hey Sergey,

Hi again. I am running out of disk space very rapidly and the admin is not happy at all. Referencing issue #1039: because of the low depth and higher sensitivity, my overlap store is growing like crazy as it is built.

The current size of 1-overlapper/results/ is 2.8 TB.

I am currently running the bucketizer. My question is: once a particular bucket is created, can I delete the corresponding .ovb and .counts files from 1-overlapper/results/? For example, once the bucket0001/ folder is created (the corresponding job is done), can I delete 000001.counts and 000001.ovb? Also, should I delete the 1-overlapper/blocks/ folder right away?

Can I bucketize and sort a few at a time and delete each .ovb file once it is sorted? Is there any way to do that?

Regards, Harsh

skoren commented 6 years ago

It is definitely safe to erase the blocks folder in the 1-overlapper directory. It is mostly safe to erase the results files, yes. However, if you have any disk corruption during the store construction, you wouldn't be able to re-run a bucketizing step, which is why Canu usually doesn't erase these files until the end of store building.
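If you do decide to reclaim space this way, a dry run that only lists the candidate files is a cautious first step. The Python sketch below assumes the six-digit file naming quoted in the question (000001.ovb, 000001.counts) and relies on you to supply the job numbers whose bucketizer jobs you have confirmed as finished; it does not check that itself and it deletes nothing. The function name is made up for illustration:

```python
from pathlib import Path


def reclaimable_result_files(results_dir, finished_jobs):
    """List .ovb/.counts files for overlap jobs that were already bucketized.

    finished_jobs: job numbers whose bucketizer job completed successfully
    (the caller must establish this; the sketch does not verify it).
    File names follow the pattern quoted in the question, e.g. 000001.ovb.
    """
    results = Path(results_dir)
    found = []
    for job in sorted(finished_jobs):
        for suffix in (".ovb", ".counts"):
            candidate = results / f"{job:06d}{suffix}"
            if candidate.is_file():
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Dry run: print each candidate and its size in bytes, delete nothing.
    for path in reclaimable_result_files("1-overlapper/results", [1]):
        print(path, path.stat().st_size)
```

Keeping this as a listing step leaves the actual deletion as a deliberate manual action, which matters given the caveat about not being able to re-run a bucketizing step afterwards.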

harsh-shukla commented 6 years ago

Hey Sergey,

Thanks for the quick answer.

One more thing: can I move the entire assembly (the whole folder) from one SGE cluster to another and continue the run? The sysadmin has agreed to mount a hard drive on another SGE cluster. It has the same version of Canu (1.7.1), built from source, so I guess it should be fine.

Regards, Harsh

skoren commented 6 years ago

The assembly uses all local paths so moving to a different HD/folder is OK.

harsh-shukla commented 6 years ago

Hi again Sergey,

The PacBio-only run finished today. I am getting a much better assembly now: NG50 is almost 1 Mb. The parameters used were as suggested:

 corMinCoverage=0 corMhapSensitivity=high correctedErrorRate=0.105

But it seems the error rate for the corrected reads is quite high.

From the unitigging/4-*/001thr000.num000.log file:

INITIAL EDGES
-------- ----------------------------------------
 5881886 reads are contained
 5030208 reads have no best edges (singleton)
   16710 reads have only one best edge (spur) 
            12550 are mutual best
  561401 reads have two best edges 
             5728 have one mutual best edge
           553503 have two mutual best edges

ERROR RATES (9578612 samples)
-----------
mean   0.03894592 stddev 0.01776539 -> 0.14553824 fraction error =  14.553824% error
median 0.03670000 mad    0.01200000 -> 0.14344720 fraction error =  14.344720% error

EDGE FILTERING
-------- ------------------------------------------
 5035130 reads have a suspicious overlap pattern
       0 reads had edges filtered
                0 had one
                0 had two
    7104 reads have length incompatible edges
             5858 have one
             1246 have two

FINAL EDGES
-------- ----------------------------------------
 5881886 reads are contained
 5035232 reads have no best edges (singleton)
   24280 reads have only one best edge (spur) 
             9231 are mutual best
  548807 reads have two best edges 
             1749 have one mutual best edge
           541937 have two mutual best edges
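A note on reading the ERROR RATES block in the log above: the values before each arrow (mean 0.0389, median 0.0367) are the observed overlap error rates, roughly 3.7 to 3.9%. The value after each arrow is numerically consistent with the mean plus 6 standard deviations and with the median plus 6 × 1.4826 × MAD, which looks like an upper cutoff rather than the error rate of the corrected reads. That reading is inferred from the arithmetic alone and is an assumption, not something stated in this thread. A quick check in Python:

```python
# Values copied from the ERROR RATES block of the log above.
mean, stddev = 0.03894592, 0.01776539
median, mad = 0.03670000, 0.01200000

# Assumption: the cutoff sits 6 deviations above the centre. 1.4826 is the
# usual factor that scales a MAD to a standard deviation for normal data.
DEVIATIONS = 6
MAD_TO_STDDEV = 1.4826

mean_cutoff = mean + DEVIATIONS * stddev
median_cutoff = median + DEVIATIONS * MAD_TO_STDDEV * mad

print(f"{mean_cutoff:.6f}")    # the log reports 0.14553824
print(f"{median_cutoff:.6f}")  # the log reports 0.14344720
```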

Also, here is the GenomeScope output from both the trimming and unitigging steps:

From trimming/0-mercounts

GenomeScope version 1.0
k = 22

property                      min               max               
Heterozygosity                1.88872%          2.05744%          
Genome Haploid Length         1,890,660,623 bp  1,909,029,015 bp  
Genome Repeat Length          253,531,721 bp    255,994,865 bp    
Genome Unique Length          1,637,128,902 bp  1,653,034,150 bp  
Model Fit                     94.4821%          95.6423%          
Read Error Rate               3.67823%          3.67823%

----------------------------------------------------------------------

From unitigging/0-mercounts

GenomeScope version 1.0
k = 22

property                      min               max               
Heterozygosity                1.91341%          2.07094%          
Genome Haploid Length         1,785,642,436 bp  1,801,601,988 bp  
Genome Repeat Length          198,541,059 bp    200,315,561 bp    
Genome Unique Length          1,587,101,377 bp  1,601,286,428 bp  
Model Fit                     93.9694%          94.9344%          
Read Error Rate               1.68581%          1.68581%         

Should I run the trimming and assembly steps again with an increased correctedErrorRate (~0.14), or do one more round of correction? Does increasing correctedErrorRate raise the chance of mis-assemblies?

The trimming step reduces the coverage to ~19.5X. Will that be enough to cover the entire genome?
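For reference, the coverage figures in the report are simply total bases divided by the genome size. A small Python sketch using the base counts from the [CORRECTION/READS], [TRIMMING/READS] and [UNITIGGING/READS] sections of the attached report, and assuming the ~2.4 Gbp haploid genome size mentioned at the start of this issue:

```python
GENOME_SIZE = 2.4e9  # assumed: the ~2.4g haploid genome size from the first post


def coverage(total_bases, genome_size=GENOME_SIZE):
    """Fold coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


# Base counts copied from the gatekeeper store summaries in the report.
stages = {
    "reads entering correction": 78182336134,
    "reads entering trimming": 76371559322,
    "reads entering unitigging": 46349528979,
}
for name, bases in stages.items():
    print(f"{name}: {coverage(bases):.2f}x")
```

Note that these numbers depend on the genome size given to Canu; with a different size estimate the same reads would give a different coverage.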

Also attached is the .report file:

[CORRECTION/READS]
--
-- In gatekeeper store './Mammalian_Pacbio.gkpStore':
--   Found 11490205 reads.
--   Found 78182336134 bases (32.57 times coverage).
--
--   Read length histogram (one '*' equals 27920.97 reads):
--        0    999      0 
--     1000   1999 1954468 **********************************************************************
--     2000   2999 1742867 **************************************************************
--     3000   3999 1124831 ****************************************
--     4000   4999 943905 *********************************
--     5000   5999 796367 ****************************
--     6000   6999 683089 ************************
--     7000   7999 592761 *********************
--     8000   8999 514032 ******************
--     9000   9999 447533 ****************
--    10000  10999 390795 *************
--    11000  11999 342092 ************
--    12000  12999 304283 **********
--    13000  13999 280427 **********
--    14000  14999 260830 *********
--    15000  15999 222907 *******
--    16000  16999 180278 ******
--    17000  17999 142807 *****
--    18000  18999 113807 ****
--    19000  19999  90561 ***
--    20000  20999  72916 **
--    21000  21999  57992 **
--    22000  22999  45916 *
--    23000  23999  36962 *
--    24000  24999  29471 *
--    25000  25999  23644 
--    26000  26999  19142 
--    27000  27999  15273 
--    28000  28999  12253 
--    29000  29999   9603 
--    30000  30999   8006 
--    31000  31999   6266 
--    32000  32999   4961 
--    33000  33999   4009 
--    34000  34999   3172 
--    35000  35999   2457 
--    36000  36999   2093 
--    37000  37999   1609 
--    38000  38999   1151 
--    39000  39999   1008 
--    40000  40999    748 
--    41000  41999    546 
--    42000  42999    491 
--    43000  43999    375 
--    44000  44999    303 
--    45000  45999    237 
--    46000  46999    204 
--    47000  47999    153 
--    48000  48999    119 
--    49000  49999     90 
--    50000  50999     76 
--    51000  51999     65 
--    52000  52999     36 
--    53000  53999     47 
--    54000  54999     39 
--    55000  55999     21 
--    56000  56999     16 
--    57000  57999     17 
--    58000  58999     15 
--    59000  59999      9 
--    60000  60999      9 
--    61000  61999      7 
--    62000  62999      5 
--    63000  63999      4 
--    64000  64999      6 
--    65000  65999      0 
--    66000  66999      2 
--    67000  67999      2 
--    68000  68999      0 
--    69000  69999      4 
--    70000  70999      2 
--    71000  71999      1 
--    72000  72999      3 
--    73000  73999      1 
--    74000  74999      1 
--    75000  75999      0 
--    76000  76999      3 
--    77000  77999      0 
--    78000  78999      1 
--    79000  79999      0 
--    80000  80999      1 
--    81000  81999      0 
--    82000  82999      2

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1  18717930 ****                                                                   0.0087 0.0002
--       2-     2  35006604 ********                                                               0.0251 0.0011
--       3-     4 110744625 ***************************                                            0.0484 0.0031
--       5-     7 211775545 ****************************************************                   0.1084 0.0105
--       8-    11 279430070 ********************************************************************   0.2095 0.0300
--      12-    16 284235143 ********************************************************************** 0.3358 0.0662
--      17-    22 242137163 ***********************************************************            0.4608 0.1172
--      23-    29 199663035 *************************************************                      0.5672 0.1766
--      30-    37 169596963 *****************************************                              0.6563 0.2423
--      38-    46 142408818 ***********************************                                    0.7327 0.3143
--      47-    56 114018159 ****************************                                           0.7970 0.3897
--      57-    67  86731089 *********************                                                  0.8484 0.4634
--      68-    79  63641811 ***************                                                        0.8875 0.5307
--      80-    92  45944253 ***********                                                            0.9162 0.5891
--      93-   106  33110321 ********                                                               0.9370 0.6385
--     107-   121  24008875 *****                                                                  0.9520 0.6798
--     122-   137  17574997 ****                                                                   0.9630 0.7142
--     138-   154  13027826 ***                                                                    0.9710 0.7428
--     155-   172   9791825 **                                                                     0.9769 0.7668
--     173-   191   7471723 *                                                                      0.9814 0.7870
--     192-   211   5788827 *                                                                      0.9849 0.8042
--     212-   232   4535908 *                                                                      0.9875 0.8189
--     233-   254   3589193                                                                        0.9896 0.8317
--     255-   277   2869251                                                                        0.9913 0.8428
--     278-   301   2307350                                                                        0.9926 0.8525
--     302-   326   1870718                                                                        0.9937 0.8610
--     327-   352   1531909                                                                        0.9945 0.8684
--     353-   379   1267164                                                                        0.9952 0.8750
--     380-   407   1059297                                                                        0.9958 0.8810
--     408-   436    895836                                                                        0.9963 0.8863
--     437-   466    764283                                                                        0.9967 0.8911
--     467-   497    657849                                                                        0.9971 0.8955
--     498-   529    570953                                                                        0.9974 0.8995
--     530-   562    499040                                                                        0.9976 0.9033
--     563-   596    438403                                                                        0.9979 0.9068
--     597-   631    385738                                                                        0.9981 0.9100
--     632-   667    340814                                                                        0.9983 0.9130
--     668-   704    300442                                                                        0.9984 0.9158
--     705-   742    264660                                                                        0.9986 0.9185
--     743-   781    233762                                                                        0.9987 0.9209
--     782-   821    207489                                                                        0.9988 0.9232
--
--    41247321 (max occurrences)
-- 77991265129 (total mers, non-unique)
--  2123085280 (distinct mers, non-unique)
--    18717930 (unique mers)

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads           11436831         53374
--   Number of Bases        77921492781     260591354
--   Coverage                    32.467         0.109
--   Median                        4986          1784
--   Mean                          6813          4882
--   N50                          10464         10484
--   Minimum                       1000             0
--   Maximum                      82685         82287
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads           11475662       11436831      11436831              0             0
--   Number of Bases        78078052338    77921492781   73711028550              0             0
--   Coverage                    32.533         32.467        30.713          0.000         0.000
--   Median                        4973           4986          4555              0             0
--   Mean                          6803           6813          6445              0             0
--   N50                          10465          10464         10457              0             0
--   Minimum                       1000           1000             1              0             0
--   Maximum                      82685          82685         82684              0             0
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads              53374         53374
--   Number of Bases          260591354             0
--   Coverage                     0.109         0.000
--   Median                        1784             0
--   Mean                          4882             0
--   N50                          10484             0
--   Minimum                          0             0
--   Maximum                      82287             0
--   
--   Maximum Memory          1564656452

[TRIMMING/READS]
--
-- In gatekeeper store './Mammalian_Pacbio.gkpStore':
--   Found 11436065 reads.
--   Found 76371559322 bases (31.82 times coverage).
--
--   Read length histogram (one '*' equals 31545.25 reads):
--        0    999   6124 
--     1000   1999 2208168 **********************************************************************
--     2000   2999 1480468 **********************************************
--     3000   3999 1143606 ************************************
--     4000   4999 957214 ******************************
--     5000   5999 807647 *************************
--     6000   6999 690896 *********************
--     7000   7999 596407 ******************
--     8000   8999 514573 ****************
--     9000   9999 446643 **************
--    10000  10999 386965 ************
--    11000  11999 338600 **********
--    12000  12999 300727 *********
--    13000  13999 279649 ********
--    14000  14999 255368 ********
--    15000  15999 211644 ******
--    16000  16999 167363 *****
--    17000  17999 131919 ****
--    18000  18999 103875 ***
--    19000  19999  83435 **
--    20000  20999  66434 **
--    21000  21999  52610 *
--    22000  22999  41688 *
--    23000  23999  33282 *
--    24000  24999  26462 
--    25000  25999  21151 
--    26000  26999  17082 
--    27000  27999  13600 
--    28000  28999  10662 
--    29000  29999   8597 
--    30000  30999   6929 
--    31000  31999   5518 
--    32000  32999   4260 
--    33000  33999   3478 
--    34000  34999   2661 
--    35000  35999   2230 
--    36000  36999   1811 
--    37000  37999   1295 
--    38000  38999   1022 
--    39000  39999    846 
--    40000  40999    631 
--    41000  41999    489 
--    42000  42999    396 
--    43000  43999    332 
--    44000  44999    274 
--    45000  45999    205 
--    46000  46999    173 
--    47000  47999    118 
--    48000  48999    119 
--    49000  49999     74 
--    50000  50999     72 
--    51000  51999     53 
--    52000  52999     31 
--    53000  53999     34 
--    54000  54999     33 
--    55000  55999     22 
--    56000  56999     12 
--    57000  57999     18 
--    58000  58999     12 
--    59000  59999      9 
--    60000  60999     10 
--    61000  61999      6 
--    62000  62999      5 
--    63000  63999      2 
--    64000  64999      4 
--    65000  65999      1 
--    66000  66999      1 
--    67000  67999      3 
--    68000  68999      1 
--    69000  69999      2 
--    70000  70999      2 
--    71000  71999      2 
--    72000  72999      2 
--    73000  73999      1 
--    74000  74999      1 
--    75000  75999      0 
--    76000  76999      3 
--    77000  77999      0 
--    78000  78999      1 
--    79000  79999      0 
--    80000  80999      1 
--    81000  81999      1

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 34540413185 *******************************************************************--> 0.8788 0.4537
--       2-     2 1736029254 ********************************************************************** 0.9230 0.4993
--       3-     4 834586412 *********************************                                      0.9369 0.5209
--       5-     7 486650119 *******************                                                    0.9491 0.5486
--       8-    11 511641343 ********************                                                   0.9599 0.5874
--      12-    16 558145949 **********************                                                 0.9727 0.6565
--      17-    22 407568324 ****************                                                       0.9861 0.7593
--      23-    29 159227228 ******                                                                 0.9951 0.8532
--      30-    37  36142600 *                                                                      0.9984 0.8975
--      38-    46  10956320                                                                        0.9992 0.9105
--      47-    56   5775306                                                                        0.9994 0.9160
--      57-    67   3625218                                                                        0.9996 0.9197
--      68-    79   2526916                                                                        0.9997 0.9226
--      80-    92   1880777                                                                        0.9997 0.9249
--      93-   106   1429240                                                                        0.9998 0.9270
--     107-   121   1113387                                                                        0.9998 0.9289
--     122-   137    890347                                                                        0.9998 0.9305
--     138-   154    717895                                                                        0.9998 0.9320
--     155-   172    591125                                                                        0.9999 0.9334
--     173-   191    499011                                                                        0.9999 0.9346
--     192-   211    426924                                                                        0.9999 0.9358
--     212-   232    360919                                                                        0.9999 0.9369
--     233-   254    314198                                                                        0.9999 0.9380
--     255-   277    278484                                                                        0.9999 0.9390
--     278-   301    258414                                                                        0.9999 0.9399
--     302-   326    244988                                                                        0.9999 0.9409
--     327-   352    228070                                                                        0.9999 0.9419
--     353-   379    202048                                                                        0.9999 0.9429
--     380-   407    175623                                                                        1.0000 0.9439
--     408-   436    150391                                                                        1.0000 0.9448
--     437-   466    126848                                                                        1.0000 0.9456
--     467-   497    103873                                                                        1.0000 0.9464
--     498-   529     88723                                                                        1.0000 0.9470
--     530-   562     76500                                                                        1.0000 0.9476
--     563-   596     68606                                                                        1.0000 0.9482
--     597-   631     61966                                                                        1.0000 0.9487
--     632-   667     57131                                                                        1.0000 0.9492
--     668-   704     53044                                                                        1.0000 0.9497
--     705-   742     49517                                                                        1.0000 0.9502
--     743-   781     45892                                                                        1.0000 0.9506
--     782-   821     42050                                                                        1.0000 0.9511
--
--    14412852 (max occurrences)
-- 41590988772 (total mers, non-unique)
--  4764133149 (distinct mers, non-unique)
-- 34540413185 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.1050    (use overlaps at or below this fraction error)
--        1    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        1    (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--  11490205 reads  76371559322 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--  6086089 reads  45271880153 bases (trimmed reads output)
--  385372 reads   1120547846 bases (reads with no change, kept as is)
--  4718311 reads  20065105740 bases (reads with no overlaps, deleted)
--  300433 reads   1237223274 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--  5525550 reads   4882431543 bases (bases trimmed from the 5' end of a read)
--  5741203 reads   3794370766 bases (bases trimmed from the 3' end of a read)
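The OUTPUT READS counts in the trimming section above can be cross-checked with a few lines of Python. The four categories add up to the 11490205 input reads, the two kept categories add up to the 6471461 reads that the splitting step then processes, and about 41% of reads are deleted for having no overlaps. The labels in the sketch are paraphrased from the report:

```python
# Read counts copied from the [TRIMMING/TRIMMING] section above.
input_reads = 11490205
output_reads = {
    "trimmed reads output": 6086089,
    "no change, kept as is": 385372,
    "no overlaps, deleted": 4718311,
    "short trimmed length, deleted": 300433,
}

# Every input read should land in exactly one output category.
assert sum(output_reads.values()) == input_reads

kept = output_reads["trimmed reads output"] + output_reads["no change, kept as is"]
deleted = input_reads - kept
print(f"kept:    {kept} reads")
print(f"deleted: {deleted} reads")

for label, reads in output_reads.items():
    print(f"{label}: {100 * reads / input_reads:.1f}% of input reads")
```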

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.1050    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--  6471461 reads  55069230308 bases (reads processed)
--  5018744 reads  21302329014 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--     313 reads      2668318 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--  6471148 reads  55066561990 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--   10964 reads        11054 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--   11054 reads      3057332 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--    5652 reads     22580085 bases (trimmed from the 5' end of the read)
--    5312 reads     20270473 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In gatekeeper store './Mammalian_Pacbio.gkpStore':
--   Found 6471405 reads.
--   Found 46349528979 bases (19.31 times coverage).
--
--   Read length histogram (one '*' equals 16408.67 reads):
--        0    999      0 
--     1000   1999 1148607 **********************************************************************
--     2000   2999 618052 *************************************
--     3000   3999 579389 ***********************************
--     4000   4999 530249 ********************************
--     5000   5999 478191 *****************************
--     6000   6999 429045 **************************
--     7000   7999 381771 ***********************
--     8000   8999 337290 ********************
--     9000   9999 295867 ******************
--    10000  10999 259201 ***************
--    11000  11999 230005 **************
--    12000  12999 207897 ************
--    13000  13999 195370 ***********
--    14000  14999 173163 **********
--    15000  15999 139692 ********
--    16000  16999 106313 ******
--    17000  17999  80866 ****
--    18000  18999  62883 ***
--    19000  19999  49095 **
--    20000  20999  37911 **
--    21000  21999  29614 *
--    22000  22999  22874 *
--    23000  23999  17684 *
--    24000  24999  13658 
--    25000  25999  10832 
--    26000  26999   8240 
--    27000  27999   6368 
--    28000  28999   4963 
--    29000  29999   3923 
--    30000  30999   2969 
--    31000  31999   2284 
--    32000  32999   1711 
--    33000  33999   1368 
--    34000  34999    989 
--    35000  35999    770 
--    36000  36999    569 
--    37000  37999    443 
--    38000  38999    296 
--    39000  39999    244 
--    40000  40999    171 
--    41000  41999    137 
--    42000  42999    113 
--    43000  43999     82 
--    44000  44999     78 
--    45000  45999     36 
--    46000  46999     28 
--    47000  47999     29 
--    48000  48999     17 
--    49000  49999     21 
--    50000  50999     14 
--    51000  51999      9 
--    52000  52999      2 
--    53000  53999      3 
--    54000  54999      3 
--    55000  55999      1 
--    56000  56999      2 
--    57000  57999      1 
--    58000  58999      0 
--    59000  59999      0 
--    60000  60999      1 
--    61000  61999      0 
--    62000  62999      0 
--    63000  63999      0 
--    64000  64999      0 
--    65000  65999      0 
--    66000  66999      0 
--    67000  67999      0 
--    68000  68999      1

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 9303977321 *******************************************************************--> 0.7091 0.2013
--       2-     2 1103080911 ********************************************************************** 0.7932 0.2491
--       3-     4 652465959 *****************************************                              0.8250 0.2762
--       5-     7 441262704 ****************************                                           0.8557 0.3148
--       8-    11 495849652 *******************************                                        0.8861 0.3748
--      12-    16 539549626 **********************************                                     0.9234 0.4856
--      17-    22 383623386 ************************                                               0.9620 0.6486
--      23-    29 143405603 *********                                                              0.9874 0.7931
--      30-    37  30688049 *                                                                      0.9962 0.8583
--      38-    46   8944266                                                                        0.9980 0.8763
--      47-    56   4585097                                                                        0.9987 0.8837
--      57-    67   2823931                                                                        0.9990 0.8885
--      68-    79   1951045                                                                        0.9992 0.8921
--      80-    92   1455250                                                                        0.9993 0.8952
--      93-   106   1104407                                                                        0.9994 0.8978
--     107-   121    861315                                                                        0.9995 0.9002
--     122-   137    689514                                                                        0.9996 0.9022
--     138-   154    555965                                                                        0.9996 0.9042
--     155-   172    459241                                                                        0.9997 0.9059
--     173-   191    393055                                                                        0.9997 0.9075
--     192-   211    335654                                                                        0.9997 0.9090
--     212-   232    284798                                                                        0.9998 0.9105
--     233-   254    251336                                                                        0.9998 0.9119
--     255-   277    226470                                                                        0.9998 0.9132
--     278-   301    218647                                                                        0.9998 0.9145
--     302-   326    210327                                                                        0.9998 0.9158
--     327-   352    191030                                                                        0.9999 0.9173
--     353-   379    167893                                                                        0.9999 0.9187
--     380-   407    144379                                                                        0.9999 0.9200
--     408-   436    122588                                                                        0.9999 0.9212
--     437-   466    100208                                                                        0.9999 0.9223
--     467-   497     81439                                                                        0.9999 0.9233
--     498-   529     69683                                                                        0.9999 0.9241
--     530-   562     61274                                                                        0.9999 0.9249
--     563-   596     55148                                                                        0.9999 0.9256
--     597-   631     50220                                                                        0.9999 0.9263
--     632-   667     46584                                                                        0.9999 0.9270
--     668-   704     42830                                                                        0.9999 0.9276
--     705-   742     39435                                                                        0.9999 0.9283
--     743-   781     36679                                                                        0.9999 0.9289
--     782-   821     33593                                                                        1.0000 0.9295
--
--     5392091 (max occurrences)
-- 36909652153 (total mers, non-unique)
--  3817122533 (distinct mers, non-unique)
--  9303977321 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing        686    0.01     8406.37 +- 5405.71        432.12 +- 567.36     (bad trimming)
--   middle-hump           426    0.01     4605.99 +- 3689.24        342.64 +- 602.39     (bad trimming)
--   no-5-prime           4801    0.07     9663.32 +- 5720.04        137.64 +- 369.39     (bad trimming)
--   no-3-prime           5277    0.08     9961.37 +- 5688.19        124.12 +- 356.42     (bad trimming)
--   
--   low-coverage        80372    1.24     2384.70 +- 1592.21          4.22 +- 1.41       (easy to assemble, potential for lower quality consensus)
--   unique            4305320   66.53     7031.51 +- 4933.27         18.80 +- 4.96       (easy to assemble, perfect, yay)
--   repeat-cont        670091   10.35     2934.34 +- 3029.78       1910.17 +- 2444.87    (potential for consensus errors, no impact on assembly)
--   repeat-dove          1278    0.02    20670.38 +- 8747.33        383.43 +- 850.28     (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat        770822   11.91    10992.10 +- 6185.89       2775.83 +- 3184.81    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont   515450    7.97     7130.03 +- 4317.42                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove    96884    1.50    15585.48 +- 5598.40                             (will end contigs, potential to misassemble)
--   uniq-anchor         11097    0.17     9962.44 +- 5382.08       3552.85 +- 4105.11    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      6632 sequences, total length 2379969862 bp (including 270 repeats of total length 4432054 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  1401240 sequences, total length 6010030445 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     2801520            63   240924077
--     20     2089702           162   481245252
--     30     1500254           301   721287251
--     40     1198112           481   960368515
--     50      923668           712  1200715387
--     60      705818          1007  1440136006
--     70      530581          1397  1680235960
--     80      357585          1946  1920280140
--     90      192896          2842  2160006989
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      6632 sequences, total length 2381351738 bp (including 270 repeats of total length 4428786 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  1401240 sequences, total length 6010008168 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     2804872            63   241152016
--     20     2090941           162   481683023
--     30     1504549           300   720413349
--     40     1199942           481   961175766
--     50      924879           711  1200771306
--     60      706762          1006  1440558211
--     70      533202          1395  1680444157
--     80      359672          1942  1920238374
--     90      195117          2835  2160081719
--

Also, regarding the hybrid run (using 33X PacBio and 7X Nanopore): the correction step has just completed. Judging from the data above, should I change correctedErrorRate=0.12 to something higher?

Thanking you,

With regards and yours sincerely, Harsh

skoren commented 6 years ago

For 33X, I think you're doing OK with a 1 Mb NG50. You end up with 15x of total bases after trimming, which is low but probably the minimum to get a decent assembly. For the hybrid, I'd say using 0.12 is OK; it should hopefully leave you with 20x+ coverage after trimming, which would improve the assembly.
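
For readers unfamiliar with the NG50 figure quoted here, the following is a minimal Python sketch of how such a value can be computed from a list of contig lengths. The function name `ng50` and the toy lengths are illustrative assumptions, not part of Canu; Canu reports these numbers itself in the "Contig sizes based on genome size" table.

```python
def ng50(contig_lengths, genome_size, fraction=0.5):
    """Return (NG, LG) for the given fraction of the genome size.

    NG is the length of the contig at which the running sum of contig
    lengths, taken longest first, first reaches fraction * genome_size.
    LG is how many contigs were needed to get there.
    Returns None if the contigs never add up to the target.
    """
    target = genome_size * fraction
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return None


# Toy example with made-up lengths (not taken from this assembly).
toy_contigs = [400, 300, 200, 100, 50]
print(ng50(toy_contigs, genome_size=1000))  # (300, 2)
```

Unlike N50, the target here is a fraction of the estimated genome size rather than of the assembly size, which is why the value can be compared across assemblies of the same genome.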

harsh-shukla commented 6 years ago

Thanks, Sergey, for all the help. I'll post the stats of the hybrid run as soon as it is over.

harsh-shukla commented 6 years ago

Hey Sergey,

So the hybrid run finished yesterday. It is a much improved assembly over the PacBio-only one. The NG50 is ~3.27 Mb, which is quite good. After the trimming step around 25X coverage was left, which I am hoping was enough to cover the entire genome. The entire 2.4 Gb is covered in 2766 contigs, which can most likely be scaffolded to chromosome level. Let's see what happens.
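
As a rough cross-check of the ~25X figure, here is a small Python sketch that sums the bases of the reads kept by trimming and divides by the genome size. The two report lines are copied from the "OUTPUT READS" block of the [TRIMMING/TRIMMING] section of the hybrid report in this thread. The helper names (`parse_report`, `coverage`) are hypothetical, and the sketch assumes that these two rows count the bases that survive trimming and that the genome size is the ~2.4 Gb stated earlier.

```python
import re

GENOME_SIZE = 2.4e9  # haploid genome size stated in this thread (assumption: ~2.4 Gb)

# Lines copied from the hybrid run's [TRIMMING/TRIMMING] OUTPUT READS block.
REPORT_LINES = """\
--  8658829 reads  59789743255 bases (trimmed reads output)
--  459530 reads   1724060622 bases (reads with no change, kept as is)
"""

# Matches report rows of the form "--  <N> reads  <M> bases (<label>)".
LINE_RE = re.compile(r"^--\s+(\d+) reads\s+(\d+) bases \((.+)\)\s*$")


def parse_report(text):
    """Map each row label to a (reads, bases) tuple."""
    rows = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            reads, bases, label = match.groups()
            rows[label] = (int(reads), int(bases))
    return rows


def coverage(bases, genome_size=GENOME_SIZE):
    """Fold coverage: total bases divided by genome size."""
    return bases / genome_size


rows = parse_report(REPORT_LINES)
kept_bases = sum(bases for _, bases in rows.values())
print(f"{coverage(kept_bases):.2f}x")  # 25.63x
```

The result is in line with the roughly 25X quoted above; the exact value depends on the genome size passed to Canu.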

I am attaching the report file as well as the BUSCO stats for the hybrid assembly.

[CORRECTION/READS]
--
-- In gatekeeper store './MammalianHybrid.gkpStore':
--   Found 14672377 reads.
--   Found 95236354580 bases (39.68 times coverage).
--
--   Read length histogram (one '*' equals 41786.5 reads):
--        0    999      0 
--     1000   1999 2925055 **********************************************************************
--     2000   2999 2277601 ******************************************************
--     3000   3999 1380668 *********************************
--     4000   4999 1194684 ****************************
--     5000   5999 989800 ***********************
--     6000   6999 838688 ********************
--     7000   7999 722707 *****************
--     8000   8999 624790 **************
--     9000   9999 542690 ************
--    10000  10999 472620 ***********
--    11000  11999 411580 *********
--    12000  12999 362712 ********
--    13000  13999 328994 *******
--    14000  14999 301592 *******
--    15000  15999 256455 ******
--    16000  16999 207704 ****
--    17000  17999 165323 ***
--    18000  18999 131907 ***
--    19000  19999 105471 **
--    20000  20999  84897 **
--    21000  21999  67813 *
--    22000  22999  53805 *
--    23000  23999  43191 *
--    24000  24999  34691 
--    25000  25999  27907 
--    26000  26999  22708 
--    27000  27999  18319 
--    28000  28999  14704 
--    29000  29999  11685 
--    30000  30999   9740 
--    31000  31999   7690 
--    32000  32999   6282 
--    33000  33999   5125 
--    34000  34999   4168 
--    35000  35999   3251 
--    36000  36999   2823 
--    37000  37999   2224 
--    38000  38999   1723 
--    39000  39999   1526 
--    40000  40999   1133 
--    41000  41999    915 
--    42000  42999    842 
--    43000  43999    646 
--    44000  44999    603 
--    45000  45999    472 
--    46000  46999    372 
--    47000  47999    308 
--    48000  48999    266 
--    49000  49999    229 
--    50000  50999    208 
--    51000  51999    168 
--    52000  52999    122 
--    53000  53999    114 
--    54000  54999    107 
--    55000  55999     80 
--    56000  56999     60 
--    57000  57999     72 
--    58000  58999     49 
--    59000  59999     42 
--    60000  60999     35 
--    61000  61999     32 
--    62000  62999     16 
--    63000  63999     22 
--    64000  64999     23 
--    65000  65999     17 
--    66000  66999     20 
--    67000  67999     11 
--    68000  68999      7 
--    69000  69999     11 
--    70000  70999     11 
--    71000  71999      6 
--    72000  72999     10 
--    73000  73999      6 
--    74000  74999      3 
--    75000  75999      3 
--    76000  76999      5 
--    77000  77999      1 
--    78000  78999      2 
--    79000  79999      2 
--    80000  80999      1 
--    81000  81999      0 
--    82000  82999      3 
--    83000  83999      1 
--    84000  84999      0 
--    85000  85999      0 
--    86000  86999      1 
--    87000  87999      1 
--    88000  88999      1 
--    89000  89999      1 
--    90000  90999      0 
--    91000  91999      0 
--    92000  92999      0 
--    93000  93999      0 
--    94000  94999      1 
--    95000  95999      0 
--    96000  96999      0 
--    97000  97999      1 
--    98000  98999      1 
--    99000  99999      0 
--   100000 100999      1

[CORRECTION/MERS]
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1  14414041 ***                                                                    0.0067 0.0002
--       2-     2  28076792 *******                                                                0.0198 0.0007
--       3-     4  93107255 ************************                                               0.0391 0.0020
--       5-     7 185834678 ************************************************                       0.0907 0.0073
--       8-    11 253441217 *****************************************************************      0.1802 0.0215
--      12-    16 269315286 ********************************************************************** 0.2958 0.0488
--      17-    22 235321302 *************************************************************          0.4151 0.0888
--      23-    29 190222088 *************************************************                      0.5184 0.1362
--      30-    37 159558664 *****************************************                              0.6029 0.1873
--      38-    46 139461582 ************************************                                   0.6749 0.2431
--      47-    56 120887820 *******************************                                        0.7382 0.3042
--      57-    67 100885180 **************************                                             0.7932 0.3691
--      68-    79  80792550 ********************                                                   0.8390 0.4339
--      80-    92  62673074 ****************                                                       0.8757 0.4953
--      93-   106  47651817 ************                                                           0.9042 0.5510
--     107-   121  35871557 *********                                                              0.9259 0.5999
--     122-   137  26947996 *******                                                                0.9422 0.6421
--     138-   154  20313617 *****                                                                  0.9545 0.6782
--     155-   172  15438313 ****                                                                   0.9638 0.7089
--     173-   191  11830713 ***                                                                    0.9709 0.7351
--     192-   211   9164680 **                                                                     0.9763 0.7575
--     212-   232   7184839 *                                                                      0.9805 0.7767
--     233-   254   5693151 *                                                                      0.9838 0.7933
--     255-   277   4559159 *                                                                      0.9865 0.8077
--     278-   301   3679806                                                                        0.9886 0.8204
--     302-   326   2995938                                                                        0.9903 0.8315
--     327-   352   2448942                                                                        0.9917 0.8413
--     353-   379   2019631                                                                        0.9928 0.8500
--     380-   407   1675761                                                                        0.9937 0.8577
--     408-   436   1402500                                                                        0.9945 0.8646
--     437-   466   1183979                                                                        0.9951 0.8708
--     467-   497   1008926                                                                        0.9957 0.8764
--     498-   529    865336                                                                        0.9962 0.8815
--     530-   562    747902                                                                        0.9966 0.8861
--     563-   596    649837                                                                        0.9969 0.8904
--     597-   631    569865                                                                        0.9972 0.8943
--     632-   667    502245                                                                        0.9975 0.8980
--     668-   704    443762                                                                        0.9977 0.9014
--     705-   742    392517                                                                        0.9979 0.9046
--     743-   781    347755                                                                        0.9981 0.9076
--     782-   821    309730                                                                        0.9983 0.9104
--
--    45894597 (max occurrences)
-- 95001854884 (total mers, non-unique)
--  2128910838 (distinct mers, non-unique)
--    14414041 (unique mers)

[CORRECTION/LAYOUT]
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads           14581003         91374
--   Number of Bases        94868798469     367392325
--   Coverage                    39.529         0.153
--   Median                        4624          1709
--   Mean                          6506          4020
--   N50                          10172          8614
--   Minimum                       1000             0
--   Maximum                     100428         82287
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads           14656147       14581003      14581003              0             0
--   Number of Bases        95139191113    94868798469   90047929203              0             0
--   Coverage                    39.641         39.529        37.520          0.000         0.000
--   Median                        4606           4624          4244              0             0
--   Mean                          6491           6506          6175              0             0
--   N50                          10166          10172         10190              0             0
--   Minimum                       1000           1000             1              0             0
--   Maximum                     100428         100428        100427              0             0
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads              91374         91374
--   Number of Bases          367392325             0
--   Coverage                     0.153         0.000
--   Median                        1709             0
--   Mean                          4020             0
--   N50                           8614             0
--   Minimum                          0             0
--   Maximum                      82287             0
--   
--   Maximum Memory          1787445848

[TRIMMING/READS]
--
-- In gatekeeper store './MammalianHybrid.gkpStore':
--   Found 14580025 reads.
--   Found 92999998465 bases (38.74 times coverage).
--
--   Read length histogram (one '*' equals 45209.94 reads):
--        0    999  11328 
--     1000   1999 3164696 **********************************************************************
--     2000   2999 2008466 ********************************************
--     3000   3999 1406712 *******************************
--     4000   4999 1206674 **************************
--     5000   5999 999568 **********************
--     6000   6999 845168 ******************
--     7000   7999 724516 ****************
--     8000   8999 623840 *************
--     9000   9999 540053 ***********
--    10000  10999 466946 **********
--    11000  11999 406263 ********
--    12000  12999 356651 *******
--    13000  13999 325901 *******
--    14000  14999 294421 ******
--    15000  15999 243614 *****
--    16000  16999 193897 ****
--    17000  17999 153071 ***
--    18000  18999 121149 **
--    19000  19999  97220 **
--    20000  20999  77867 *
--    21000  21999  61873 *
--    22000  22999  48871 *
--    23000  23999  39168 
--    24000  24999  31363 
--    25000  25999  25027 
--    26000  26999  20534 
--    27000  27999  16405 
--    28000  28999  12896 
--    29000  29999  10484 
--    30000  30999   8578 
--    31000  31999   6866 
--    32000  32999   5490 
--    33000  33999   4492 
--    34000  34999   3585 
--    35000  35999   2974 
--    36000  36999   2491 
--    37000  37999   1855 
--    38000  38999   1546 
--    39000  39999   1300 
--    40000  40999    992 
--    41000  41999    831 
--    42000  42999    701 
--    43000  43999    630 
--    44000  44999    499 
--    45000  45999    393 
--    46000  46999    321 
--    47000  47999    278 
--    48000  48999    249 
--    49000  49999    213 
--    50000  50999    169 
--    51000  51999    133 
--    52000  52999    121 
--    53000  53999     93 
--    54000  54999     87 
--    55000  55999     67 
--    56000  56999     59 
--    57000  57999     60 
--    58000  58999     48 
--    59000  59999     42 
--    60000  60999     24 
--    61000  61999     28 
--    62000  62999     19 
--    63000  63999     18 
--    64000  64999     17 
--    65000  65999     21 
--    66000  66999     14 
--    67000  67999     11 
--    68000  68999      6 
--    69000  69999     10 
--    70000  70999      6 
--    71000  71999      7 
--    72000  72999      8 
--    73000  73999      5 
--    74000  74999      3 
--    75000  75999      3 
--    76000  76999      3 
--    77000  77999      1 
--    78000  78999      3 
--    79000  79999      1 
--    80000  80999      1 
--    81000  81999      2 
--    82000  82999      1 
--    83000  83999      0 
--    84000  84999      0 
--    85000  85999      0 
--    86000  86999      3 
--    87000  87999      0 
--    88000  88999      0 
--    89000  89999      1 
--    90000  90999      0 
--    91000  91999      0 
--    92000  92999      0 
--    93000  93999      0 
--    94000  94999      1 
--    95000  95999      1 
--    96000  96999      1 
--    97000  97999      0 
--    98000  98999      0 
--    99000  99999      0 
--   100000 100999      1

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 37295971775 *******************************************************************--> 0.8800 0.4024
--       2-     2 1851812923 ********************************************************************** 0.9237 0.4423
--       3-     4 811296117 ******************************                                         0.9366 0.4600
--       5-     7 381101587 **************                                                         0.9466 0.4802
--       8-    11 356914325 *************                                                          0.9539 0.5032
--      12-    16 486788188 ******************                                                     0.9624 0.5440
--      17-    22 568327909 *********************                                                  0.9741 0.6242
--      23-    29 414740968 ***************                                                        0.9870 0.7444
--      30-    37 151525399 *****                                                                  0.9957 0.8498
--      38-    46  30788820 *                                                                      0.9986 0.8948
--      47-    56  10025838                                                                        0.9992 0.9064
--      57-    67   5546408                                                                        0.9994 0.9116
--      68-    79   3602331                                                                        0.9996 0.9151
--      80-    92   2572241                                                                        0.9997 0.9179
--      93-   106   1949976                                                                        0.9997 0.9202
--     107-   121   1528054                                                                        0.9998 0.9223
--     122-   137   1213195                                                                        0.9998 0.9241
--     138-   154    976805                                                                        0.9998 0.9258
--     155-   172    805010                                                                        0.9998 0.9273
--     173-   191    657225                                                                        0.9999 0.9287
--     192-   211    553177                                                                        0.9999 0.9300
--     212-   232    474531                                                                        0.9999 0.9312
--     233-   254    411152                                                                        0.9999 0.9323
--     255-   277    356597                                                                        0.9999 0.9334
--     278-   301    313552                                                                        0.9999 0.9344
--     302-   326    286003                                                                        0.9999 0.9354
--     327-   352    270811                                                                        0.9999 0.9364
--     353-   379    256903                                                                        0.9999 0.9374
--     380-   407    230971                                                                        0.9999 0.9384
--     408-   436    201744                                                                        0.9999 0.9394
--     437-   466    171887                                                                        1.0000 0.9403
--     467-   497    146859                                                                        1.0000 0.9411
--     498-   529    126316                                                                        1.0000 0.9419
--     530-   562    108979                                                                        1.0000 0.9426
--     563-   596     95295                                                                        1.0000 0.9432
--     597-   631     84597                                                                        1.0000 0.9438
--     632-   667     76264                                                                        1.0000 0.9443
--     668-   704     68831                                                                        1.0000 0.9449
--     705-   742     63890                                                                        1.0000 0.9454
--     743-   781     58910                                                                        1.0000 0.9459
--     782-   821     55397                                                                        1.0000 0.9464
--
--    15730053 (max occurrences)
-- 55397846165 (total mers, non-unique)
--  5087535457 (distinct mers, non-unique)
-- 37295971775 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.1200    (use overlaps at or below this fraction error)
--        1    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        1    (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--  
--  INPUT READS:
--  -----------
--  14672377 reads  92999998465 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  OUTPUT READS:
--  ------------
--  8658829 reads  59789743255 bases (trimmed reads output)
--  459530 reads   1724060622 bases (reads with no change, kept as is)
--  4935192 reads  19161413842 bases (reads with no overlaps, deleted)
--  618826 reads   1839318641 bases (reads with short trimmed length, deleted)
--  
--  TRIMMING DETAILS:
--  ----------------
--  7735778 reads   5583870621 bases (bases trimmed from the 5' end of a read)
--  8135760 reads   4901591484 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.1200    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--  9118359 reads  71999265982 bases (reads processed)
--  5554018 reads  21000732483 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--  
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--      54 reads       408293 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--  9118305 reads  71998857689 bases (processed for subreads)
--  
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--   13711 reads        13816 signals (number of subread signal)
--  
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--   13816 reads      3934620 bases (size of subread signal)
--  
--  TRIMMING:
--  --------
--    6709 reads     26127706 bases (trimmed from the 5' end of the read)
--    7004 reads     25449523 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In gatekeeper store './MammalianHybrid.gkpStore':
--   Found 9118253 reads.
--   Found 61462138505 bases (25.6 times coverage).
--
--   Read length histogram (one '*' equals 26078.58 reads):
--        0    999      0 
--     1000   1999 1825501 **********************************************************************
--     2000   2999 1031110 ***************************************
--     3000   3999 845287 ********************************
--     4000   4999 750765 ****************************
--     5000   5999 652836 *************************
--     6000   6999 571086 *********************
--     7000   7999 501520 *******************
--     8000   8999 438592 ****************
--     9000   9999 383856 **************
--    10000  10999 334543 ************
--    11000  11999 292978 ***********
--    12000  12999 260678 *********
--    13000  13999 240018 *********
--    14000  14999 212488 ********
--    15000  15999 171268 ******
--    16000  16999 132424 *****
--    17000  17999 102116 ***
--    18000  18999  79973 ***
--    19000  19999  62603 **
--    20000  20999  49254 *
--    21000  21999  38672 *
--    22000  22999  29962 *
--    23000  23999  23484 
--    24000  24999  18509 
--    25000  25999  14537 
--    26000  26999  11489 
--    27000  27999   8979 
--    28000  28999   6910 
--    29000  29999   5652 
--    30000  30999   4368 
--    31000  31999   3488 
--    32000  32999   2767 
--    33000  33999   2176 
--    34000  34999   1689 
--    35000  35999   1362 
--    36000  36999   1024 
--    37000  37999    839 
--    38000  38999    662 
--    39000  39999    515 
--    40000  40999    407 
--    41000  41999    341 
--    42000  42999    300 
--    43000  43999    239 
--    44000  44999    200 
--    45000  45999    133 
--    46000  46999    117 
--    47000  47999     90 
--    48000  48999     77 
--    49000  49999     84 
--    50000  50999     70 
--    51000  51999     39 
--    52000  52999     39 
--    53000  53999     26 
--    54000  54999     26 
--    55000  55999     12 
--    56000  56999     11 
--    57000  57999     10 
--    58000  58999     13 
--    59000  59999     12 
--    60000  60999      2 
--    61000  61999      5 
--    62000  62999      3 
--    63000  63999      1 
--    64000  64999      3 
--    65000  65999      5 
--    66000  66999      1 
--    67000  67999      1 
--    68000  68999      3 
--    69000  69999      1 
--    70000  70999      1 
--    71000  71999      1

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1 11172620458 *******************************************************************--> 0.7329 0.1823
--       2-     2 1161431148 ********************************************************************** 0.8091 0.2203
--       3-     4 611482732 ************************************                                   0.8356 0.2400
--       5-     7 331129379 *******************                                                    0.8581 0.2646
--       8-    11 348290804 ********************                                                   0.8766 0.2965
--      12-    16 485649965 *****************************                                          0.8998 0.3575
--      17-    22 555861335 *********************************                                      0.9322 0.4784
--      23-    29 389551197 ***********************                                                0.9671 0.6551
--      30-    37 134377400 ********                                                               0.9896 0.8037
--      38-    46  25832043 *                                                                      0.9968 0.8635
--      47-    56   8376982                                                                        0.9982 0.8781
--      57-    67   4537876                                                                        0.9987 0.8847
--      68-    79   2909032                                                                        0.9990 0.8890
--      80-    92   2075794                                                                        0.9992 0.8924
--      93-   106   1579598                                                                        0.9993 0.8953
--     107-   121   1242306                                                                        0.9994 0.8978
--     122-   137    982702                                                                        0.9995 0.9001
--     138-   154    793829                                                                        0.9996 0.9021
--     155-   172    651477                                                                        0.9996 0.9040
--     173-   191    533060                                                                        0.9997 0.9057
--     192-   211    450823                                                                        0.9997 0.9073
--     212-   232    392695                                                                        0.9997 0.9088
--     233-   254    338687                                                                        0.9998 0.9102
--     255-   277    294275                                                                        0.9998 0.9115
--     278-   301    267350                                                                        0.9998 0.9128
--     302-   326    249373                                                                        0.9998 0.9140
--     327-   352    241247                                                                        0.9998 0.9153
--     353-   379    221961                                                                        0.9999 0.9167
--     380-   407    195295                                                                        0.9999 0.9180
--     408-   436    167103                                                                        0.9999 0.9192
--     437-   466    140931                                                                        0.9999 0.9204
--     467-   497    121076                                                                        0.9999 0.9214
--     498-   529    102669                                                                        0.9999 0.9224
--     530-   562     88612                                                                        0.9999 0.9232
--     563-   596     78609                                                                        0.9999 0.9240
--     597-   631     69406                                                                        0.9999 0.9247
--     632-   667     63064                                                                        0.9999 0.9254
--     668-   704     57632                                                                        0.9999 0.9261
--     705-   742     53189                                                                        0.9999 0.9267
--     743-   781     49814                                                                        0.9999 0.9274
--     782-   821     45924                                                                        0.9999 0.9280
--
--     8346914 (max occurrences)
-- 50098034734 (total mers, non-unique)
--  4071788061 (distinct mers, non-unique)
-- 11172620458 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing        744    0.01     6104.32 +- 4854.60        388.99 +- 476.76     (bad trimming)
--   middle-hump           101    0.00     3675.01 +- 2490.41        234.88 +- 449.37     (bad trimming)
--   no-5-prime           1376    0.02     7220.38 +- 5605.10        261.33 +- 457.43     (bad trimming)
--   no-3-prime           1812    0.02     7850.46 +- 5756.87        225.72 +- 423.40     (bad trimming)
--   
--   low-coverage        60763    0.67     2063.02 +- 1321.15          5.73 +- 1.78       (easy to assemble, potential for lower quality consensus)
--   unique            6978821   76.54     6620.95 +- 5050.92         25.01 +- 6.16       (easy to assemble, perfect, yay)
--   repeat-cont        743748    8.16     3023.64 +- 3170.97       2627.92 +- 3359.47    (potential for consensus errors, no impact on assembly)
--   repeat-dove          1243    0.01    22114.86 +- 9498.41        482.93 +- 1094.61    (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat        769221    8.44    10966.59 +- 6532.41       2437.63 +- 2907.39    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont   480812    5.27     6745.51 +- 4395.12                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove    62895    0.69    16390.36 +- 6010.52                             (will end contigs, potential to misassemble)
--   uniq-anchor          9599    0.11    10580.37 +- 5931.30       3728.62 +- 4269.05    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      2766 sequences, total length 2403212721 bp (including 272 repeats of total length 6709814 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  1884490 sequences, total length 7526628627 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     9994699            20   247895044
--     20     7498391            48   485356104
--     30     5694133            84   721398530
--     40     4262615           133   960671602
--     50     3272049           199  1200394793
--     60     2425774           285  1441105224
--     70     1723458           403  1680264870
--     80     1159844           573  1921127241
--     90      619550           853  2160289294
--    100       13629          2283  2400002451
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      2766 sequences, total length 2400638939 bp (including 272 repeats of total length 6672776 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  1884490 sequences, total length 7526613120 bp.
--
-- Contig sizes based on genome size --
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     9980068            20   247471354
--     20     7491969            48   484637275
--     30     5689728            84   720428199
--     40     4183826           134   963670056
--     50     3267225           200  1202267975
--     60     2412360           286  1441926762
--     70     1721309           404  1680217489
--     80     1157649           574  1920294653
--     90      616254           857  2160539298
--    100        5330          2560  2400004284
--

BUSCO stats.

# BUSCO version is: 3.0.2 
# The lineage dataset is: mammalia_odb9 (Creation date: 2016-02-13, number of species: 50, number of BUSCOs: 4104)

# BUSCO was run in mode: genome

        C:89.1%[S:88.6%,D:0.5%],F:6.6%,M:4.3%,n:4104

        3656    Complete BUSCOs (C)
        3637    Complete and single-copy BUSCOs (S)
        19      Complete and duplicated BUSCOs (D)
        269     Fragmented BUSCOs (F)
        179     Missing BUSCOs (M)
        4104    Total BUSCO groups searched
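
As a quick sanity check on the summary above, the percentages can be recomputed from the raw counts. This is only a sketch: the counts are copied from the BUSCO output, while the `pct` helper and the variable names are made up for illustration and are not part of BUSCO.

```python
# Counts copied from the BUSCO summary above; names are illustrative only.
TOTAL = 4104
COUNTS = {"S": 3637, "D": 19, "F": 269, "M": 179}


def pct(n, total=TOTAL):
    """Percentage of BUSCO groups, rounded to one decimal like the summary line."""
    return round(100 * n / total, 1)


complete = COUNTS["S"] + COUNTS["D"]  # complete = single-copy + duplicated

# The four categories should account for every BUSCO group searched.
assert complete + COUNTS["F"] + COUNTS["M"] == TOTAL

print(f"C:{pct(complete)}%[S:{pct(COUNTS['S'])}%,D:{pct(COUNTS['D'])}%],F:{pct(COUNTS['F'])}%")
```

The missing fraction is left out on purpose: rounding 179/4104 on its own gives 4.4%, while the summary line prints 4.3%, presumably because that value is derived so the line sums to 100%.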

Thank you so much for all the help Sergey.

With Regards, Harsh

skoren commented 6 years ago

Yes, the report looks good. There is a k-mer peak at around 20x coverage after correction/trimming, which is consistent with your genome size. Most overlaps are also unique, which is good for the assembly. The BUSCO report also looks reasonable and might improve if you run Arrow with the PacBio data on the assembly.
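
To make the "peak is consistent with genome size" reasoning concrete, here is a minimal Python sketch. The bins and the total are copied from the `[UNITIGGING/MERS]` report above; the function names are hypothetical, and the total/peak estimator is a common rough rule of thumb, not something Canu itself computes. It also assumes that "(total mers, non-unique)" counts all occurrences of k-mers seen more than once, and it uses the raw bin counts as displayed even though the bins have unequal widths.

```python
# Bins copied from the [UNITIGGING/MERS] 22-mer histogram: (low, high, num_mers).
BINS = [
    (1, 1, 11172620458),
    (2, 2, 1161431148),
    (3, 4, 611482732),
    (5, 7, 331129379),
    (8, 11, 348290804),
    (12, 16, 485649965),
    (17, 22, 555861335),
    (23, 29, 389551197),
    (30, 37, 134377400),
    (38, 46, 25832043),
]
# "(total mers, non-unique)" from the same report (interpretation is an assumption).
TOTAL_NONUNIQUE_MERS = 50098034734


def find_peak_bin(bins):
    """Return the fullest bin at or after the first dip in the histogram."""
    counts = [n for _, _, n in bins]
    # First index where the count rises again; the bin before it is the valley
    # separating low-count (likely error) k-mers from genomic k-mers.
    rise = next((i for i in range(1, len(counts)) if counts[i] > counts[i - 1]), None)
    if rise is None:
        raise ValueError("histogram never rises; no genomic peak found")
    return max(bins[rise - 1:], key=lambda b: b[2])


def estimate_genome_size(total_mers, peak_depth):
    """Rough estimate: total k-mer occurrences divided by the peak depth."""
    return total_mers / peak_depth


low, high, _ = find_peak_bin(BINS)
peak_depth = (low + high) / 2  # midpoint of the peak bin
size = estimate_genome_size(TOTAL_NONUNIQUE_MERS, peak_depth)
print(f"peak bin {low}-{high}, estimated genome size {size / 1e9:.2f} Gbp")
```

With these numbers the peak falls in the 17-22 bin, and the estimate lands in the same range as the ~2.4 Gbp genome size given at the top of the issue.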