marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

HiCanu larger size assembly #1817

Closed: wyim-pgl closed this issue 3 years ago

wyim-pgl commented 3 years ago

Hi Sergey, I tried to assemble an allotetraploid genome with a haploid genome size of 900M. I ran HiCanu with the options below, and the output data.contigs.fasta is about 2.4G. I expected around 1.8G for the genome, so the assembly is much larger than that. Do you have any suggestions? Thanks. Won
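The 2.4G figure for data.contigs.fasta may be a size on disk, and that also counts header lines and newlines. The sketch below sums only sequence characters, so the result is in bp and can be compared directly with the expected ~1.8 Gbp. It assumes a plain or gzip-compressed FASTA. The function names `total_sequence_length` and `fasta_total_length` are hypothetical helpers written for this note. They are not part of Canu.

```python
import gzip
import sys
from typing import Iterable


def total_sequence_length(lines: Iterable[str]) -> int:
    """Count sequence characters in FASTA lines, ignoring '>' header lines."""
    total = 0
    for line in lines:
        if not line.startswith(">"):
            # strip() drops the trailing newline and any stray whitespace
            total += len(line.strip())
    return total


def fasta_total_length(path: str) -> int:
    """Total bp in a FASTA file; gzip is assumed when the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return total_sequence_length(handle)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {fasta_total_length(path):,} bp")
```

Save the sketch under any name, for example a hypothetical `fasta_length.py`, and pass it the contigs file, e.g. `python fasta_length.py data/data.contigs.fasta`.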

sbatch --mem-per-cpu=16g --cpus-per-task=1  --time=14-00:00:00 -D `pwd` -J "data" -o output.out --wrap="canu/Linux-amd64/bin/canu -p data -d data genomeSize=900M maxInputCoverage=100  -pacbio-hifi m64047_200708_081720.ccs.fastq.gz corMemory=186 corThreads=64 batMemory=186  ovbMemory=24 ovbThreads=12 corOutCoverage=120  ovsMemory=32-186 maxMemory=220 ovsThreads=20 oeaMemory=32  merylThreads=4 merylMemory=220  corMhapFilterThreshold=0.0000000002 mhapMemory=60g mhapBlockSize=500 ovlMerDistinct=0.975 gridOptions='--time=12-00:00:00 -p cpu-s1-pgl-0 -A cpu-s1-pgl-0' "
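The coverage in the sequence-store summary of the report below is consistent with total bases divided by genomeSize. Dividing 27,348,782,473 bases by 900,000,000 gives about 30.39, and the report shows 30.38x. If genomeSize is too small, this number is inflated. The sketch below recomputes the coverage for the genome sizes discussed in this thread. It also checks the input against a maxInputCoverage cap. That check assumes Canu keeps at most maxInputCoverage times genomeSize bases of input, which is our reading of the Canu documentation and is not stated in this thread. The helper names are hypothetical.

```python
TOTAL_BASES = 27_348_782_473   # "Found 27348782473 bases" in the seqStore summary
MAX_INPUT_COVERAGE = 100       # maxInputCoverage=100 from the command line


def coverage(total_bases: int, genome_size: int) -> float:
    """Fold coverage: bases sequenced divided by genome size."""
    return total_bases / genome_size


def under_input_cap(total_bases: int, genome_size: int, max_cov: int) -> bool:
    """True if the input fits under max_cov * genome_size (assumed cap semantics)."""
    return total_bases <= max_cov * genome_size


if __name__ == "__main__":
    for genome_size in (900_000_000, 1_800_000_000, 2_000_000_000):
        cov = coverage(TOTAL_BASES, genome_size)
        downsampled = not under_input_cap(TOTAL_BASES, genome_size, MAX_INPUT_COVERAGE)
        print(f"genomeSize={genome_size / 1e9:.1f}G: {cov:.2f}x, "
              f"input would be downsampled: {downsampled}")
```

With a 1.8 to 2 Gbp genome, the same reads give roughly 13.7x to 15.2x, about half of what the report suggests. Under the assumed cap semantics, the 100x limit does not come into play for any of these sizes.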

[TRIMMING/READS]
--
-- In sequence store './data.seqStore':
--   Found 2248974 reads.
--   Found 27348782473 bases (30.38 times coverage).
--
--    G=27348782473                      sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        15752    160204   2734882862  ||       3006-3739          259|-
--    00020        14435    342124   5469761347  ||       3740-4473          263|-
--    00030        13514    538181   8204636976  ||       4474-5207          448|-
--    00040        12771    746497  10939524901  ||       5208-5941          780|-
--    00050        12129    966330  13674391641  ||       5942-6675         1376|-
--    00060        11551   1197443  16409273498  ||       6676-7409         2255|-
--    00070        11012   1439970  19144153658  ||       7410-8143         3377|-
--    00080        10495   1694366  21879027176  ||       8144-8877         9886|--
--    00090         9981   1961518  24613907502  ||       8878-9611       106893|-------------------
--    00100         3006   2248973  27348782473  ||       9612-10345      351989|--------------------------------------------------------------
--    001.000x             2248974  27348782473  ||      10346-11079      363434|---------------------------------------------------------------
--                                               ||      11080-11813      319291|--------------------------------------------------------
--                                               ||      11814-12547      269986|-----------------------------------------------
--                                               ||      12548-13281      220331|---------------------------------------
--                                               ||      13282-14015      174938|-------------------------------
--                                               ||      14016-14749      133997|------------------------
--                                               ||      14750-15483       99938|------------------
--                                               ||      15484-16217       71597|-------------
--                                               ||      16218-16951       48407|---------
--                                               ||      16952-17685       31106|------
--                                               ||      17686-18419       18709|----
--                                               ||      18420-19153       10322|--
--                                               ||      19154-19887        5150|-
--                                               ||      19888-20621        2335|-
--                                               ||      20622-21355         935|-
--                                               ||      21356-22089         413|-
--                                               ||      22090-22823         183|-
--                                               ||      22824-23557         126|-
--                                               ||      23558-24291          61|-
--                                               ||      24292-25025          40|-
--                                               ||      25026-25759          37|-
--                                               ||      25760-26493          28|-
--                                               ||      26494-27227          21|-
--                                               ||      27228-27961          16|-
--                                               ||      27962-28695          12|-
--                                               ||      28696-29429           7|-
--                                               ||      29430-30163           7|-
--                                               ||      30164-30897           4|-
--                                               ||      30898-31631           5|-
--                                               ||      31632-32365           2|-
--                                               ||      32366-33099           1|-
--                                               ||      33100-33833           3|-
--                                               ||      33834-34567           2|-
--                                               ||      34568-35301           0|
--                                               ||      35302-36035           1|-
--                                               ||      36036-36769           1|-
--                                               ||      36770-37503           0|
--                                               ||      37504-38237           0|
--                                               ||      38238-38971           1|-
--                                               ||      38972-39705           1|-
--

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2   6627411 ****                                                                   0.0115 0.0007
--       3-     4  19849202 *************                                                          0.0242 0.0019
--       5-     7  73354396 ************************************************                       0.0790 0.0097
--       8-    11 105504295 ********************************************************************** 0.2246 0.0413
--      12-    16  79565647 ****************************************************                   0.3876 0.0931
--      17-    22  76648180 **************************************************                     0.5171 0.1526
--      23-    29  71773495 ***********************************************                        0.6450 0.2328
--      30-    37  59985107 ***************************************                                0.7653 0.3324
--      38-    46  32093039 *********************                                                  0.8627 0.4348
--      47-    56  14799307 *********                                                              0.9126 0.4997
--      57-    67   9649761 ******                                                                 0.9366 0.5381
--      68-    79   6550057 ****                                                                   0.9526 0.5689
--      80-    92   4433248 **                                                                     0.9635 0.5937
--      93-   106   3181902 **                                                                     0.9709 0.6135
--     107-   121   2379865 *                                                                      0.9763 0.6300
--     122-   137   1807781 *                                                                      0.9803 0.6442
--     138-   154   1399394                                                                        0.9834 0.6565
--     155-   172   1112607                                                                        0.9857 0.6672
--     173-   191    897579                                                                        0.9876 0.6768
--     192-   211    737050                                                                        0.9892 0.6854
--     212-   232    608902                                                                        0.9904 0.6933
--     233-   254    512184                                                                        0.9915 0.7004
--     255-   277    431508                                                                        0.9923 0.7070
--     278-   301    368755                                                                        0.9931 0.7131
--     302-   326    316512                                                                        0.9937 0.7187
--     327-   352    274527                                                                        0.9943 0.7240
--     353-   379    236545                                                                        0.9947 0.7289
--     380-   407    206635                                                                        0.9951 0.7335
--     408-   436    180510                                                                        0.9955 0.7378
--     437-   466    157111                                                                        0.9958 0.7419
--     467-   497    140017                                                                        0.9961 0.7456
--     498-   529    124349                                                                        0.9963 0.7492
--     530-   562    111066                                                                        0.9965 0.7526
--     563-   596     99749                                                                        0.9967 0.7558
--     597-   631     91388                                                                        0.9969 0.7589
--     632-   667     82364                                                                        0.9970 0.7619
--     668-   704     74307                                                                        0.9972 0.7647
--     705-   742     69081                                                                        0.9973 0.7674
--     743-   781     66064                                                                        0.9974 0.7701
--     782-   821     61844                                                                        0.9976 0.7728
--
--           0 (max occurrences)
-- 18771184576 (total mers, non-unique)
--   577918346 (distinct mers, non-unique)
--           0 (unique mers)

[TRIMMING/TRIMMING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0250    (use overlaps at or below this fraction error)
--      500    (break region if overlap is less than this long, for 'largest covered' algorithm)
--        2    (break region if overlap coverage is less than this many reads, for 'largest covered' algorithm)
--
--  INPUT READS:
--  -----------
--  2248974 reads  18910706648 bases (reads processed)
--       0 reads            0 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--
--  OUTPUT READS:
--  ------------
--  190181 reads   1500767530 bases (trimmed reads output)
--  1940347 reads  16305415526 bases (reads with no change, kept as is)
--  111411 reads    916964401 bases (reads with no overlaps, deleted)
--    7035 reads     59461031 bases (reads with short trimmed length, deleted)
--
--  TRIMMING DETAILS:
--  ----------------
--  112887 reads     62307824 bases (bases trimmed from the 5' end of a read)
--   93082 reads     65790336 bases (bases trimmed from the 3' end of a read)

[TRIMMING/SPLITTING]
--  PARAMETERS:
--  ----------
--     1000    (reads trimmed below this many bases are deleted)
--   0.0250    (use overlaps at or below this fraction error)
--  INPUT READS:
--  -----------
--  2130528 reads  17934281216 bases (reads processed)
--  118446 reads    976425432 bases (reads not processed, previously deleted)
--       0 reads            0 bases (reads not processed, in a library where trimming isn't allowed)
--
--  PROCESSED:
--  --------
--       0 reads            0 bases (no overlaps)
--      60 reads       447244 bases (no coverage after adjusting for trimming done already)
--       0 reads            0 bases (processed for chimera)
--       0 reads            0 bases (processed for spur)
--  2130468 reads  17933833972 bases (processed for subreads)
--
--  READS WITH SIGNALS:
--  ------------------
--       0 reads            0 signals (number of 5' spur signal)
--       0 reads            0 signals (number of 3' spur signal)
--       0 reads            0 signals (number of chimera signal)
--    1070 reads         1112 signals (number of subread signal)
--
--  SIGNALS:
--  -------
--       0 reads            0 bases (size of 5' spur signal)
--       0 reads            0 bases (size of 3' spur signal)
--       0 reads            0 bases (size of chimera signal)
--    1112 reads       327416 bases (size of subread signal)
--
--  TRIMMING:
--  --------
--     502 reads      1329649 bases (trimmed from the 5' end of the read)
--     580 reads      1521795 bases (trimmed from the 3' end of the read)

[UNITIGGING/READS]
--
-- In sequence store './data.seqStore':
--   Found 2130528 reads.
--   Found 25751682902 bases (28.61 times coverage).
--
--    G=25751682902                      sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        15722    151293   2575179539  ||       1371-2090          661|-
--    00020        14409    322902   5150341045  ||       2091-2810          836|-
--    00030        13489    507847   7725506201  ||       2811-3530         1121|-
--    00040        12745    704375  10300683133  ||       3531-4250         1383|-
--    00050        12103    911798  12875849016  ||       4251-4970         1942|-
--    00060        11523   1129906  15451011591  ||       4971-5690         2794|-
--    00070        10983   1358840  18026179668  ||       5691-6410         3741|-
--    00080        10464   1599052  20601348078  ||       6411-7130         4819|-
--    00090         9945   1851417  23176519047  ||       7131-7850         6022|--
--    00100         1371   2130527  25751682902  ||       7851-8570         8872|--
--    001.000x             2130528  25751682902  ||       8571-9290        39082|--------
--                                               ||       9291-10010      239245|--------------------------------------------
--                                               ||      10011-10730      346467|---------------------------------------------------------------
--                                               ||      10731-11450      314038|----------------------------------------------------------
--                                               ||      11451-12170      271329|--------------------------------------------------
--                                               ||      12171-12890      225528|------------------------------------------
--                                               ||      12891-13610      182759|----------------------------------
--                                               ||      13611-14330      143524|---------------------------
--                                               ||      14331-15050      109060|--------------------
--                                               ||      15051-15770       80576|---------------
--                                               ||      15771-16490       56650|-----------
--                                               ||      16491-17210       38036|-------
--                                               ||      17211-17930       23875|-----
--                                               ||      17931-18650       14189|---
--                                               ||      18651-19370        7592|--
--                                               ||      19371-20090        3784|-
--                                               ||      20091-20810        1576|-
--                                               ||      20811-21530         601|-
--                                               ||      21531-22250         235|-
--                                               ||      22251-22970          86|-
--                                               ||      22971-23690          54|-
--                                               ||      23691-24410          11|-
--                                               ||      24411-25130           8|-
--                                               ||      25131-25850           8|-
--                                               ||      25851-26570           6|-
--                                               ||      26571-27290           0|
--                                               ||      27291-28010           3|-
--                                               ||      28011-28730           4|-
--                                               ||      28731-29450           2|-
--                                               ||      29451-30170           1|-
--                                               ||      30171-30890           0|
--                                               ||      30891-31610           3|-
--                                               ||      31611-32330           1|-
--                                               ||      32331-33050           1|-
--                                               ||      33051-33770           0|
--                                               ||      33771-34490           2|-
--                                               ||      34491-35210           0|
--                                               ||      35211-35930           0|
--                                               ||      35931-36650           0|
--                                               ||      36651-37370           1|-
--

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2   5117983 ***                                                                    0.0089 0.0006
--       3-     4  18880104 ************                                                           0.0208 0.0017
--       5-     7  73096781 ************************************************                       0.0749 0.0098
--       8-    11 105415261 ********************************************************************** 0.2212 0.0434
--      12-    16  79513830 ****************************************************                   0.3851 0.0984
--      17-    22  76670792 **************************************************                     0.5153 0.1615
--      23-    29  71699884 ***********************************************                        0.6440 0.2468
--      30-    37  59774312 ***************************************                                0.7649 0.3525
--      38-    46  31945139 *********************                                                  0.8625 0.4608
--      47-    56  14738896 *********                                                              0.9125 0.5294
--      57-    67   9608955 ******                                                                 0.9365 0.5701
--      68-    79   6522935 ****                                                                   0.9525 0.6027
--      80-    92   4413203 **                                                                     0.9634 0.6289
--      93-   106   3169731 **                                                                     0.9709 0.6498
--     107-   121   2370893 *                                                                      0.9762 0.6673
--     122-   137   1800215 *                                                                      0.9803 0.6823
--     138-   154   1393567                                                                        0.9833 0.6953
--     155-   172   1110348                                                                        0.9857 0.7067
--     173-   191    895774                                                                        0.9876 0.7168
--     192-   211    736505                                                                        0.9892 0.7260
--     212-   232    613645                                                                        0.9904 0.7343
--     233-   254    522897                                                                        0.9915 0.7419
--     255-   277    453466                                                                        0.9924 0.7491
--     278-   301    404254                                                                        0.9932 0.7559
--     302-   326    354410                                                                        0.9939 0.7625
--     327-   352    307524                                                                        0.9945 0.7687
--     353-   379    262006                                                                        0.9950 0.7746
--     380-   407    224740                                                                        0.9955 0.7800
--     408-   436    194441                                                                        0.9959 0.7850
--     437-   466    168354                                                                        0.9962 0.7896
--     467-   497    149477                                                                        0.9965 0.7939
--     498-   529    132225                                                                        0.9968 0.7980
--     530-   562    118171                                                                        0.9970 0.8018
--     563-   596    106086                                                                        0.9972 0.8054
--     597-   631     94798                                                                        0.9974 0.8089
--     632-   667     85537                                                                        0.9975 0.8122
--     668-   704     77887                                                                        0.9977 0.8153
--     705-   742     71390                                                                        0.9978 0.8183
--     743-   781     66507                                                                        0.9979 0.8213
--     782-   821     61024                                                                        0.9981 0.8241
--
--           0 (max occurrences)
-- 17670444929 (total mers, non-unique)
--   574399779 (distinct mers, non-unique)
--           0 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing       2457    0.12     8406.49 +- 1745.39       1675.03 +- 1566.46    (bad trimming)
--   middle-hump           272    0.01     6805.75 +- 2338.31        944.29 +- 1447.41    (bad trimming)
--   no-5-prime           8658    0.41     8127.77 +- 1872.27        901.38 +- 1686.80    (bad trimming)
--   no-3-prime           8881    0.42     8069.41 +- 1887.05        900.11 +- 1676.35    (bad trimming)
--
--   low-coverage       322757   15.15     7944.29 +- 1639.93          5.69 +- 1.81       (easy to assemble, potential for lower quality consensus)
--   unique             932965   43.79     8345.40 +- 1472.58         20.66 +- 6.37       (easy to assemble, perfect, yay)
--   repeat-cont         11812    0.55     7997.99 +- 1370.55         97.02 +- 38.81      (potential for consensus errors, no impact on assembly)
--   repeat-dove           587    0.03    11103.60 +- 1546.84         92.00 +- 37.50      (hard to assemble, likely won't assemble correctly or even at all)
--
--   span-repeat        332981   15.63     8645.91 +- 1581.86       2897.68 +- 2484.59    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont   274278   12.87     7737.51 +- 1048.96                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove   232631   10.92     9341.55 +- 1521.59                             (will end contigs, potential to misassemble)
--   uniq-anchor           493    0.02     8740.40 +- 1850.36       2236.29 +- 2130.21    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      20295 sequences, total length 1753080058 bp (including 55 repeats of total length 661038 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  65462 sequences, total length 525568019 bp.
--
-- Contig sizes based on genome size 900mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     7910229            10    94011119
--     20     4037670            27   180902077
--     30     2394681            58   271967267
--     40     1409371           108   360656458
--     50      796070           195   450274615
--     60      541495           336   540188847
--     70      415483           526   630067459
--     80      347327           764   720049235
--     90      290271          1048   810284788
--    100      247744          1384   900219331
--    110      210490          1777   990151347
--    120      178652          2241  1080028916
--    130      148414          2794  1170049926
--    140      120570          3467  1260052021
--    150       92040          4316  1350040731
--    160       64760          5475  1440043220
--    170       34853          7363  1530009578
--    180       19003         11001  1620001007
--    190       13773         16587  1710007835
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      20295 sequences, total length 2532927451 bp (including 55 repeats of total length 954318 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  65462 sequences, total length 759009390 bp.
--
-- Contig sizes based on genome size 900mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10    12257700             7    99440255
--     20     8217465            16   188023946
--     30     5510788            29   270858000
--     40     3710000            50   363015668
--     50     2707535            78   450131390
--     60     1867475           119   540524466
--     70     1249951           180   631217960
--     80      900743           266   720496254
--     90      730683           377   810363635
--    100      608327           511   900062709
--    110      532934           670   990103847
--    120      474020           849  1080132710
--    130      419379          1051  1170224720
--    140      374927          1278  1260185771
--    150      336193          1532  1350301047
--    160      298786          1815  1440175174
--    170      267505          2133  1530172634
--    180      237116          2490  1620133353
--    190      208457          2895  1710090442
--    200      180077          3360  1800055377
--    210      153107          3902  1890121533
--    220      124336          4555  1980109813
--    230       96833          5372  2070060615
--    240       66296          6489  2160054656
--    250       40979          8224  2250020953
--    260       27482         10959  2340013855
--    270       22084         14641  2430012451
--    280       15349         19343  2520009643
--
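The NG tables above are computed against genomeSize=900M, which is why they continue past 100%. The 2,532,927,451 bp of contigs cover about 281% of 900 Mbp, so the consensus table stops at 280. The sketch below uses the usual NGx/LGx definition: sort contigs longest first, then report the contig at which the running sum first reaches x% of the genome size. `ng_table` is a hypothetical helper, and it has not been checked against Canu's own implementation.

```python
from typing import List, Tuple


def ng_table(contig_lengths: List[int], genome_size: int,
             step: int = 10) -> List[Tuple[int, int, int]]:
    """Return (x, NGx, LGx) rows for x = step, 2*step, ...

    NGx is the length of the contig at which the running sum of contigs,
    longest first, first reaches x% of genome_size; LGx is how many
    contigs that took. Rows stop once the assembly runs out of sequence.
    """
    rows = []
    running = 0
    x = step
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids float rounding; one long contig can
        # satisfy several thresholds at once.
        while running * 100 >= genome_size * x:
            rows.append((x, length, count))
            x += step
    return rows


if __name__ == "__main__":
    # A single contig holding the whole 2,532,927,451 bp total stands in
    # here only to show where the x column ends for genomeSize=900M.
    print(ng_table([2_532_927_451], 900_000_000)[-1][0])  # prints 280
```

The last row depends only on the total assembly length and genomeSize. With genomeSize=1.8G, the same total would stop at NG140.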
skoren commented 3 years ago

What version of Canu are you running? It's a little strange that no bubbles at all are marked here, unless your genome is extremely heterozygous or you have an old Canu version. Some of the extra contigs may be false duplications caused by under-corrected reads in the assembly. That said, your k-mer distribution shows a peak in the 8-11x range, which would be consistent with a 2.4 Gbp genome size, like the assembly generated. How confident are you in the 900 Mb genome size?

I'd suggest using the latest 2.1 release and checking genome size predictions from GenomeScope or similar. You should also run purge_dups after the assembly, as that will remove the alternate haplotype along with any false duplications, if they exist.
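The k-mer peak argument can be turned into a back-of-envelope estimate with the common rule that genome size is roughly total k-mers divided by the k-mer depth of the main peak. The sketch below applies that rule to numbers copied from the [TRIMMING/MERS] histogram in the first report. It makes two assumptions: the 8-11x bin holds the single-copy peak, and the "total mers" line is the right numerator. The bins are wide and uneven, so each bin count is divided by its width before picking the peak, and the result is only a bracket. A model fit such as GenomeScope remains the better tool. The helper names are hypothetical.

```python
# (low depth, high depth, number of distinct 22-mers) from [TRIMMING/MERS]
HISTOGRAM = [
    (2, 2, 6_627_411),
    (3, 4, 19_849_202),
    (5, 7, 73_354_396),
    (8, 11, 105_504_295),
    (12, 16, 79_565_647),
    (17, 22, 76_648_180),
    (23, 29, 71_773_495),
    (30, 37, 59_985_107),
]
TOTAL_KMERS = 18_771_184_576  # "(total mers, non-unique)"


def peak_bin(histogram):
    """Depth range of the bin with the most k-mers per unit of depth."""
    low, high, _ = max(histogram, key=lambda row: row[2] / (row[1] - row[0] + 1))
    return low, high


def genome_size_bounds(total_kmers, depth_low, depth_high):
    """Genome size ~ total k-mers / peak depth; a higher depth means a smaller genome."""
    return total_kmers / depth_high, total_kmers / depth_low


if __name__ == "__main__":
    low, high = peak_bin(HISTOGRAM)
    smallest, largest = genome_size_bounds(TOTAL_KMERS, low, high)
    print(f"peak {low}-{high}x -> {smallest / 1e9:.2f} to {largest / 1e9:.2f} Gbp")
```

This prints a bracket of 1.71 to 2.35 Gbp. That is well above the 900 Mbp used as genomeSize, and it brackets the ~2 Gbp flow-cytometry figure mentioned later in the thread.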

wyim-pgl commented 3 years ago

Thanks, Sergey. I compiled it from GitHub. The version is shown below.

(base) wyim @ login-0 13:25:20 ~/scratch/data/
  canu/Linux-amd64/bin/canu --version

Canu branch hicanu_rc +325 changes (r9818 86bb2e221546c76437887d3a0ff5ab9546f85317)

The 900 Mb genome size came from its diploid progenitor, so I assumed the allotetraploid would be double that. I also checked it with flow cytometry, which showed ~2 Gbp.
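The sketch below compares the first run's consensus total with the two size expectations above. It gives a rough idea of how much sequence purge_dups would have to account for, if the surplus is alternate haplotype or false duplication as suggested earlier in the thread. The assembled total is copied from the [UNITIGGING/CONSENSUS] section of the report above. The `surplus` helper is made up for this sketch.

```python
ASSEMBLED_BP = 2_532_927_451  # contigs, total length, after consensus (first run)

EXPECTED = {
    "2 x 900 Mbp diploid progenitor": 1_800_000_000,
    "flow cytometry, ~2 Gbp": 2_000_000_000,
}


def surplus(assembled: int, expected: int):
    """Return (extra bp, assembled / expected)."""
    return assembled - expected, assembled / expected


if __name__ == "__main__":
    for label, expected in EXPECTED.items():
        extra, ratio = surplus(ASSEMBLED_BP, expected)
        print(f"{label}: {extra / 1e6:,.0f} Mbp extra, {ratio:.2f}x expected")
```

The surplus works out to roughly 533 to 733 Mbp, or 1.27x to 1.41x the expected size. Unassembled sequence is not included.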

I will try it with the latest 2.1 release and keep you posted. Regards, Won

wyim-pgl commented 3 years ago

I just noticed this: "To install from source code (DO NOT download the Source code files provided by GitHub as these will not compile, use the canu-2.1.tar.gz instead)".

I had compiled from the GitHub source, as I usually do.

I will try it again with Canu 2.1. Thanks.

skoren commented 3 years ago

Your version is relatively old at this point; I'd expect 2.1 to be an improvement. You can download either the source tar.gz or the pre-compiled binaries for your system.

wyim-pgl commented 3 years ago

I am rerunning now and will keep you posted. Thanks.

wyim-pgl commented 3 years ago

Hi Sergey, I reran with the current release, and now it marks bubbles. The final assembly is still bigger than I expected, though. Do you have any recommendations? Thanks. Won

cat data.report

[UNITIGGING/READS]
--
-- In sequence store './data.seqStore':
--   Found 2248974 reads.
--   Found 27348782473 bases (30.38 times coverage).
--
--    G=27348782473                      sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        15752    160204   2734882862  ||       3006-3739          259|-
--    00020        14435    342124   5469761347  ||       3740-4473          263|-
--    00030        13514    538181   8204636976  ||       4474-5207          448|-
--    00040        12771    746497  10939524901  ||       5208-5941          780|-
--    00050        12129    966330  13674391641  ||       5942-6675         1376|-
--    00060        11551   1197443  16409273498  ||       6676-7409         2255|-
--    00070        11012   1439970  19144153658  ||       7410-8143         3377|-
--    00080        10495   1694366  21879027176  ||       8144-8877         9886|--
--    00090         9981   1961518  24613907502  ||       8878-9611       106893|-------------------
--    00100         3006   2248973  27348782473  ||       9612-10345      351989|--------------------------------------------------------------
--    001.000x             2248974  27348782473  ||      10346-11079      363434|---------------------------------------------------------------
--                                               ||      11080-11813      319291|--------------------------------------------------------
--                                               ||      11814-12547      269986|-----------------------------------------------
--                                               ||      12548-13281      220331|---------------------------------------
--                                               ||      13282-14015      174938|-------------------------------
--                                               ||      14016-14749      133997|------------------------
--                                               ||      14750-15483       99938|------------------
--                                               ||      15484-16217       71597|-------------
--                                               ||      16218-16951       48407|---------
--                                               ||      16952-17685       31106|------
--                                               ||      17686-18419       18709|----
--                                               ||      18420-19153       10322|--
--                                               ||      19154-19887        5150|-
--                                               ||      19888-20621        2335|-
--                                               ||      20622-21355         935|-
--                                               ||      21356-22089         413|-
--                                               ||      22090-22823         183|-
--                                               ||      22824-23557         126|-
--                                               ||      23558-24291          61|-
--                                               ||      24292-25025          40|-
--                                               ||      25026-25759          37|-
--                                               ||      25760-26493          28|-
--                                               ||      26494-27227          21|-
--                                               ||      27228-27961          16|-
--                                               ||      27962-28695          12|-
--                                               ||      28696-29429           7|-
--                                               ||      29430-30163           7|-
--                                               ||      30164-30897           4|-
--                                               ||      30898-31631           5|-
--                                               ||      31632-32365           2|-
--                                               ||      32366-33099           1|-
--                                               ||      33100-33833           3|-
--                                               ||      33834-34567           2|-
--                                               ||      34568-35301           0|
--                                               ||      35302-36035           1|-
--                                               ||      36036-36769           1|-
--                                               ||      36770-37503           0|
--                                               ||      37504-38237           0|
--                                               ||      38238-38971           1|-
--                                               ||      38972-39705           1|-
--

[UNITIGGING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2   6627411 ****                                                                   0.0115 0.0007
--       3-     4  19849202 *************                                                          0.0242 0.0019
--       5-     7  73354396 ************************************************                       0.0790 0.0097
--       8-    11 105504295 ********************************************************************** 0.2246 0.0413
--      12-    16  79565647 ****************************************************                   0.3876 0.0931
--      17-    22  76648180 **************************************************                     0.5171 0.1526
--      23-    29  71773495 ***********************************************                        0.6450 0.2328
--      30-    37  59985107 ***************************************                                0.7653 0.3324
--      38-    46  32093039 *********************                                                  0.8627 0.4348
--      47-    56  14799307 *********                                                              0.9126 0.4997
--      57-    67   9649761 ******                                                                 0.9366 0.5381
--      68-    79   6550057 ****                                                                   0.9526 0.5689
--      80-    92   4433248 **                                                                     0.9635 0.5937
--      93-   106   3181902 **                                                                     0.9709 0.6135
--     107-   121   2379865 *                                                                      0.9763 0.6300
--     122-   137   1807781 *                                                                      0.9803 0.6442
--     138-   154   1399394                                                                        0.9834 0.6565
--     155-   172   1112607                                                                        0.9857 0.6672
--     173-   191    897579                                                                        0.9876 0.6768
--     192-   211    737050                                                                        0.9892 0.6854
--     212-   232    608902                                                                        0.9904 0.6933
--     233-   254    512184                                                                        0.9915 0.7004
--     255-   277    431508                                                                        0.9923 0.7070
--     278-   301    368755                                                                        0.9931 0.7131
--     302-   326    316512                                                                        0.9937 0.7187
--     327-   352    274527                                                                        0.9943 0.7240
--     353-   379    236545                                                                        0.9947 0.7289
--     380-   407    206635                                                                        0.9951 0.7335
--     408-   436    180510                                                                        0.9955 0.7378
--     437-   466    157111                                                                        0.9958 0.7419
--     467-   497    140017                                                                        0.9961 0.7456
--     498-   529    124349                                                                        0.9963 0.7492
--     530-   562    111066                                                                        0.9965 0.7526
--     563-   596     99749                                                                        0.9967 0.7558
--     597-   631     91388                                                                        0.9969 0.7589
--     632-   667     82364                                                                        0.9970 0.7619
--     668-   704     74307                                                                        0.9972 0.7647
--     705-   742     69081                                                                        0.9973 0.7674
--     743-   781     66064                                                                        0.9974 0.7701
--     782-   821     61844                                                                        0.9976 0.7728
--
--           0 (max occurrences)
-- 18771184576 (total mers, non-unique)
--   577918346 (distinct mers, non-unique)
--           0 (unique mers)

[UNITIGGING/OVERLAPS]
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing       6383    0.28     9639.17 +- 2647.72       1389.84 +- 1334.91    (bad trimming)
--   middle-hump           622    0.03    10079.20 +- 1614.30       2348.29 +- 1385.90    (bad trimming)
--   no-5-prime          16928    0.75     8688.17 +- 1635.00       1981.53 +- 1950.74    (bad trimming)
--   no-3-prime          15445    0.69     8734.54 +- 1658.78       2268.40 +- 2063.64    (bad trimming)
--
--   low-coverage       322204   14.33     8200.21 +- 1406.77          5.65 +- 1.85       (easy to assemble, potential for lower quality consensus)
--   unique             929949   41.35     8350.63 +- 1469.41         20.77 +- 6.56       (easy to assemble, perfect, yay)
--   repeat-cont          9305    0.41     8026.80 +- 1338.05        100.10 +- 37.60      (potential for consensus errors, no impact on assembly)
--   repeat-dove           401    0.02    11204.49 +- 1502.88         96.80 +- 38.03      (hard to assemble, likely won't assemble correctly or even at all)
--
--   span-repeat        328538   14.61     8659.70 +- 1582.17       2902.59 +- 2485.45    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont   268575   11.94     7751.03 +- 1035.53                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove   234859   10.44     9342.10 +- 1522.81                             (will end contigs, potential to misassemble)
--   uniq-anchor           347    0.02     8862.42 +- 1870.23       2134.26 +- 2285.80    (repeat read, with unique section, probable bad read)

[UNITIGGING/ADJUSTMENT]
-- No report available.

[UNITIGGING/ERROR RATES]
--
--  ERROR RATES
--  -----------
--                                                   --------threshold------
--  2974421                      fraction error      fraction        percent
--  samples                              (1e-5)         error          error
--                   --------------------------      --------       --------
--  command line (-eg)                           ->     30.00        0.0300%
--  command line (-eM)                           ->   1000.00        1.0000%
--  mean + std.dev       0.78 +-   4 *     3.94  ->     16.54        0.0165%
--  median + mad         0.00 +-   4 *     0.00  ->      0.00        0.0000%
--  90th percentile                              ->      1.00        0.0010%  (enabled)
--
--  BEST EDGE FILTERING
--  -------------------
--  At graph threshold 0.0300%, reads:
--    available to have edges:      1102333
--    with at least one edge:        944316
--
--  At max threshold 1.0000%, reads:  (not computed)
--    available to have edges:            0
--    with at least one edge:             0
--
--  At tight threshold 0.0010%, reads with:
--    both edges below threshold:    844539
--    one  edge  above threshold:     78769
--    both edges above threshold:     21008
--    at least one edge:             944316
--
--  At loose threshold 0.0165%, reads with:
--    both edges below threshold:    889279
--    one  edge  above threshold:     47820
--    both edges above threshold:      7217
--    at least one edge:             944316
--
--
--  INITIAL EDGES
--  -------- ----------------------------------------
--   1079489 reads are contained
--    272183 reads have no best edges (singleton)
--     19917 reads have only one best edge (spur)
--              18371 are mutual best
--    877385 reads have two best edges
--              30163 have one mutual best edge
--             844272 have two mutual best edges
--
--
--  FINAL EDGES
--  -------- ----------------------------------------
--   1079489 reads are contained
--    276081 reads have no best edges (singleton)
--     19741 reads have only one best edge (spur)
--              18653 are mutual best
--    873663 reads have two best edges
--              27683 have one mutual best edge
--             843329 have two mutual best edges
--
--
--  EDGE FILTERING
--  -------- ------------------------------------------
--         0 reads are ignored
--    103380 reads have a gap in overlap coverage
--      1428 reads have lopsided best edges

[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
--   contigs:      7979 sequences, total length 1476787006 bp (including 1592 repeats of total length 20950723 bp).
--   bubbles:      10985 sequences, total length 259471232 bp.
--   unassembled:  292879 sequences, total length 2476730635 bp.
--
-- Contig sizes based on genome size 900mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     5291328            12    93138315
--     20     2773657            36   181856844
--     30     1417871            82   270387738
--     40      913324           163   360285301
--     50      674909           281   450404201
--     60      533504           432   540515149
--     70      446789           616   630436641
--     80      372694           837   720116947
--     90      315763          1099   810284330
--    100      269516          1407   900039718
--    110      229086          1770   990041518
--    120      189625          2202  1080025950
--    130      152439          2730  1170038631
--    140      117214          3402  1260096765
--    150       79994          4320  1350067079
--    160       33063          5984  1440017427
--

[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      7979 sequences, total length 2134680991 bp (including 1592 repeats of total length 30250112 bp).
--   bubbles:      10985 sequences, total length 374940873 bp.
--   unassembled:  292879 sequences, total length 3578120417 bp.
--
-- Contig sizes based on genome size 900mbp:
--
--            NG (bp)  LG (contigs)    sum (bp)
--         ----------  ------------  ----------
--     10     9290601             7    92668417
--     20     5939898            20   185599728
--     30     3795553            39   272304223
--     40     2278086            69   360815916
--     50     1619836           117   451400255
--     60     1226869           181   541047031
--     70     1013475           263   630851441
--     80      857363           360   720465313
--     90      735938           474   810633631
--    100      659422           603   900514005
--    110      573461           750   990159678
--    120      513061           917  1080482960
--    130      454263          1103  1170448747
--    140      409305          1311  1260021121
--    150      365009          1544  1350100316
--    160      325568          1805  1440133640
--    170      285867          2100  1530146016
--    180      249145          2437  1620136827
--    190      212963          2828  1710079055
--    200      176670          3292  1800170649
--    210      142522          3855  1890003798
--    220      101779          4599  1980007599
--    230       54179          5772  2070007410
--
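The version 1 and version 2 totals in this report differ by almost exactly the same factor in every category, which matters when judging how large the assembly actually is. The short Python sketch below only divides the reported numbers (the dictionary names are hypothetical); reading the ~1.445 factor as HiCanu restoring full-length sequence after laying out homopolymer-compressed reads is an assumption, not something stated in this thread.

```python
# Totals (bp) from the [UNITIGGING/CONTIGS] block (version 1, after unitig
# construction) and the [UNITIGGING/CONSENSUS] block (version 2) above.
BEFORE_CONSENSUS = {
    "contigs": 1_476_787_006,
    "bubbles": 259_471_232,
    "unassembled": 2_476_730_635,
}
AFTER_CONSENSUS = {
    "contigs": 2_134_680_991,
    "bubbles": 374_940_873,
    "unassembled": 3_578_120_417,
}

# Growth factor of each category between the two report versions.
for category, before in BEFORE_CONSENSUS.items():
    ratio = AFTER_CONSENSUS[category] / before
    print(f"{category:12s} grew {ratio:.3f}x at consensus")

# Sequence that ends up in the contig and bubble outputs after consensus.
final_total = AFTER_CONSENSUS["contigs"] + AFTER_CONSENSUS["bubbles"]
print(f"contigs + bubbles after consensus: {final_total / 1e9:.2f} Gbp")
```

Every category grows by about 1.445x, and contigs plus bubbles come to about 2.51 Gbp after consensus. Under the assumption above, only the version 2 numbers are on the same scale as flow-cytometry or read-based genome size estimates.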
skoren commented 3 years ago

Given the histogram peak at 8-11x, I still think 2.4 Gbp is a reasonable size. I'd suggest checking the genome size estimate with GenomeScope and seeing what that gives. I'd also run purge_dups and see how large the purged assembly ends up (you may have to adjust the purge_dups cutoffs manually).
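The reasoning from the histogram peak can be made concrete with the back-of-the-envelope estimate that GenomeScope refines with a fitted model: genome size ≈ total k-mers / depth of the main k-mer peak. The Python sketch below (helper names are hypothetical) divides each bin's count by its width, because the report's bins widen as occurrence grows, then divides the reported total by both ends of the peak bin. It assumes the 8-11x peak is the depth of sequence present once in the assembly target and ignores error k-mers and repeats. It also assumes the report's 22-mers were counted on HiCanu's homopolymer-compressed reads (the total is only about 0.69 of the 27.3 Gbp of read bases), so the result is best compared with compressed lengths such as the version 1 totals.

```python
# First rows of the 22-mer histogram in the [UNITIGGING/MERS] block above:
# (lowest occurrence, highest occurrence, number of distinct 22-mers in the bin).
# Later rows are omitted; their per-occurrence counts are far smaller.
HISTOGRAM = [
    (1, 1, 0),
    (2, 2, 6_627_411),
    (3, 4, 19_849_202),
    (5, 7, 73_354_396),
    (8, 11, 105_504_295),
    (12, 16, 79_565_647),
    (17, 22, 76_648_180),
    (23, 29, 71_773_495),
    (30, 37, 59_985_107),
    (38, 46, 32_093_039),
]
TOTAL_MERS = 18_771_184_576  # "total mers, non-unique" from the same block

def peak_bin(hist):
    """Return (low, high) of the bin with the most distinct k-mers per occurrence value.

    Bins get wider as occurrence grows, so each count is divided by the bin
    width before the bins are compared.
    """
    low, high, _ = max(hist, key=lambda row: row[2] / (row[1] - row[0] + 1))
    return low, high

def genome_size_range(total_mers, low, high):
    """Crude size estimate: total k-mers divided by the depth of the main peak."""
    return total_mers / high, total_mers / low

low, high = peak_bin(HISTOGRAM)
smallest, largest = genome_size_range(TOTAL_MERS, low, high)
print(f"peak bin {low}-{high}x -> about {smallest / 1e9:.2f}-{largest / 1e9:.2f} Gbp")
```

With these numbers the peak falls in the 8-11 bin, giving roughly 1.71-2.35 Gbp; the upper end is close to the 2.4 Gbp figure. Whether that peak represents one haplotype or both is exactly what GenomeScope's model and purge_dups' coverage cutoffs help settle.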

wyim-pgl commented 3 years ago

Thanks, I will keep you posted.