marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Read correction does not produce any output #2333

Closed: Ppatel09-wq closed this issue 1 month ago

Ppatel09-wq commented 2 months ago

Hello, I am encountering an issue where Canu fails to generate corrected reads because the overlap store is empty. Below are the details of the issue, steps taken to troubleshoot, and relevant log outputs.

Canu version: 2.2

Command used

/Documents/canu/build/bin/canu -p assembly_prefix -d canu1 genomeSize=200m -nanopore N5_all_ONT_data.fastq -raw -pacbio YT11_resub.subreads.fastq.gz

Steps taken: Based on issue #2182, I tried changing corMinCoverage (values tested: 0, 2, 3) and corErrorRate (value tested: 0.5). Since I downloaded the latest version, I assumed minOverlapLength had already been changed from its default of 500. Could you provide guidance on how to handle this and adjust Canu settings effectively? Thank you for your assistance.

Log Output:

    ./mhap.sh 187 > ./mhap.000187.out 2>&1

-- Finished on Sun Jul 28 04:38:24 2024 (461213 seconds, fashionably late) with 3754.308 GB free disk space
----------------------------------------
-- Found 187 mhap overlap output files.
-- Finished stage 'cor-mhapCheck', reset canuIteration.
----------------------------------------
-- Starting command on Sun Jul 28 04:38:31 2024 with 3754.307 GB free disk space

    cd correction
    /Users/yonastekle/Documents/canu/build/bin/ovStoreConfig \
     -S ../assembly_prefix.seqStore \
     -M 4-8 \
     -L ./1-overlapper/ovljob.files \
     -create ./assembly_prefix.ovlStore.config \
     > ./assembly_prefix.ovlStore.config.txt \
    2> ./assembly_prefix.ovlStore.config.err

-- Finished on Sun Jul 28 04:38:53 2024 (22 seconds) with 3754.301 GB free disk space
----------------------------------------
--
-- Creating overlap store correction/assembly_prefix.ovlStore using:
--      1 bucket
--      1 slice
--        using at most 1 GB memory each
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'ovS' concurrent execution on Sun Jul 28 04:38:53 2024 with 3754.301 GB free disk space (1 processes; 16 concurrently)

    cd correction
    ./assembly_prefix.ovlStore.sh 1 > ./assembly_prefix.ovlStore.000001.out 2>&1

-- Finished on Sun Jul 28 04:38:56 2024 (3 seconds) with 3754.241 GB free disk space
----------------------------------------
-- Checking store.
----------------------------------------
-- Starting command on Sun Jul 28 04:38:56 2024 with 3754.241 GB free disk space

    cd correction
    /Users/yonastekle/Documents/canu/build/bin/ovStoreDump \
     -S ../assembly_prefix.seqStore \
     -O  ./assembly_prefix.ovlStore \
     -counts \
     > ./assembly_prefix.ovlStore/counts.dat 2> ./assembly_prefix.ovlStore/counts.err

-- Finished on Sun Jul 28 04:38:57 2024 (one second) with 3754.217 GB free disk space
----------------------------------------
--
-- Overlap store 'correction/assembly_prefix.ovlStore' successfully constructed.
-- Found 0 overlaps for 0 reads; 2634323 reads have no overlaps.
--
--
-- Purged 46.341 GB in 450 overlap output files.
-- Finished stage 'cor-createOverlapStore', reset canuIteration.
-- Set corMinCoverage=4 based on read coverage of 107.41.
-- Computing correction layouts.
--   Local  filter coverage   80
--   Global filter coverage   40
----------------------------------------
-- Starting command on Sun Jul 28 04:39:07 2024 with 3801.372 GB free disk space

    cd correction
    /Users/yonastekle/Documents/canu/build/bin/generateCorrectionLayouts \
      -S ../assembly_prefix.seqStore \
      -O  ./assembly_prefix.ovlStore \
      -C  ./assembly_prefix.corStore.WORKING \
      -eC 80 \
      -xC 40 \
    > ./assembly_prefix.corStore.err 2>&1

-- Finished on Sun Jul 28 04:39:08 2024 (one second) with 3801.371 GB free disk space
----------------------------------------
-- Finished stage 'cor-buildCorrectionLayoutsConfigure', reset canuIteration.
-- Computing correction layouts.
----------------------------------------
-- Starting command on Sun Jul 28 04:39:08 2024 with 3801.371 GB free disk space

    cd correction/2-correction
    /Users/yonastekle/Documents/canu/build/bin/filterCorrectionLayouts \
      -S  ../../assembly_prefix.seqStore \
      -C     ../assembly_prefix.corStore \
      -R      ./assembly_prefix.readsToCorrect.WORKING \
      -cc 4 \
      -cl 1000 \
      -g  200000000 \
      -c  40 \
    > ./assembly_prefix.readsToCorrect.err 2>&1

-- Finished on Sun Jul 28 04:39:12 2024 (4 seconds) with 3801.191 GB free disk space
----------------------------------------
--                             original      original
--                            raw reads     raw reads
--   category                w/overlaps  w/o/overlaps
--   -------------------- ------------- -------------
--   Number of Reads                  0       2634323
--   Number of Bases                  0             0
--   Coverage                     0.000         0.000
--   Median                           0             0
--   Mean                             0             0
--   N50                              0             0
--   Minimum                          0             0
--   Maximum                          0             0
--   
--                                        --------corrected---------  ----------rescued----------
--                             evidence                     expected                     expected
--   category                     reads            raw     corrected            raw     corrected
--   -------------------- -------------  ------------- -------------  ------------- -------------
--   Number of Reads                  0              0             0              0             0
--   Number of Bases                  0              0             0              0             0
--   Coverage                     0.000          0.000         0.000          0.000         0.000
--   Median                           0              0             0              0             0
--   Mean                             0              0             0              0             0
--   N50                              0              0             0              0             0
--   Minimum                          0              0             0              0             0
--   Maximum                          0              0             0              0             0
--   
--                        --------uncorrected--------
--                                           expected
--   category                       raw     corrected
--   -------------------- ------------- -------------
--   Number of Reads            2634323       2634323
--   Number of Bases                  0             0
--   Coverage                     0.000         0.000
--   Median                           0             0
--   Mean                             0             0
--   N50                              0             0
--   Minimum                          0             0
--   Maximum                          0             0
--   
--   Maximum Memory                   0
-- Finished stage 'cor-filterCorrectionLayouts', reset canuIteration.
--
-- Correction jobs estimated to need at most 0 GB for computation.
-- Correction jobs will request 12 GB each.
--
-- Local: cor       12.000 GB    4 CPUs x   7 jobs    84.000 GB  28 CPUs  (read correction)
--
--
-- Configuring correction jobs:
--   Reads estimated to need at most 0 GB for computation.
--   Jobs will request 12 GB each.
----------------------------------------
-- Starting command on Sun Jul 28 04:39:13 2024 with 3801.191 GB free disk space

    cd correction/2-correction
    ./correctReadsPartition.sh \
    > ./correctReadsPartition.err 2>&1

-- Finished on Sun Jul 28 04:39:14 2024 (one second) with 3801.187 GB free disk space
----------------------------------------
-- Finished stage 'cor-generateCorrectedReadsConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cor' concurrent execution on Sun Jul 28 04:39:14 2024 with 3801.187 GB free disk space (1 processes; 7 concurrently)

    cd correction/2-correction
    ./correctReads.sh 1 > ./correctReads.000001.out 2>&1

-- Finished on Sun Jul 28 04:39:14 2024 (in the blink of an eye) with 3801.187 GB free disk space
----------------------------------------
-- Found 1 read correction output files.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
-- Found 1 read correction output files.
-- Finished stage 'cor-generateCorrectedReadsCheck', reset canuIteration.
--
-- Loading corrected reads into corStore and seqStore.
----------------------------------------
-- Starting command on Sun Jul 28 04:39:14 2024 with 3801.187 GB free disk space

    cd correction
    /Users/yonastekle/Documents/canu/build/bin/loadCorrectedReads \
      -S ../assembly_prefix.seqStore \
      -C ./assembly_prefix.corStore \
      -L ./2-correction/corjob.files \
    >  ./assembly_prefix.loadCorrectedReads.log \
    2> ./assembly_prefix.loadCorrectedReads.err

-- Finished on Sun Jul 28 04:39:17 2024 (3 seconds) with 3801.027 GB free disk space
----------------------------------------
--
-- No corrected reads generated; correctReads output saved.
--
-- Purging overlaps used for correction.
-- Finished stage 'cor-loadCorrectedReads', reset canuIteration.
----------------------------------------
-- Starting command on Sun Jul 28 04:39:17 2024 with 3801.086 GB free disk space

    cd .
    /Users/yonastekle/Documents/canu/build/bin/sqStoreDumpFASTQ \
      -corrected \
      -S ./assembly_prefix.seqStore \
      -o ./assembly_prefix.correctedReads.gz \
      -fasta \
      -nolibname \
    > assembly_prefix.correctedReads.fasta.err 2>&1

-- Finished on Sun Jul 28 04:39:17 2024 (like a bat out of hell) with 3801.086 GB free disk space
----------------------------------------
--
-- Corrected reads saved in 'assembly_prefix.correctedReads.fasta.gz'.
-- Finished stage 'cor-dumpCorrectedReads', reset canuIteration.
--
-- Trimming skipped; no corrected reads exist in assembly_prefix.seqStore.
--
-- Unitigging skipped; no corrected reads to assemble.
--
-- Bye.
skoren commented 2 months ago

There were no changes to the defaults for minimum read length or minimum overlap length in later versions; they are still 1000 and 500 unless you specify something else. What are your read lengths? It looks like they're all under 1 kb. If that's the case, you need to lower minReadLength and minOverlapLength.
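To answer the read-length question before choosing new cutoffs, it helps to count how many reads (and bases) reach a given length. A minimal sketch, assuming standard four-line FASTQ records with no wrapped sequence lines; `count_reads_by_length` is a hypothetical helper, not part of canu:

```shell
# Hypothetical helper: summarise FASTQ read lengths against a length cutoff.
# Assumes 4-line FASTQ records (header, sequence, "+", qualities).
# Usage: count_reads_by_length 1000 < reads.fastq
count_reads_by_length() {
  awk -v cutoff="$1" '
    NR % 4 == 2 {                       # 2nd line of each record = sequence
      len = length($0)
      n++; bases += len
      if (len >= cutoff) { nlong++; longbases += len }
    }
    END {
      printf "reads=%d bases=%.0f reads_ge_cutoff=%d bases_ge_cutoff=%.0f\n",
             n, bases, nlong, longbases
    }'
}
```

For example, `count_reads_by_length 1000 < N5_all_ONT_data.fastq` for the Nanopore file, or `gzip -dc YT11_resub.subreads.fastq.gz | count_reads_by_length 1000` for the compressed PacBio file. If few reads reach 1000, the default minReadLength would discard most of the data.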

Ppatel09-wq commented 2 months ago

Thank you for your response. By "drop", do you mean not using the parameters, or decreasing their values to something like minReadLength=500 and minOverlapLength=250? Also, if I do that, do I need to rerun from scratch, or is there a way to continue the current run?

skoren commented 2 months ago

No parameter changes (corMinCoverage, corErrorRate) would help this run at this point: it has already computed overlaps and they wouldn't be recomputed, so those experiments would have to be done from scratch.

Your current run didn't set these parameters, so they took the defaults of 1000 for read length and 500 for overlap length. By "drop" I mean use lower values, such as 500/100 or 100/100 (depending on your read lengths). The report canu outputs should give some idea of the read length distribution. It seems like you had a lot of reads over 1 kb but no overlaps. That is a bit suspicious and makes me suspect another issue with the data, especially since the log shows tens of GB of overlap output (46 GB purged). Was this run interrupted and restarted at some point? Please post the full report; it should have more information on what happened.
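A sketch of such a from-scratch rerun, assuming `canu` is on the PATH and that most reads are above 500 bp; the directory name `canu_minlen500` is hypothetical, and 500/100 is just one of the pairs suggested above. A new `-d` directory is used because overlaps already computed in `canu1` would not be recomputed.

```shell
# Fresh run in a new (hypothetical) directory so overlaps are recomputed.
# minReadLength/minOverlapLength are lowered from the 1000/500 defaults;
# adjust 500/100 to match the observed read length distribution.
canu -p assembly_prefix -d canu_minlen500 genomeSize=200m \
  minReadLength=500 minOverlapLength=100 \
  -nanopore N5_all_ONT_data.fastq \
  -raw -pacbio YT11_resub.subreads.fastq.gz
```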

Ppatel09-wq commented 2 months ago

This is what the report looks like: assembly_prefix.pdf. No, the run was not interrupted or restarted. Also, this is a hybrid assembly I am running with PacBio and Nanopore data.

skoren commented 2 months ago

That is strange: there are plenty of reads, but they end up with no overlaps. Given the read lengths, there should be overlaps longer than 500 bp. I think something went wrong with the overlap store build, but we've not seen similar issues before. What are the contents of correction/1-overlapper/results, and can you grep for "Total matches" in the correction/1-overlapper/mhap*out files? What is the source of this sequencing data (metagenome, single organism, etc.)? Are you able to share the input reads?
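The grep check can be extended to total the matches across all mhap jobs, which gives a single number to contrast with the "Found 0 overlaps for 0 reads" line from the store check. A minimal sketch, assuming each log contains a line of the form `Total matches found: N` (the format that appears in the grep output later in this thread); `sum_mhap_matches` is a hypothetical helper:

```shell
# Hypothetical helper: sum the "Total matches found: N" lines from mhap job
# logs and report how many such lines were seen (normally one per job).
# awk arithmetic is floating point and printed with %.0f, so totals well
# beyond 2^31 are not truncated.
# Usage (from correction/1-overlapper): sum_mhap_matches mhap.*.out
sum_mhap_matches() {
  grep -h "Total matches found:" "$@" |
    awk '{ files++; total += $NF } END { printf "files=%d total=%.0f\n", files, total }'
}
```

A large total next to an empty overlap store points at the step between mhap and the store, rather than at mhap failing to find matches.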

Ppatel09-wq commented 2 months ago

For some reason there are no result files in correction/1-overlapper/results. Yes, I can share the input reads through Dropbox if you can provide your email. It is a monoclonal (single) amoeba. These are the matches; there seem to be a lot:

SCI247BIO-YT:1-overlapper yonastekle$ grep "Total matches" mhap*out
mhap.000001.out:Total matches found: 1423880786
mhap.000002.out:Total matches found: 1621658642
mhap.000003.out:Total matches found: 608602384
mhap.000004.out:Total matches found: 56993249
mhap.000005.out:Total matches found: 1356368365
mhap.000006.out:Total matches found: 1547237400
mhap.000007.out:Total matches found: 497812789
mhap.000008.out:Total matches found: 53065300
mhap.000009.out:Total matches found: 1427198315
mhap.000010.out:Total matches found: 1631075177
mhap.000011.out:Total matches found: 434903823
mhap.000012.out:Total matches found: 48550108
mhap.000013.out:Total matches found: 1356212155
mhap.000014.out:Total matches found: 1558199864
mhap.000015.out:Total matches found: 336939074
mhap.000016.out:Total matches found: 45108448
mhap.000017.out:Total matches found: 1353412919
mhap.000018.out:Total matches found: 1567792150
mhap.000019.out:Total matches found: 257943579
mhap.000020.out:Total matches found: 41741666
mhap.000021.out:Total matches found: 1379769834
mhap.000022.out:Total matches found: 1593048246
mhap.000023.out:Total matches found: 181511143
mhap.000024.out:Total matches found: 39915700
mhap.000025.out:Total matches found: 1416197994
mhap.000026.out:Total matches found: 1620637307
mhap.000027.out:Total matches found: 104010599
mhap.000028.out:Total matches found: 35762779
mhap.000029.out:Total matches found: 1550211080
mhap.000030.out:Total matches found: 1730302060
mhap.000031.out:Total matches found: 72603343
mhap.000032.out:Total matches found: 38028727
mhap.000033.out:Total matches found: 1525171641
mhap.000034.out:Total matches found: 1608921878
mhap.000035.out:Total matches found: 74480188
mhap.000036.out:Total matches found: 35192709
mhap.000037.out:Total matches found: 1550890736
mhap.000038.out:Total matches found: 1520034288
mhap.000039.out:Total matches found: 78041345
mhap.000040.out:Total matches found: 32838963
mhap.000041.out:Total matches found: 1511363639
mhap.000042.out:Total matches found: 1375121283
mhap.000043.out:Total matches found: 81545345
mhap.000044.out:Total matches found: 30119045
mhap.000045.out:Total matches found: 1584019664
mhap.000046.out:Total matches found: 1328451017
mhap.000047.out:Total matches found: 80526453
mhap.000048.out:Total matches found: 25476516
mhap.000049.out:Total matches found: 1587118307
mhap.000050.out:Total matches found: 1235879200
mhap.000051.out:Total matches found: 81837206
mhap.000052.out:Total matches found: 21635621
mhap.000053.out:Total matches found: 1586051528
mhap.000054.out:Total matches found: 1120145807
mhap.000055.out:Total matches found: 80384956
mhap.000056.out:Total matches found: 16967842
mhap.000057.out:Total matches found: 1660709342
mhap.000058.out:Total matches found: 1091444041
mhap.000059.out:Total matches found: 76126050
mhap.000060.out:Total matches found: 12313952
mhap.000061.out:Total matches found: 1928852742
mhap.000062.out:Total matches found: 1126172223
mhap.000063.out:Total matches found: 64778232
mhap.000064.out:Total matches found: 7129381
mhap.000065.out:Total matches found: 1930325287
mhap.000066.out:Total matches found: 1018242357
mhap.000067.out:Total matches found: 66756889
mhap.000068.out:Total matches found: 3871973
mhap.000069.out:Total matches found: 1891800530
mhap.000070.out:Total matches found: 889700370
mhap.000071.out:Total matches found: 65420964
mhap.000072.out:Total matches found: 210831
mhap.000073.out:Total matches found: 1945107299
mhap.000074.out:Total matches found: 806334725
mhap.000075.out:Total matches found: 62240793
mhap.000076.out:Total matches found: 1923907129
mhap.000077.out:Total matches found: 692750239
mhap.000078.out:Total matches found: 61168657
mhap.000079.out:Total matches found: 1869418405
mhap.000080.out:Total matches found: 575566480
mhap.000081.out:Total matches found: 56699220
mhap.000082.out:Total matches found: 1859593878
mhap.000083.out:Total matches found: 477442803
mhap.000084.out:Total matches found: 51862873
mhap.000085.out:Total matches found: 1779273790
mhap.000086.out:Total matches found: 370857909
mhap.000087.out:Total matches found: 48910488
mhap.000088.out:Total matches found: 1753428510
mhap.000089.out:Total matches found: 277145118
mhap.000090.out:Total matches found: 43536735
mhap.000091.out:Total matches found: 2002496604
mhap.000092.out:Total matches found: 208657738
mhap.000093.out:Total matches found: 40921944
mhap.000094.out:Total matches found: 2085670625
mhap.000095.out:Total matches found: 116655235
mhap.000096.out:Total matches found: 37183574
mhap.000097.out:Total matches found: 1998373396
mhap.000098.out:Total matches found: 65209182
mhap.000099.out:Total matches found: 34090413
mhap.000100.out:Total matches found: 1999866434
mhap.000101.out:Total matches found: 67361788
mhap.000102.out:Total matches found: 31794455
mhap.000103.out:Total matches found: 2139263702
mhap.000104.out:Total matches found: 77776745
mhap.000105.out:Total matches found: 32680780
mhap.000106.out:Total matches found: 1979675487
mhap.000107.out:Total matches found: 75601667
mhap.000108.out:Total matches found: 27902662
mhap.000109.out:Total matches found: 1837496381
mhap.000110.out:Total matches found: 77056828
mhap.000111.out:Total matches found: 24351069
mhap.000112.out:Total matches found: 1620815253
mhap.000113.out:Total matches found: 76995393
mhap.000114.out:Total matches found: 20362388
mhap.000115.out:Total matches found: 1675521247
mhap.000116.out:Total matches found: 66414480
mhap.000117.out:Total matches found: 14012855
mhap.000118.out:Total matches found: 1152477261
mhap.000119.out:Total matches found: 56041103
mhap.000120.out:Total matches found: 9099008
mhap.000121.out:Total matches found: 1331362686
mhap.000122.out:Total matches found: 93604862
mhap.000123.out:Total matches found: 10323704
mhap.000124.out:Total matches found: 1268552547
mhap.000125.out:Total matches found: 102850496
mhap.000126.out:Total matches found: 5973457
mhap.000127.out:Total matches found: 1164998874
mhap.000128.out:Total matches found: 105886098
mhap.000129.out:Total matches found: 344270
mhap.000130.out:Total matches found: 1017193942
mhap.000131.out:Total matches found: 99308426
mhap.000132.out:Total matches found: 924806491
mhap.000133.out:Total matches found: 97870354
mhap.000134.out:Total matches found: 773547260
mhap.000135.out:Total matches found: 91697599
mhap.000136.out:Total matches found: 636892580
mhap.000137.out:Total matches found: 77788231
mhap.000138.out:Total matches found: 530824874
mhap.000139.out:Total matches found: 73911150
mhap.000140.out:Total matches found: 424348275
mhap.000141.out:Total matches found: 68385378
mhap.000142.out:Total matches found: 324347520
mhap.000143.out:Total matches found: 64203054
mhap.000144.out:Total matches found: 212471198
mhap.000145.out:Total matches found: 60042047
mhap.000146.out:Total matches found: 178301240
mhap.000147.out:Total matches found: 84126227
mhap.000148.out:Total matches found: 216427227
mhap.000149.out:Total matches found: 100086616
mhap.000150.out:Total matches found: 203846690
mhap.000151.out:Total matches found: 84163213
mhap.000152.out:Total matches found: 199640890
mhap.000153.out:Total matches found: 72304004
mhap.000154.out:Total matches found: 202350244
mhap.000155.out:Total matches found: 62695204
mhap.000156.out:Total matches found: 206419510
mhap.000157.out:Total matches found: 53501871
mhap.000158.out:Total matches found: 200846576
mhap.000159.out:Total matches found: 41752049
mhap.000160.out:Total matches found: 207383326
mhap.000161.out:Total matches found: 32836009
mhap.000162.out:Total matches found: 203399144
mhap.000163.out:Total matches found: 21896699
mhap.000164.out:Total matches found: 207554009
mhap.000165.out:Total matches found: 12048619
mhap.000166.out:Total matches found: 210416882
mhap.000167.out:Total matches found: 711997
mhap.000168.out:Total matches found: 190671745
mhap.000169.out:Total matches found: 184102844
mhap.000170.out:Total matches found: 169775100
mhap.000171.out:Total matches found: 160166458
mhap.000172.out:Total matches found: 146466451
mhap.000173.out:Total matches found: 136083180
mhap.000174.out:Total matches found: 129408388
mhap.000175.out:Total matches found: 118641327
mhap.000176.out:Total matches found: 108693602
mhap.000177.out:Total matches found: 100057581
mhap.000178.out:Total matches found: 88117682
mhap.000179.out:Total matches found: 77796986
mhap.000180.out:Total matches found: 68891999
mhap.000181.out:Total matches found: 57479279
mhap.000182.out:Total matches found: 47372161
mhap.000183.out:Total matches found: 36189773
mhap.000184.out:Total matches found: 26605039
mhap.000185.out:Total matches found: 16181228
mhap.000186.out:Total matches found: 6625454
mhap.000187.out:Total matches found: 24124
skoren commented 2 months ago

Yes, so it doesn't make sense that you'd end up with no overlaps in the store. My email is sergek@umd.edu.

Ppatel09-wq commented 2 months ago

Thank you, I shared them with you.

skoren commented 2 months ago

It seems I don't have enough space in my Dropbox to access them. Can you either send them through our FTP (see the FAQ) or share a read-only download link in Dropbox?

skoren commented 2 months ago

My local run finished without issue, so I suspect your system is silently failing on the named pipes canu uses by default. Adding mhapPipe=false should fix it. I uploaded the corrected reads to the same Google Drive folder where you sent the raw data. However, the corrected reads don't really look corrected, or at least they don't look like they're at the expected coverage for a 200 Mb genome. The k-mer histogram looks like:

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2  85576218 *******************************************************************    0.3067 0.0126
--       3-     7  89333196 ********************************************************************** 0.4381 0.0207
--       8-    16  29546050 ***********************                                                0.6483 0.0440
--      17-    29  13868724 **********                                                             0.7387 0.0664
--      30-    46   8220160 ******                                                                 0.7850 0.0884
--      47-    67   5869401 ****                                                                   0.8133 0.1105
--      68-    92   5384887 ****                                                                   0.8339 0.1349
--      93-   121   5471392 ****                                                                   0.8531 0.1666
--     122-   154   5758163 ****                                                                   0.8726 0.2100
--     155-   191   5747340 ****                                                                   0.8932 0.2687
--     192-   232   5433231 ****                                                                   0.9137 0.3420
--     233-   277   5155674 ****                                                                   0.9331 0.4267
--     278-   326   4338755 ***                                                                    0.9515 0.5233
--     327-   379   3198639 **                                                                     0.9670 0.6192
--     380-   436   2182145 *                                                                      0.9784 0.7015
--     437-   497   1406634 *                                                                      0.9861 0.7663
--     498-   562    853316                                                                        0.9911 0.8142
--     563-   631    470576                                                                        0.9942 0.8471
--     632-   704    256887                                                                        0.9958 0.8674
--     705-   781    160436                                                                        0.9967 0.8799
--     782-   862    126425                                                                        0.9973 0.8886
--     863-   947     94937                                                                        0.9978 0.8963
--     948-  1036     75319                                                                        0.9981 0.9026
--    1037-  1129     60305                                                                        0.9984 0.9081
--    1130-  1226     52017                                                                        0.9986 0.9129
--    1227-  1327     42525                                                                        0.9988 0.9174
--    1328-  1432     36318                                                                        0.9989 0.9214
--    1433-  1541     30722                                                                        0.9991 0.9250
--    1542-  1654     25949                                                                        0.9992 0.9284
--    1655-  1771     22578                                                                        0.9993 0.9314
--    1772-  1892     19330                                                                        0.9993 0.9343
--    1893-  2017     16408                                                                        0.9994 0.9369
--    2018-  2146     13414                                                                        0.9995 0.9392
--    2147-  2279     12284                                                                        0.9995 0.9413
--    2280-  2416     10528                                                                        0.9996 0.9433
--    2417-  2557      9909                                                                        0.9996 0.9451
--    2558-  2702      8337                                                                        0.9996 0.9469
--    2703-  2851      8274                                                                        0.9997 0.9485
--    2852-  3004      7519                                                                        0.9997 0.9502
--    3005-  3161      6998                                                                        0.9997 0.9519
--

So there are no clear peaks at all; it all looks like noise with, at best, a peak around 3-7x. GenomeScope can't make any predictions from this histogram. By contrast, here's a human dataset at 40x coverage with ONT data:

[TRIMMING/MERS]
--
--  22-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1         0                                                                        0.0000 0.0000
--       2-     2  82768762 ******                                                                 0.0330 0.0015
--       3-     4  59807828 *****                                                                  0.0476 0.0025
--       5-     7  46311634 ***                                                                    0.0639 0.0041
--       8-    11  51875039 ****                                                                   0.0805 0.0066
--      12-    16  82093790 ******                                                                 0.1017 0.0116
--      17-    22 130002157 **********                                                             0.1366 0.0234
--      23-    29 222579795 ******************                                                     0.1901 0.0480
--      30-    37 604893114 **************************************************                     0.2894 0.1090
--      38-    46 834662807 ********************************************************************** 0.5518 0.3138
--      47-    56 289872796 ************************                                               0.8672 0.6137
--      57-    67  29409408 **                                                                     0.9617 0.7209
--      68-    79  16933434 *                                                                      0.9714 0.7343
--      80-    92  14888071 *                                                                      0.9782 0.7456
--      93-   106   7149138                                                                        0.9839 0.7565
--     107-   121   5669134                                                                        0.9866 0.7626
--     122-   137   4630534                                                                        0.9888 0.7684
--     138-   154   3198317                                                                        0.9906 0.7736
--     155-   172   2812920                                                                        0.9919 0.7777
--     173-   191   2216993                                                                        0.9930 0.7817
--     192-   211   1818372                                                                        0.9939 0.7853
--     212-   232   1534188                                                                        0.9946 0.7885
--     233-   254   1303004                                                                        0.9952 0.7915
--     255-   277   1118455                                                                        0.9957 0.7944
--     278-   301    953098                                                                        0.9962 0.7970
--     302-   326    814331                                                                        0.9965 0.7994
--     327-   352    700584                                                                        0.9969 0.8017
--     353-   379    609733                                                                        0.9971 0.8038
--     380-   407    515937                                                                        0.9974 0.8058
--     408-   436    445829                                                                        0.9976 0.8076
--     437-   466    391733                                                                        0.9978 0.8093
--     467-   497    351234                                                                        0.9979 0.8108
--     498-   529    320042                                                                        0.9980 0.8123
--     530-   562    292467                                                                        0.9982 0.8138
--     563-   596    318018                                                                        0.9983 0.8152
--     597-   631    309065                                                                        0.9984 0.8169
--     632-   667    228540                                                                        0.9985 0.8185
--     668-   704    207870                                                                        0.9986 0.8199
--     705-   742    185351                                                                        0.9987 0.8211
--     743-   781    167648                                                                        0.9988 0.8223
--     782-   821    154045                                                                        0.9989 0.8235
--

I don't expect this to assemble very well, if at all, especially since the reads are also relatively short.
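The comparison above is done by eye. A rough sketch of the same check in code, under loud assumptions: rows are pasted from the canu report in the `--  LOW-  HIGH  NumMers ...` layout shown above; counts are divided by bin width, since canu's bins widen (2-2, 3-7, 8-16, ...) and a raw count from a wide bin is not comparable to one from a narrow bin; and the error tail is skipped by walking forward until density rises again. This is a simple hypothetical heuristic (`kmer_peak`), not how canu or GenomeScope model the histogram:

```shell
# Hypothetical helper: find the main peak in a canu [TRIMMING/MERS] k-mer
# histogram read from stdin. Data rows look like:
#   --      38-    46 834662807 ****...  0.5518 0.3138
# i.e. field 2 is "LOW-", field 3 is HIGH and field 4 is the k-mer count.
kmer_peak() {
  awk '
    $1 == "--" && $2 ~ /^[0-9]+-$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
      if (n == 0 && $4 == 0) next        # skip empty leading bins (e.g. 1-1)
      n++
      lo[n] = $2 + 0                     # "38-" converts to 38
      hi[n] = $3 + 0
      dens[n] = $4 / (hi[n] - lo[n] + 1) # k-mers per occurrence value in bin
    }
    END {
      if (n == 0) { print "no peak"; exit }
      i = 1
      # Walk down the low-occurrence error tail while density keeps falling.
      while (i < n && dens[i + 1] <= dens[i]) i++
      if (i == n) { print "no peak"; exit }   # density never rose again
      best = i + 1
      for (j = i + 1; j <= n; j++) if (dens[j] > dens[best]) best = j
      printf "peak=%d-%d\n", lo[best], hi[best]
    }'
}
```

For example, save the `--` rows to a file and run `kmer_peak < hist.txt`. On the human 40x rows above, this heuristic picks the 38-46 bin, in line with the stated coverage; a histogram whose density never rises after the error tail prints `no peak`.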

Was this sample amplified in any way, and are you confident in the 200 Mb genome size? I also noticed that you have a good number of reads with more than 3 passes, so running CCS conversion might give you 10-15x HiFi coverage (assuming 200 Mb is accurate), which may give you more information on what's going on here. Otherwise, I don't think there's much you can do with canu on this dataset.
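The HiFi estimate is plain arithmetic: coverage is total read bases divided by genome size, so at the assumed 200 Mb, 10-15x corresponds to roughly 2-3 Gbp of HiFi sequence. A sketch for checking the actual number once CCS reads exist (producing them is not shown), assuming four-line FASTQ and a genome size given in bases; `fastq_coverage` is hypothetical:

```shell
# Hypothetical helper: coverage = total read bases / genome size (in bases).
# Assumes 4-line FASTQ records on stdin.
# Usage: gzip -dc reads.fastq.gz | fastq_coverage 200000000
fastq_coverage() {
  awk -v g="$1" '
    NR % 4 == 2 { bases += length($0) }   # sequence lines only
    END { printf "bases=%.0f coverage=%.2fx\n", bases, bases / g }'
}
```

Because the result is divided by the genome size, an inaccurate 200 Mb estimate changes the reported coverage proportionally.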

skoren commented 1 month ago

The missing overlaps are resolved with the mhapPipe=false option; the shape of the histogram is due to amplification of the sample.
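For completeness, a sketch of the original command rerun with the fix confirmed above, assuming `canu` is on the PATH; `canu_nopipe` is a hypothetical fresh directory, since overlaps in the old run would not be recomputed. Whether the lower length cutoffs discussed earlier are also needed depends on the read lengths.

```shell
# Same inputs as the original run. mhapPipe=false makes mhap write its
# overlap results to files instead of named pipes, which appeared to fail
# silently on the reporter's system. canu_nopipe is a hypothetical new
# output directory.
canu -p assembly_prefix -d canu_nopipe genomeSize=200m \
  mhapPipe=false \
  -nanopore N5_all_ONT_data.fastq \
  -raw -pacbio YT11_resub.subreads.fastq.gz
```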