ChloeDG closed this issue 4 years ago
I'm not sure this is a gap: the largest contigs are already close to your genome size, and their sum is larger than the genome size. The contigs may simply represent variants present in the data that are redundant with each other.
It's also possible you do have gaps. What is the full report from the assembly? Have you mapped the assembly to a reference (assuming you have one)? Have you mapped reads to the reference to see if there are coverage gaps? You could also try the suggestions in #1322 progressively to see if the assembly improves. I also noticed you're using a tip version and not a release; I'd suggest using either the 1.9 or the 2.0 release instead.
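To make the "coverage gaps" check concrete, here is a minimal sketch, assuming the reads have already been mapped to the reference and that per-position depth is available as three whitespace-separated columns (sequence name, 1-based position, depth), which is the layout written by `samtools depth -a`. The function name `find_coverage_gaps` and the `min_depth` threshold are illustrative choices, not part of Canu or samtools.

```python
from typing import Iterable, List, Tuple


def find_coverage_gaps(
    depth_lines: Iterable[str], min_depth: int = 1
) -> List[Tuple[str, int, int]]:
    """Return (sequence, start, end) runs where depth is below min_depth.

    Input lines are assumed to be 'name position depth', one per position,
    sorted by position within each sequence.
    """
    gaps: List[Tuple[str, int, int]] = []
    current = None  # [sequence, start, end] of the run being extended

    for line in depth_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, pos_text, depth_text = line.split()[:3]
        pos, depth = int(pos_text), int(depth_text)
        low = depth < min_depth

        # Extend the open run if this position is low and directly adjacent.
        if low and current and current[0] == name and current[2] == pos - 1:
            current[2] = pos
            continue

        # Otherwise close any open run, then start a new one if needed.
        if current:
            gaps.append((current[0], current[1], current[2]))
            current = None
        if low:
            current = [name, pos, pos]

    if current:
        gaps.append((current[0], current[1], current[2]))
    return gaps
```

For example, the lines of a depth file could be passed in directly with `find_coverage_gaps(open("depth.txt"))`, where `depth.txt` is a hypothetical file name. An empty result suggests every reference position is covered at the chosen threshold; it does not by itself show that the assembly is complete.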
I haven't tried mapping it to a reference. We have one that we made synthetically, but it isn't the best. This is the full report; I am trying to make sense of it.
[CORRECTION/READS]
--
-- In sequence store './HBVFL.seqStore':
-- Found 541 reads.
-- Found 640069 bases (200.02 times coverage).
--
-- G=640069 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 1452 39 64585 || 1000-1025 49|-------------------------------------
-- 00020 1288 86 128216 || 1026-1051 46|-----------------------------------
-- 00030 1230 137 192272 || 1052-1077 42|--------------------------------
-- 00040 1185 190 256228 || 1078-1103 28|---------------------
-- 00050 1151 245 320435 || 1104-1129 54|-----------------------------------------
-- 00060 1136 301 384432 || 1130-1155 84|---------------------------------------------------------------
-- 00070 1115 358 448640 || 1156-1181 43|---------------------------------
-- 00080 1071 417 512999 || 1182-1207 34|--------------------------
-- 00090 1032 478 577086 || 1208-1233 29|----------------------
-- 00100 1000 540 640069 || 1234-1259 23|------------------
-- 001.000x 541 640069 || 1260-1285 21|----------------
-- || 1286-1311 11|---------
-- || 1312-1337 10|--------
-- || 1338-1363 9|-------
-- || 1364-1389 8|------
-- || 1390-1415 5|----
-- || 1416-1441 2|--
-- || 1442-1467 7|------
-- || 1468-1493 5|----
-- || 1494-1519 6|-----
-- || 1520-1545 3|---
-- || 1546-1571 3|---
-- || 1572-1597 1|-
-- || 1598-1623 1|-
-- || 1624-1649 3|---
-- || 1650-1675 4|---
-- || 1676-1701 0|
-- || 1702-1727 3|---
-- || 1728-1753 1|-
-- || 1754-1779 0|
-- || 1780-1805 0|
-- || 1806-1831 4|---
-- || 1832-1857 0|
-- || 1858-1883 1|-
-- || 1884-1909 0|
-- || 1910-1935 0|
-- || 1936-1961 0|
-- || 1962-1987 0|
-- || 1988-2013 0|
-- || 2014-2039 0|
-- || 2040-2065 0|
-- || 2066-2091 0|
-- || 2092-2117 0|
-- || 2118-2143 0|
-- || 2144-2169 0|
-- || 2170-2195 0|
-- || 2196-2221 0|
-- || 2222-2247 0|
-- || 2248-2273 0|
-- || 2274-2299 1|-
--
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 22356 ********************************************************************** 0.4739 0.1122
-- 3- 4 12504 *************************************** 0.6521 0.1756
-- 5- 7 4721 ************** 0.7875 0.2454
-- 8- 11 2122 ****** 0.8545 0.2992
-- 12- 16 1152 *** 0.8904 0.3426
-- 17- 22 886 ** 0.9119 0.3803
-- 23- 29 808 ** 0.9299 0.4236
-- 30- 37 571 * 0.9465 0.4766
-- 38- 46 532 * 0.9577 0.5217
-- 47- 56 490 * 0.9689 0.5789
-- 57- 67 353 * 0.9790 0.6411
-- 68- 79 244 0.9861 0.6937
-- 80- 92 118 0.9911 0.7379
-- 93- 106 48 0.9933 0.7605
-- 107- 121 48 0.9943 0.7726
-- 122- 137 18 0.9953 0.7853
-- 138- 154 20 0.9957 0.7911
-- 155- 172 26 0.9961 0.7993
-- 173- 191 42 0.9967 0.8100
-- 192- 211 17 0.9975 0.8283
-- 212- 232 14 0.9979 0.8369
-- 233- 254 16 0.9982 0.8447
-- 255- 277 7 0.9985 0.8545
-- 278- 301 5 0.9987 0.8593
-- 302- 326 3 0.9988 0.8630
-- 327- 352 6 0.9988 0.8654
-- 353- 379 3 0.9990 0.8707
-- 380- 407 2 0.9990 0.8735
-- 408- 436 3 0.9991 0.8755
-- 437- 466 6 0.9992 0.8799
-- 467- 497 0 0.0000 0.0000
-- 498- 529 0 0.0000 0.0000
-- 530- 562 0 0.0000 0.0000
-- 563- 596 1 0.9993 0.8858
--
-- 0 (max occurrences)
-- 398395 (total mers, non-unique)
-- 47177 (distinct mers, non-unique)
-- 0 (unique mers)
[CORRECTION/LAYOUT]
-- original original
-- raw reads raw reads
-- category w/overlaps w/o/overlaps
-- -------------------- ------------- -------------
-- Number of Reads 525 8368
-- Number of Bases 621662 18407
-- Coverage 194.269 5.752
-- Median 1143 0
-- Mean 1184 2
-- N50 1151 1166
-- Minimum 1000 0
-- Maximum 2288 1268
--
-- --------corrected--------- ----------rescued----------
-- evidence expected expected
-- category reads raw corrected raw corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads 531 94 94 4 4
-- Number of Bases 628588 132323 128406 4575 4526
-- Coverage 196.434 41.351 40.127 1.430 1.414
-- Median 1143 1370 1332 1158 1137
-- Mean 1183 1407 1366 1143 1131
-- N50 1151 1385 1350 1173 1157
-- Minimum 1000 1203 1202 1110 1109
-- Maximum 2288 1864 1863 1173 1157
--
-- --------uncorrected--------
-- expected
-- category raw corrected
-- -------------------- ------------- -------------
-- Number of Reads 8795 8795
-- Number of Bases 503171 401102
-- Coverage 157.241 125.344
-- Median 0 0
-- Mean 57 45
-- N50 1135 1090
-- Minimum 0 0
-- Maximum 2288 2271
--
-- Maximum Memory 546352838
[TRIMMING/READS]
--
-- In sequence store './HBVFL.seqStore':
-- Found 98 reads.
-- Found 122481 bases (38.27 times coverage).
--
-- G=122481 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 1553 7 13316 || 420-447 1|-------
-- 00020 1453 15 25118 || 448-475 0|
-- 00030 1366 24 37719 || 476-503 0|
-- 00040 1332 33 49839 || 504-531 0|
-- 00050 1300 42 61662 || 532-559 0|
-- 00060 1256 52 74421 || 560-587 0|
-- 00070 1215 62 86731 || 588-615 0|
-- 00080 1188 72 98710 || 616-643 2|-------------
-- 00090 1136 82 110297 || 644-671 0|
-- 00100 420 97 122481 || 672-699 1|-------
-- 001.000x 98 122481 || 700-727 2|-------------
-- || 728-755 0|
-- || 756-783 0|
-- || 784-811 1|-------
-- || 812-839 1|-------
-- || 840-867 3|-------------------
-- || 868-895 0|
-- || 896-923 0|
-- || 924-951 1|-------
-- || 952-979 0|
-- || 980-1007 0|
-- || 1008-1035 0|
-- || 1036-1063 1|-------
-- || 1064-1091 1|-------
-- || 1092-1119 0|
-- || 1120-1147 5|--------------------------------
-- || 1148-1175 3|-------------------
-- || 1176-1203 10|---------------------------------------------------------------
-- || 1204-1231 8|---------------------------------------------------
-- || 1232-1259 7|---------------------------------------------
-- || 1260-1287 4|--------------------------
-- || 1288-1315 9|---------------------------------------------------------
-- || 1316-1343 7|---------------------------------------------
-- || 1344-1371 8|---------------------------------------------------
-- || 1372-1399 3|-------------------
-- || 1400-1427 3|-------------------
-- || 1428-1455 2|-------------
-- || 1456-1483 5|--------------------------------
-- || 1484-1511 1|-------
-- || 1512-1539 1|-------
-- || 1540-1567 1|-------
-- || 1568-1595 0|
-- || 1596-1623 2|-------------
-- || 1624-1651 1|-------
-- || 1652-1679 1|-------
-- || 1680-1707 0|
-- || 1708-1735 2|-------------
-- || 1736-1763 0|
-- || 1764-1791 0|
-- || 1792-1819 1|-------
--
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 1974 ********************************************************************** 0.3279 0.0389
-- 3- 4 767 *************************** 0.3963 0.0511
-- 5- 7 504 ***************** 0.4924 0.0761
-- 8- 11 565 ******************** 0.5678 0.1078
-- 12- 16 1212 ****************************************** 0.6572 0.1644
-- 17- 22 655 *********************** 0.8612 0.3434
-- 23- 29 79 ** 0.9464 0.4395
-- 30- 37 78 ** 0.9575 0.4571
-- 38- 46 36 * 0.9696 0.4813
-- 47- 56 23 0.9754 0.4961
-- 57- 67 13 0.9789 0.5069
-- 68- 79 7 0.9811 0.5151
-- 80- 92 11 0.9824 0.5211
-- 93- 106 0 0.0000 0.0000
-- 107- 121 4 0.9841 0.5297
-- 122- 137 2 0.9847 0.5345
-- 138- 154 15 0.9852 0.5385
-- 155- 172 16 0.9875 0.5588
-- 173- 191 3 0.9902 0.5848
-- 192- 211 4 0.9909 0.5923
-- 212- 232 1 0.9914 0.5986
-- 233- 254 12 0.9915 0.6009
-- 255- 277 0 0.0000 0.0000
-- 278- 301 1 0.9935 0.6302
-- 302- 326 0 0.0000 0.0000
-- 327- 352 1 0.9937 0.6337
-- 353- 379 1 0.9939 0.6372
-- 380- 407 0 0.0000 0.0000
-- 408- 436 0 0.0000 0.0000
-- 437- 466 0 0.0000 0.0000
-- 467- 497 0 0.0000 0.0000
-- 498- 529 2 0.9940 0.6424
-- 530- 562 1 0.9944 0.6530
-- 563- 596 10 0.9945 0.6587
-- 597- 631 4 0.9962 0.7164
--
-- 0 (max occurrences)
-- 101440 (total mers, non-unique)
-- 6021 (distinct mers, non-unique)
-- 0 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1200 (use overlaps at or below this fraction error)
-- 500 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 2 (break region if overlap coverage is less than this many reads, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 8893 reads 122481 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 63 reads 76371 bases (trimmed reads output)
-- 0 reads 0 bases (reads with no change, kept as is)
-- 8801 reads 4579 bases (reads with no overlaps, deleted)
-- 29 reads 34770 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 50 reads 3231 bases (bases trimmed from the 5' end of a read)
-- 55 reads 3530 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1200 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 63 reads 83132 bases (reads processed)
-- 8830 reads 39349 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 0 reads 0 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 63 reads 83132 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 0 reads 0 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 0 reads 0 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 0 reads 0 bases (trimmed from the 5' end of the read)
-- 0 reads 0 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In sequence store './HBVFL.seqStore':
-- Found 63 reads.
-- Found 76371 bases (23.86 times coverage).
--
-- G=76371 sum of || length num
-- NG length index lengths || range seqs
-- ----- ------------ --------- ------------ || ------------------- -------
-- 00010 1467 4 7666 || 1009-1023 2|---------------------
-- 00020 1335 10 15919 || 1024-1038 0|
-- 00030 1257 16 23663 || 1039-1053 2|---------------------
-- 00040 1235 22 31130 || 1054-1068 4|------------------------------------------
-- 00050 1196 28 38422 || 1069-1083 2|---------------------
-- 00060 1162 35 46649 || 1084-1098 1|-----------
-- 00070 1136 41 53540 || 1099-1113 2|---------------------
-- 00080 1119 48 61440 || 1114-1128 4|------------------------------------------
-- 00090 1066 55 69065 || 1129-1143 6|---------------------------------------------------------------
-- 00100 1009 62 76371 || 1144-1158 4|------------------------------------------
-- 001.000x 63 76371 || 1159-1173 2|---------------------
-- || 1174-1188 5|-----------------------------------------------------
-- || 1189-1203 2|---------------------
-- || 1204-1218 0|
-- || 1219-1233 4|------------------------------------------
-- || 1234-1248 4|------------------------------------------
-- || 1249-1263 3|--------------------------------
-- || 1264-1278 1|-----------
-- || 1279-1293 1|-----------
-- || 1294-1308 1|-----------
-- || 1309-1323 1|-----------
-- || 1324-1338 2|---------------------
-- || 1339-1353 2|---------------------
-- || 1354-1368 1|-----------
-- || 1369-1383 0|
-- || 1384-1398 0|
-- || 1399-1413 0|
-- || 1414-1428 1|-----------
-- || 1429-1443 1|-----------
-- || 1444-1458 0|
-- || 1459-1473 2|---------------------
-- || 1474-1488 0|
-- || 1489-1503 1|-----------
-- || 1504-1518 1|-----------
-- || 1519-1533 0|
-- || 1534-1548 0|
-- || 1549-1563 0|
-- || 1564-1578 0|
-- || 1579-1593 0|
-- || 1594-1608 0|
-- || 1609-1623 0|
-- || 1624-1638 0|
-- || 1639-1653 0|
-- || 1654-1668 0|
-- || 1669-1683 0|
-- || 1684-1698 0|
-- || 1699-1713 0|
-- || 1714-1728 1|-----------
--
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 680 *************************************** 0.1673 0.0196
-- 3- 4 435 ************************* 0.2158 0.0281
-- 5- 7 442 ************************* 0.3214 0.0556
-- 8- 11 532 ****************************** 0.4171 0.0947
-- 12- 16 1209 ********************************************************************** 0.5477 0.1762
-- 17- 22 599 ********************************** 0.8492 0.4360
-- 23- 29 46 ** 0.9604 0.5583
-- 30- 37 9 0.9705 0.5739
-- 38- 46 6 0.9729 0.5789
-- 47- 56 10 0.9749 0.5844
-- 57- 67 0 0.0000 0.0000
-- 68- 79 9 0.9766 0.5898
-- 80- 92 20 * 0.9793 0.6018
-- 93- 106 12 0.9840 0.6257
-- 107- 121 2 0.9867 0.6414
-- 122- 137 14 0.9872 0.6449
-- 138- 154 0 0.0000 0.0000
-- 155- 172 0 0.0000 0.0000
-- 173- 191 0 0.0000 0.0000
-- 192- 211 0 0.0000 0.0000
-- 212- 232 0 0.0000 0.0000
-- 233- 254 1 0.9906 0.6725
-- 255- 277 1 0.9909 0.6762
-- 278- 301 0 0.0000 0.0000
-- 302- 326 0 0.0000 0.0000
-- 327- 352 0 0.0000 0.0000
-- 353- 379 2 0.9911 0.6813
-- 380- 407 5 0.9919 0.6977
-- 408- 436 10 0.9929 0.7210
-- 437- 466 0 0.0000 0.0000
-- 467- 497 0 0.0000 0.0000
-- 498- 529 0 0.0000 0.0000
-- 530- 562 0 0.0000 0.0000
-- 563- 596 0 0.0000 0.0000
-- 597- 631 0 0.0000 0.0000
-- 632- 667 0 0.0000 0.0000
-- 668- 704 2 0.9953 0.7839
-- 705- 742 1 0.9958 0.8042
-- 743- 781 1 0.9961 0.8154
-- 782- 821 15 0.9963 0.8267
--
-- 0 (max occurrences)
-- 69347 (total mers, non-unique)
-- 4064 (distinct mers, non-unique)
-- 0 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (bad trimming)
-- middle-hump 1 0.18 1240.00 +- 0.00 13.00 +- 0.00 (bad trimming)
-- no-5-prime 1 0.18 1468.00 +- 0.00 1.00 +- 0.00 (bad trimming)
-- no-3-prime 1 0.18 1281.00 +- 0.00 4.00 +- 0.00 (bad trimming)
--
-- low-coverage 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (easy to assemble, potential for lower quality consensus)
-- unique 25 4.62 1151.08 +- 84.37 17.39 +- 4.14 (easy to assemble, perfect, yay)
-- repeat-cont 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (potential for consensus errors, no impact on assembly)
-- repeat-dove 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 7 1.29 1246.71 +- 137.12 881.29 +- 463.05 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 20 3.70 1188.75 +- 100.27 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 8 1.48 1387.88 +- 195.74 (will end contigs, potential to misassemble)
-- uniq-anchor 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 2 sequences, total length 3961 bp (including 0 repeats of total length 0 bp).
-- bubbles: 1 sequences, total length 1799 bp.
-- unassembled: 19 sequences, total length 22971 bp.
--
-- Contig sizes based on genome size 3.2kbp:
--
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2439 1 2439
-- 20 2439 1 2439
-- 30 2439 1 2439
-- 40 2439 1 2439
-- 50 2439 1 2439
-- 60 2439 1 2439
-- 70 2439 1 2439
-- 80 1522 2 3961
-- 90 1522 2 3961
-- 100 1522 2 3961
-- 110 1522 2 3961
-- 120 1522 2 3961
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 2 sequences, total length 3983 bp (including 0 repeats of total length 0 bp).
-- bubbles: 1 sequences, total length 1770 bp.
-- unassembled: 19 sequences, total length 22971 bp.
--
-- Contig sizes based on genome size 3.2kbp:
--
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2499 1 2499
-- 20 2499 1 2499
-- 30 2499 1 2499
-- 40 2499 1 2499
-- 50 2499 1 2499
-- 60 2499 1 2499
-- 70 2499 1 2499
-- 80 1484 2 3983
-- 90 1484 2 3983
-- 100 1484 2 3983
-- 110 1484 2 3983
-- 120 1484 2 3983
--
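Two of the figures in the report can be reproduced by hand, which may help when trying to make sense of it. The sketch below is not Canu's code; it assumes "coverage" means total bases divided by the genome size (3.2 kbp, from the command line) and that NG(x) is the length of the contig at which the running sum of contig lengths, longest first, reaches x% of the genome size. The function names are made up for illustration.

```python
from typing import List, Optional


def coverage(total_bases: int, genome_size: int) -> float:
    """Fold coverage as total bases over the assumed genome size."""
    return total_bases / genome_size


def ng_length(
    contig_lengths: List[int], genome_size: int, percent: int
) -> Optional[int]:
    """Length of the contig that brings the running sum to percent% of genome_size.

    Returns None when the contigs together never reach that fraction.
    """
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= genome_size * percent:
            return length
    return None


if __name__ == "__main__":
    genome_size = 3200
    # Bases reported in the CORRECTION/READS section.
    print(round(coverage(640069, genome_size), 2))
    # Contig lengths reported after consensus generation.
    for percent in range(10, 130, 10):
        print(percent, ng_length([2499, 1484], genome_size, percent))
```

With the two post-consensus contigs (2499 bp and 1484 bp), the larger contig alone covers about 78% of 3200 bp, which is why the NG column switches from 2499 to 1484 between the 70 and 80 rows. The table running past 100 is consistent with the earlier remark that the contigs sum to more than the genome size.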
Is there something in the report that speaks to you about what it could be? I will try the suggestions in the previous post in the meantime :-) Thank you.
I have a few naive questions as I try to understand the report. Perhaps you have a link or README that answers these somewhere, but I couldn't locate it (sorry).
Thank you.
Have you tried running the alternate parameter assemblies?
Bubbles are things contained in a contig, so the bubble is definitely redundant. By redundant, I meant the contigs may have similarity to each other and are assembling the same sequence. The *.contigs.layout.tigInfo will give you stats on how many reads are in each contig and their coverage while *contigs.layout.readToTig will tell you the position of every read in the assembly.
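As a rough aid for reading those layout files, here is a small sketch that loads a whitespace-separated table whose column names sit on a leading `#` line. That layout, and the column names used in `summarize_tigs` (`tigID`, `tigLen`, `coverage`, `numChildren`), are assumptions about what tigInfo looks like; check the header line of your own file and adjust the names before relying on this.

```python
from typing import Dict, Iterable, List


def read_hash_header_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a table whose first '#' line holds the column names."""
    header = None
    rows: List[Dict[str, str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if header is None:
                header = line.lstrip("#").split()
            continue
        if header is None:
            raise ValueError("data row found before any '#' header line")
        rows.append(dict(zip(header, line.split())))
    return rows


def summarize_tigs(
    rows: List[Dict[str, str]],
    id_col: str = "tigID",           # assumed column name
    len_col: str = "tigLen",         # assumed column name
    cov_col: str = "coverage",       # assumed column name
    reads_col: str = "numChildren",  # assumed column name
) -> List[str]:
    """One readable line per tig: id, length, coverage, read count."""
    return [
        f"tig {r[id_col]}: {r[len_col]} bp, "
        f"coverage {r[cov_col]}, {r[reads_col]} reads"
        for r in rows
    ]
```

Calling `summarize_tigs(read_hash_header_table(open(path)))` on a tigInfo file would then print one line per tig, which makes it easier to spot a contig that is built from very few reads or has unusually low coverage.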
Thank you so much. Do you mean the suggested parameters in #1322? I will try them now.
After playing with different parameters and mapping the tigs to a reference we made, I am happy that Canu has assembled my genome well. Thanks for your help!
I used the following command to create a draft assembly for amplified cDNA from viral RNA:
/Linux-amd64/bin/canu -p HBV -d HBV-oxford contigFilter="2 0 1.0 0.5 0" genomeSize=3.2K -nanopore-raw /BC09.fastq
It worked quickly with no errors but the output contigs.fasta is broken into 3 tigs. The consensus.fasta.fai looks like this:
I guess this is because it has identified a gap or something, but is there a way to force Canu not to break contigs? Let me know if there's anything else you need.
-- Canu snapshot v2.0-development +369 changes (r9862 6cfdadd772614d22446e9c113b8405b489dfe3e5)
Linux
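For readers unfamiliar with the contigFilter="2 0 1.0 0.5 0" setting in the command above, the sketch below simply names its five fields. The field names (minReads, minLength, singleReadSpan, lowCovSpan, lowCovDepth) follow my reading of the Canu parameter reference and should be checked against the documentation for the version in use; the class and function names are illustrative only.

```python
from typing import NamedTuple


class ContigFilter(NamedTuple):
    min_reads: int           # assumed: contigs with fewer reads are set aside
    min_length: int          # assumed: contigs shorter than this are set aside
    single_read_span: float  # assumed: max fraction of a contig one read may cover
    low_cov_span: float      # assumed: fraction of contig allowed below low_cov_depth
    low_cov_depth: int       # assumed: depth that counts as low coverage


def parse_contig_filter(value: str) -> ContigFilter:
    """Split a contigFilter string into its five named fields."""
    parts = value.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {value!r}")
    return ContigFilter(
        min_reads=int(parts[0]),
        min_length=int(parts[1]),
        single_read_span=float(parts[2]),
        low_cov_span=float(parts[3]),
        low_cov_depth=int(parts[4]),
    )


if __name__ == "__main__":
    print(parse_contig_filter("2 0 1.0 0.5 0"))
```

Under that reading, the last field being 0 would mean no position can count as low coverage, so the low-coverage test is effectively switched off and only the read-count and single-read-span tests remain.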