Closed mortunco closed 7 years ago
Canu does output a message like:
-- Finished stage 'outputSequence', reset canuIteration.
--
-- Bye.
at the end of the run. There should be a canu.out file in the folder which will have this output when running on the grid.
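As a rough illustration of how one might check for that message automatically, here is a minimal Python sketch. It assumes the final non-empty line of a completed run is "-- Bye.", exactly as in the message quoted above; that marker and the helper names `run_finished` and `check_log_file` are illustrative assumptions, not part of Canu, and other Canu versions may end their logs differently.

```python
from pathlib import Path

# Assumption: a completed run ends with this line, as in the message above.
FINAL_MARKER = "-- Bye."


def run_finished(log_text: str) -> bool:
    """Return True if the last non-empty line of the log is the final marker."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == FINAL_MARKER


def check_log_file(path: str) -> bool:
    """Read a log file (for example canu.out) and look for the final marker."""
    return run_finished(Path(path).read_text(errors="replace"))


sample = """-- Finished stage 'outputSequence', reset canuIteration.
--
-- Bye.
"""
print(run_finished(sample))
```

Note that this only says the pipeline reached its last stage; it says nothing about the quality of the assembly.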
There is also some information in the FAQ about how to improve assembly continuity and deal with different techs/genome characteristics. What kind of information would you like to see? The report provides only basic contiguity stats to see if something went wrong during the assembly (low coverage from correction, strange k-mer distributions) but it won't detect assembly errors or similar issues.
In your case, Canu is reporting only 0.05x coverage but based on everything else in the logs there is about 30-40x. There is also an assembly of almost 900mb in size so I'm going to guess the genome size was not set correctly for this run (I would guess it was set to 1 terabase instead of 1 gig). The report doesn't have any assembly stats because of this since the assembly is <1% of expected size.
Thank you for your fast response. I think my run is not correct because I entered a wrong genome size value.
What kind of information would you like to see? The report provides only basic contiguity stats to see if something went wrong during the assembly (low coverage from correction, strange k-mer distributions) but it won't detect assembly errors or similar issues.
I just wanted to know, in the simplest way, whether my assembly is viable for further analysis.
In your case, Canu is reporting only 0.05x coverage but based on everything else in the logs there is about 30-40x. There is also an assembly of almost 900mb in size so I'm going to guess the genome size was not set correctly for this run (I would guess it was set to 1 terabase instead of 1 gig). The report doesn't have any assembly stats because of this since the assembly is <1% of expected size.
You are right. I made a mistake when calculating the genome size: for some reason, I estimated it from the file size. I know this question is outside the scope of this issue, but if I set my genome size to 5g as you suggest, will I get higher coverage? Or is running software that estimates the genome size a must? (I just found KmerGenie, which is used for genome size estimation, but I am also open to your suggestions.)
Thank you very much for your help and patience,
Best regards,
Tunc.
This is the command line I used to obtain the aforementioned results.
tmorova@lisa:~$ canu-1.6/Linux-amd64/bin/canu \
  -p kefal \
  -d kefal_genome/ \
  genomeSize=555g \
  -pacbio-raw kefal_pacbio/pacbio/*/*/*.fastq
If you set the genome size to 5g then the stats reported would be more accurate, yes. I'm not sure it would change the assembly very much, though. The genome size doesn't have to be exact, as long as it is in the right ballpark (say, 6g instead of 5g is OK). You could also use GenomeScope to estimate genome size and diversity, but both it and KmerGenie work best with Illumina data, not raw PacBio data.
Given that genome size, you only have about 5x of data (28664078334 / 5000000000, from the correction/asm.gkpStore log), which isn't enough to assemble the full genome. It looks like you assembled about 20% of the genome from the log:
-- contigs: 21987 sequences, total length 870033481 bp (including 1180 repeats of total length 12069127 bp).
So I don't think this assembly would be sufficient for downstream analysis since it is so incomplete. You could confirm this by using BUSCO which will look for single-copy universal genes in the assembly, presumably only 20% of them would be found if the genome size is accurate.
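The two numbers in this reply can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the base counts are copied from the logs quoted above, while the 5 Gbp genome size is the guess under discussion, not a measured value.

```python
# Base counts copied from the logs quoted above.
input_bases = 28_664_078_334   # raw reads in the correction gkpStore
contig_bases = 870_033_481     # total contig length reported by Canu

# Assumption: the 5g genome size guess discussed in this thread.
assumed_genome_size = 5_000_000_000


def coverage(bases: int, genome_size: int) -> float:
    """Depth of coverage: sequenced bases divided by genome size."""
    return bases / genome_size


def assembled_fraction(contig_bp: int, genome_size: int) -> float:
    """Share of the assumed genome that is represented in contigs."""
    return contig_bp / genome_size


print(f"coverage: {coverage(input_bases, assumed_genome_size):.2f}x")  # 5.73x
print(f"assembled: {assembled_fraction(contig_bases, assumed_genome_size):.1%}")  # 17.4%
```

Both values depend entirely on the assumed genome size, so they are only as good as that estimate.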
Dear skoren,
I have just finished a new run with the new genome size parameter, and I waited so that I could ask a couple of questions with the newest results. You are right about the 5x: the genome size parameter produced the results you expected. But can I improve this coverage (am I doing something wrong again, so that I still get low coverage?), or is it what it is, with nothing more to be done?
Thank you very much for your time and patience.
Best regards,
Tunc.
This is my new run.report. Maybe it helps.
[CORRECTION/READS]
--
-- In gatekeeper store 'correction/kefal.gkpStore':
-- Found 3629764 reads.
-- Found 28664078334 bases (5.73 times coverage).
--
-- Read length histogram (one '*' equals 4667.25 reads):
-- 0 999 0
-- 1000 1999 293679 **************************************************************
-- 2000 2999 326708 **********************************************************************
-- 3000 3999 323552 *********************************************************************
-- 4000 4999 312992 *******************************************************************
-- 5000 5999 296839 ***************************************************************
-- 6000 6999 276515 ***********************************************************
-- 7000 7999 253165 ******************************************************
-- 8000 8999 231004 *************************************************
-- 9000 9999 213937 *********************************************
-- 10000 10999 200298 ******************************************
-- 11000 11999 183646 ***************************************
-- 12000 12999 156526 *********************************
-- 13000 13999 127205 ***************************
-- 14000 14999 100735 *********************
-- 15000 15999 78965 ****************
-- 16000 16999 60646 ************
-- 17000 17999 46516 *********
-- 18000 18999 35555 *******
-- 19000 19999 26589 *****
-- 20000 20999 20323 ****
-- 21000 21999 15401 ***
-- 22000 22999 11783 **
-- 23000 23999 8947 *
-- 24000 24999 6813 *
-- 25000 25999 5093 *
-- 26000 26999 4008
-- 27000 27999 3100
-- 28000 28999 2388
-- 29000 29999 1729
-- 30000 30999 1349
-- 31000 31999 928
-- 32000 32999 733
-- 33000 33999 597
-- 34000 34999 409
-- 35000 35999 319
-- 36000 36999 224
-- 37000 37999 174
-- 38000 38999 105
-- 39000 39999 81
-- 40000 40999 54
-- 41000 41999 48
-- 42000 42999 25
-- 43000 43999 21
-- 44000 44999 17
-- 45000 45999 10
-- 46000 46999 5
-- 47000 47999 4
-- 48000 48999 2
-- 49000 49999 1
-- 50000 50999 1
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 160168767 *************************** 0.0774 0.0056
-- 2- 2 198247238 ********************************** 0.1733 0.0195
-- 3- 4 379813678 ***************************************************************** 0.2695 0.0403
-- 5- 7 404226606 ********************************************************************** 0.4328 0.0930
-- 8- 11 310801954 ***************************************************** 0.5989 0.1758
-- 12- 16 210091867 ************************************ 0.7283 0.2724
-- 17- 22 136234173 *********************** 0.8181 0.3682
-- 23- 29 87399535 *************** 0.8776 0.4553
-- 30- 37 56227256 ********* 0.9165 0.5302
-- 38- 46 36651274 ****** 0.9419 0.5928
-- 47- 56 24404893 **** 0.9586 0.6443
-- 57- 67 16608676 ** 0.9698 0.6866
-- 68- 79 11544991 * 0.9775 0.7214
-- 80- 92 8191129 * 0.9829 0.7502
-- 93- 106 5938800 * 0.9867 0.7742
-- 107- 121 4406241 0.9895 0.7944
-- 122- 137 3339997 0.9916 0.8117
-- 138- 154 2574762 0.9932 0.8266
-- 155- 172 2010196 0.9944 0.8395
-- 173- 191 1587458 0.9953 0.8508
-- 192- 211 1263097 0.9961 0.8608
-- 212- 232 1016402 0.9967 0.8696
-- 233- 254 824168 0.9972 0.8774
-- 255- 277 672477 0.9976 0.8844
-- 278- 301 555904 0.9979 0.8906
-- 302- 326 462948 0.9982 0.8962
-- 327- 352 390165 0.9984 0.9012
-- 353- 379 331979 0.9986 0.9058
-- 380- 407 284818 0.9987 0.9100
-- 408- 436 246224 0.9989 0.9139
-- 437- 466 214181 0.9990 0.9175
-- 467- 497 187928 0.9991 0.9209
-- 498- 529 165070 0.9992 0.9241
-- 530- 562 145560 0.9993 0.9270
-- 563- 596 127895 0.9993 0.9298
-- 597- 631 113813 0.9994 0.9324
-- 632- 667 101065 0.9994 0.9348
-- 668- 704 89880 0.9995 0.9371
-- 705- 742 80328 0.9995 0.9392
-- 743- 781 71196 0.9996 0.9413
-- 782- 821 63666 0.9996 0.9432
--
-- 12082696 (max occurrences)
-- 28449463107 (total mers, non-unique)
-- 1908449297 (distinct mers, non-unique)
-- 160168767 (unique mers)
[CORRECTION/CORRECTIONS]
--
-- Reads to be corrected:
-- 3629515 reads longer than 0 bp
-- 28657761232 bp
-- Expected corrected reads:
-- 3629515 reads
-- 26419680288 bp
-- 0 bp minimum length
-- 7279 bp mean length
-- 18191 bp n50 length
[TRIMMING/READS]
--
-- In gatekeeper store 'trimming/kefal.gkpStore':
-- Found 3495244 reads.
-- Found 27246096664 bases (5.44 times coverage).
--
-- Read length histogram (one '*' equals 4574.48 reads):
-- 0 999 0
-- 1000 1999 261951 *********************************************************
-- 2000 2999 316304 *********************************************************************
-- 3000 3999 320214 **********************************************************************
-- 4000 4999 311883 ********************************************************************
-- 5000 5999 296203 ****************************************************************
-- 6000 6999 274963 ************************************************************
-- 7000 7999 250216 ******************************************************
-- 8000 8999 227890 *************************************************
-- 9000 9999 211232 **********************************************
-- 10000 10999 197312 *******************************************
-- 11000 11999 177356 **************************************
-- 12000 12999 147854 ********************************
-- 13000 13999 117345 *************************
-- 14000 14999 91712 ********************
-- 15000 15999 70756 ***************
-- 16000 16999 54106 ***********
-- 17000 17999 40890 ********
-- 18000 18999 31238 ******
-- 19000 19999 23413 *****
-- 20000 20999 17543 ***
-- 21000 21999 13261 **
-- 22000 22999 10120 **
-- 23000 23999 7644 *
-- 24000 24999 5808 *
-- 25000 25999 4353
-- 26000 26999 3375
-- 27000 27999 2586
-- 28000 28999 2005
-- 29000 29999 1448
-- 30000 30999 1115
-- 31000 31999 801
-- 32000 32999 653
-- 33000 33999 463
-- 34000 34999 351
-- 35000 35999 248
-- 36000 36999 185
-- 37000 37999 140
-- 38000 38999 88
-- 39000 39999 73
-- 40000 40999 41
-- 41000 41999 30
-- 42000 42999 27
-- 43000 43999 15
-- 44000 44999 13
-- 45000 45999 9
-- 46000 46999 3
-- 47000 47999 5
-- 48000 48999 1
-- 49000 49999 1
-- 50000 50999 1
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 12260626729 *******************************************************************--> 0.8896 0.4512
-- 2- 2 533775095 ********************************************************************** 0.9284 0.4905
-- 3- 4 284814077 ************************************* 0.9416 0.5107
-- 5- 7 185535500 ************************ 0.9543 0.5391
-- 8- 11 181107716 *********************** 0.9660 0.5804
-- 12- 16 166213114 ********************* 0.9785 0.6461
-- 17- 22 97505608 ************ 0.9894 0.7281
-- 23- 29 32661591 **** 0.9953 0.7883
-- 30- 37 11428041 * 0.9973 0.8141
-- 38- 46 6744985 0.9980 0.8270
-- 47- 56 4681131 0.9985 0.8370
-- 57- 67 3396372 0.9988 0.8456
-- 68- 79 2475548 0.9990 0.8531
-- 80- 92 1836334 0.9992 0.8597
-- 93- 106 1381945 0.9994 0.8654
-- 107- 121 1073182 0.9994 0.8703
-- 122- 137 854088 0.9995 0.8748
-- 138- 154 695443 0.9996 0.8788
-- 155- 172 572700 0.9996 0.8825
-- 173- 191 474726 0.9997 0.8859
-- 192- 211 403287 0.9997 0.8891
-- 212- 232 351721 0.9997 0.8920
-- 233- 254 308458 0.9998 0.8949
-- 255- 277 270487 0.9998 0.8976
-- 278- 301 233142 0.9998 0.9003
-- 302- 326 205025 0.9998 0.9027
-- 327- 352 182074 0.9998 0.9051
-- 353- 379 162066 0.9999 0.9074
-- 380- 407 145224 0.9999 0.9095
-- 408- 436 132609 0.9999 0.9116
-- 437- 466 120070 0.9999 0.9137
-- 467- 497 110231 0.9999 0.9157
-- 498- 529 100708 0.9999 0.9176
-- 530- 562 91565 0.9999 0.9195
-- 563- 596 83045 0.9999 0.9214
-- 597- 631 74053 0.9999 0.9231
-- 632- 667 66574 0.9999 0.9248
-- 668- 704 60236 0.9999 0.9264
-- 705- 742 55556 0.9999 0.9279
-- 743- 781 50255 0.9999 0.9294
-- 782- 821 46485 0.9999 0.9308
--
-- 4636251 (max occurrences)
-- 14912069811 (total mers, non-unique)
-- 1521244268 (distinct mers, non-unique)
-- 12260626729 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.0450 (use overlaps at or below this fraction error)
-- 1 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 1 (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 3495244 reads 27246096664 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 1906859 reads 12247680932 bases (trimmed reads output)
-- 11611 reads 88262599 bases (reads with no change, kept as is)
-- 1415652 reads 8360582403 bases (reads with no overlaps, deleted)
-- 161122 reads 1066902964 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 1795449 reads 3300784545 bases (bases trimmed from the 5' end of a read)
-- 1843927 reads 2181883221 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.0450 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 1918470 reads 17818611297 bases (reads processed)
-- 1576774 reads 9427485367 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 2099 reads 20164371 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 1916371 reads 17798446926 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 1612 reads 1653 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 1653 reads 624382 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 804 reads 2581563 bases (trimmed from the 5' end of the read)
-- 808 reads 2563806 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In gatekeeper store 'unitigging/kefal.gkpStore':
-- Found 1918464 reads.
-- Found 12330792807 bases (2.46 times coverage).
--
-- Read length histogram (one '*' equals 3426.61 reads):
-- 0 999 0
-- 1000 1999 239863 **********************************************************************
-- 2000 2999 205272 ***********************************************************
-- 3000 3999 199874 **********************************************************
-- 4000 4999 191229 *******************************************************
-- 5000 5999 178933 ****************************************************
-- 6000 6999 162014 ***********************************************
-- 7000 7999 139561 ****************************************
-- 8000 8999 122586 ***********************************
-- 9000 9999 109484 *******************************
-- 10000 10999 97025 ****************************
-- 11000 11999 80311 ***********************
-- 12000 12999 59992 *****************
-- 13000 13999 41738 ************
-- 14000 14999 29223 ********
-- 15000 15999 20174 *****
-- 16000 16999 13476 ***
-- 17000 17999 9085 **
-- 18000 18999 6126 *
-- 19000 19999 4030 *
-- 20000 20999 2659
-- 21000 21999 1899
-- 22000 22999 1256
-- 23000 23999 814
-- 24000 24999 625
-- 25000 25999 400
-- 26000 26999 260
-- 27000 27999 177
-- 28000 28999 112
-- 29000 29999 88
-- 30000 30999 73
-- 31000 31999 38
-- 32000 32999 19
-- 33000 33999 22
-- 34000 34999 13
-- 35000 35999 6
-- 36000 36999 3
-- 37000 37999 2
-- 38000 38999 0
-- 39000 39999 1
-- 40000 40999 1
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 1457015698 *******************************************************************--> 0.6040 0.1185
-- 2- 2 222392301 ********************************************************************** 0.6962 0.1547
-- 3- 4 166192731 **************************************************** 0.7377 0.1791
-- 5- 7 143889575 ********************************************* 0.7872 0.2223
-- 8- 11 154736318 ************************************************ 0.8420 0.2972
-- 12- 16 139299333 ******************************************* 0.9028 0.4216
-- 17- 22 76830942 ************************ 0.9546 0.5717
-- 23- 29 23830374 ******* 0.9810 0.6751
-- 30- 37 8080124 ** 0.9890 0.7163
-- 38- 46 4807446 * 0.9920 0.7365
-- 47- 56 3325156 * 0.9939 0.7522
-- 57- 67 2378775 0.9953 0.7658
-- 68- 79 1700300 0.9962 0.7774
-- 80- 92 1246263 0.9969 0.7873
-- 93- 106 943593 0.9974 0.7958
-- 107- 121 733334 0.9978 0.8033
-- 122- 137 587348 0.9981 0.8100
-- 138- 154 481902 0.9983 0.8162
-- 155- 172 396668 0.9985 0.8218
-- 173- 191 340978 0.9987 0.8271
-- 192- 211 296269 0.9988 0.8321
-- 212- 232 255127 0.9989 0.8369
-- 233- 254 216020 0.9990 0.8415
-- 255- 277 188654 0.9991 0.8457
-- 278- 301 167732 0.9992 0.8498
-- 302- 326 148592 0.9993 0.8537
-- 327- 352 133390 0.9993 0.8575
-- 353- 379 121586 0.9994 0.8612
-- 380- 407 111504 0.9994 0.8648
-- 408- 436 99469 0.9995 0.8683
-- 437- 466 89500 0.9995 0.8717
-- 467- 497 82023 0.9996 0.8750
-- 498- 529 73239 0.9996 0.8782
-- 530- 562 64692 0.9996 0.8813
-- 563- 596 58486 0.9996 0.8842
-- 597- 631 52828 0.9997 0.8869
-- 632- 667 48049 0.9997 0.8895
-- 668- 704 43956 0.9997 0.8921
-- 705- 742 40625 0.9997 0.8945
-- 743- 781 37285 0.9997 0.8969
-- 782- 821 34685 0.9998 0.8992
--
-- 2777938 (max occurrences)
-- 10833489365 (total mers, non-unique)
-- 955101113 (distinct mers, non-unique)
-- 1457015698 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 4681 0.24 7922.26 +- 4270.65 696.52 +- 788.02 (bad trimming)
-- middle-hump 5368 0.28 5111.90 +- 3307.65 426.57 +- 713.32 (bad trimming)
-- no-5-prime 31096 1.62 7239.44 +- 4261.69 302.69 +- 600.04 (bad trimming)
-- no-3-prime 31264 1.63 7260.04 +- 4257.05 300.99 +- 595.48 (bad trimming)
--
-- low-coverage 376134 19.61 3568.38 +- 2532.20 4.34 +- 1.98 (easy to assemble, potential for lower quality consensus)
-- unique 620466 32.34 5897.27 +- 3586.42 17.86 +- 5.63 (easy to assemble, perfect, yay)
-- repeat-cont 41696 2.17 5299.95 +- 3253.08 430.77 +- 406.55 (potential for consensus errors, no impact on assembly)
-- repeat-dove 232 0.01 13591.04 +- 5956.38 298.90 +- 282.08 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 239201 12.47 9013.93 +- 4131.67 3648.01 +- 3264.46 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 416219 21.70 6674.32 +- 3067.54 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 136186 7.10 11790.83 +- 3714.03 (will end contigs, potential to misassemble)
-- uniq-anchor 2765 0.14 9217.72 +- 3480.68 2978.97 +- 2902.69 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 21977 sequences, total length 873673405 bp (including 1152 repeats of total length 12175083 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 609194 sequences, total length 3289799996 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 50294 5474 500047085
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 21977 sequences, total length 872069112 bp (including 1152 repeats of total length 12152210 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 609194 sequences, total length 3289508348 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 50099 5492 500025153
--
The genomeSize parameter is used only for determining coverage in input reads and reporting of statistics. The coverage reported is just bases_in_input_reads / genome_size_parameter.
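Because the reported coverage is just that ratio, the genome size used for a run can be recovered from any "Found N bases" line in the report. The sketch below assumes the line format shown in the report pasted above; the regular expression and the function name `implied_genome_size` are illustrative, not part of Canu.

```python
import re

# Assumption: coverage lines look like the ones in the report above, e.g.
# "-- Found 28664078334 bases (5.73 times coverage)."
COVERAGE_LINE = re.compile(r"Found\s+(\d+)\s+bases\s+\(([\d.]+)\s+times coverage\)")


def implied_genome_size(report_line: str) -> float:
    """Back out the genome size from bases / reported coverage."""
    match = COVERAGE_LINE.search(report_line)
    if match is None:
        raise ValueError("not a coverage line")
    bases = int(match.group(1))
    reported_coverage = float(match.group(2))
    if reported_coverage == 0:
        raise ValueError("coverage rounds to zero; genome size cannot be inferred")
    return bases / reported_coverage


correction_line = "-- Found 28664078334 bases (5.73 times coverage)."
unitigging_line = "-- Found 12330792807 bases (2.46 times coverage)."
for line in (correction_line, unitigging_line):
    print(f"{implied_genome_size(line) / 1e9:.2f} Gbp")
```

Both lines give roughly 5 Gbp, matching the value set for the second run; the small differences come from the coverage being rounded to two decimals in the report.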
The unitigging overlap report is claiming 17x (+-5x) coverage in reads that look to be from unique portions of the genome. The kmer report in the same section is showing a slight peak at about 10x, but this is usually skewed low by noisy reads, and the big peak at low copy number shows this is pretty noisy data.
The unitigging gatekeeper report is showing 12,330,792,807 bases in input reads, but the consensus report says about 3.3 Gbp of those remain 'unassembled'. So that leaves about 9 Gbp in input bases that assembled to about 0.9 Gbp, so around 10x in assembled coverage.
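The same estimate, written out as a short Python sketch using the figures from the unitigging and consensus sections of the report above. It treats every base not listed as 'unassembled' as having gone into contigs, which is a simplification.

```python
# Figures copied from the report above.
unitigging_input_bases = 12_330_792_807  # [UNITIGGING/READS]
unassembled_bases = 3_289_508_348        # [UNITIGGING/CONSENSUS]
contig_bases = 872_069_112               # [UNITIGGING/CONSENSUS]

# Simplifying assumption: all remaining read bases ended up in contigs.
assembled_read_bases = unitigging_input_bases - unassembled_bases
assembled_coverage = assembled_read_bases / contig_bases

print(f"read bases in contigs: {assembled_read_bases / 1e9:.2f} Gbp")
print(f"assembled coverage: {assembled_coverage:.1f}x")  # about 10.4x
```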
You can try the low coverage settings, but it looks like you might need more input coverage for any better assembly.
I would suggest running the unitigging kmer histogram (unitigging/0-*/*.histogram) through something like GenomeScope (http://qb.cshl.edu/genomescope/) to see whether it is able to predict a genome size and heterozygosity. That will give you a better idea of whether the 1gb you've assembled is a small part of your genome or not. However, you probably do need more coverage to improve the assembly result.
@skoren We also have Illumina data for the same sample and, as you suggested, I ran the Illumina data through KmerGenie. The estimated genome size was ~1 Gbp (892293194 bp).
Regarding @brianwalenz's comment, I understand that the genome size parameter is only used for simple calculations, so it cannot really change the output. But now that I have found an estimate about one fifth of the value I used, do you think it is worth giving the new value a shot? Could it be that the 1gb the algorithm assembled is actually the whole genome?
Thank you both for your patience and for the time you spent helping me with my problem. I owe you a lot!
Best regards,
T.
So if your genome size is 1gb, that would imply the Canu assembly you have is almost the complete genome (873 Mbp) with an NG50 of 50kb. This would also be consistent with the slight peak at 10x in the corrected data, 12330792807 / 10 = 1.2 Gbp. Any reason you are setting the genome size to 5 if the estimate was 1? Another way to test this is to use BUSCO (http://busco.ezlab.org) to see how complete the marker genes for your assembly are. If the genome is close to 1 Gbp, the gene set will be largely complete; if it is 5 Gbp, it should be < 20% complete.
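To see why the NG10 of about 50 kb reported against a 5 Gbp genome size turns into an NG50 of about 50 kb against 1 Gbp, note that both ask for the contig at which the cumulative length reaches the same 500 Mbp. A minimal Python sketch of the NGx statistic follows; the function `ng` and the toy contig lengths are made up for illustration and scaled down by a factor of a million, they are not taken from the assembly.

```python
from typing import Optional, Sequence


def ng(contig_lengths: Sequence[int], genome_size: int, x: float) -> Optional[int]:
    """NGx: walking contigs from longest to shortest, return the length of the
    contig at which the running total first reaches x% of genome_size.
    Returns None if the contigs never add up to that much."""
    threshold = genome_size * x / 100
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= threshold:
            return length
    return None


# Hypothetical toy contigs (total 1100), not real data.
toy_contigs = [400, 300, 200, 100, 50, 50]

print(ng(toy_contigs, 5000, 10))  # threshold 500 -> 300
print(ng(toy_contigs, 1000, 50))  # same threshold 500 -> 300
print(ng(toy_contigs, 5000, 50))  # threshold 2500 is never reached -> None
```

The last call mirrors why the 5g report stops at NG10: with under 1 Gbp assembled, the higher NG values against 5 Gbp cannot be reached.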
As @brianwalenz said, the genome size is just used for basic calculations. You can see your assembly with a genome size of 555 Gbp and 5 Gbp produced similar assemblies (870 and 873 mb total). It would just give you more meaningful stats in the report (e.g. you don't have 2x, you have 10x coverage) but isn't worth rerunning with it set to 1 from 5. Given a 1gb genome, you have an input of 25x which is relatively low and your average input read length is <8kb, also not very high. This probably means if you want to improve your assembly contiguity you'd need more/longer reads.
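For reference, the input coverage and mean read length mentioned here can be estimated from the [CORRECTION/READS] and [CORRECTION/CORRECTIONS] sections of the report. This is a sketch of the arithmetic only, assuming a round 1 Gbp genome; depending on whether raw or expected corrected bases are used, the coverage comes out in the mid to high twenties.

```python
# Figures copied from the report above.
raw_reads = 3_629_764
raw_bases = 28_664_078_334                 # [CORRECTION/READS]
expected_corrected_bases = 26_419_680_288  # [CORRECTION/CORRECTIONS]

# Assumption: a round 1 Gbp genome, close to the KmerGenie estimate.
assumed_genome_size = 1_000_000_000

mean_read_length = raw_bases / raw_reads
raw_coverage = raw_bases / assumed_genome_size
corrected_coverage = expected_corrected_bases / assumed_genome_size

print(f"mean input read length: {mean_read_length:.0f} bp")
print(f"raw input coverage: {raw_coverage:.1f}x")
print(f"expected corrected coverage: {corrected_coverage:.1f}x")
```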
@skoren I did not know anything about estimating genome size beforehand (I thought FASTQ file sizes could be used as an approximation, but I was deeply mistaken. Sorry!), and I can say that I learned through asking questions.
I will definitely give BUSCO a try.
Thank you very much again for your help!
Best, T.
Dear authors,
As a visiting scholar with little experience in de novo assembly, I couldn't find detailed information in the Canu wiki about, simply, whether my run ended successfully or whether I need to tweak some parameters and retry it.
In a previous issue, #289, I found that the "bad trimming" flags are not necessarily a bad sign, depending on the organism.
Before I ask my questions, I have two small comments about Canu status messages and interpreting results.
1) To determine whether the run ended or aborted due to an error, could you add a line such as "process ended successfully" (implying that the run has ended but the quality of the results still needs to be interpreted)?
2) Would you be able to add a section about interpreting assembly quality for people new to the subject (like me)?
In #289 a couple of histograms were asked for, so I just pasted the assembly report. I am sorry if this question is too general, but any of your comments could help me tweak a couple of parameters. I am aware that there is no ultimate/perfect result, but I am planning to do RNA-seq, mutation calling, etc. next, so I want to be sure about the assembly before things get too complicated. Please tell me if you need other output files.
Thank you very much for your time,
Tunc.