Closed (harsh-shukla closed this issue 6 years ago)
30x is on the low end of coverage, so for a mammal you'll likely be limited to 1-2 Mb NG50s (e.g. human) unless you have very long (100 kb+) reads. In your case, setting corMhapSensitivity=normal is actually lower sensitivity than the default for 30x, so that hurt the correction. I'll clarify the FAQ to note that you only need to set that parameter if you have more than 50x. There are a couple of options you can adjust. I'd suggest trying:
corMinCoverage=0 corMhapSensitivity=high correctedErrorRate=0.105
If you want to add nanopore data, specify it as PacBio data (since the majority of your data is PacBio) and increase correctedErrorRate to 0.12.
Hi Sergey,
Thank you so much for the quick reply. We are planning to scaffold the draft further using Hi-C (or Bionano), and I need an NG50 of at least 1 Mb for that to work. As long as I get a minimum NG50 of 1 Mb I am more or less OK.
So in the end I have ~33X of PacBio and ~7X of nanopore (R9 1D). The nanopore data is not great: its read length distribution is equivalent to the PacBio data (actually a little worse), and I don't know what it will do to my assembly. When you say that I should give the nanopore data as PacBio data, do you mean that when running the correction I should specify it like this?
-pacbio-raw <PACBIO_DATA> & <NANOPORE_DATA> (combined files together)
If yes, do I have to change rawErrorRate to somewhere between 0.3 and 0.5, or leave it at 0.3?
Regards, Harsh
Yes, that is what I meant. Leave rawErrorRate alone; you should have enough PacBio data to get a good consensus with the default of 0.3.
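The advice so far can be put together as one command line. The following is a minimal sketch, not a command taken from the thread: the prefix, output directory and file names are hypothetical, and genomeSize=2.4g is an assumption inferred from the base counts and coverage figures in the reports posted later in this thread. It only builds and prints the command; rawErrorRate is deliberately not set so that the default applies.

```python
import shlex


def canu_hybrid_command(prefix, out_dir, genome_size, read_files,
                        corrected_error_rate=0.12):
    """Build a Canu command that passes every read file as PacBio raw data.

    prefix, out_dir and read_files are placeholders chosen by the caller.
    rawErrorRate is intentionally omitted so Canu's default is used.
    """
    return [
        "canu",
        "-p", prefix,
        "-d", out_dir,
        f"genomeSize={genome_size}",
        "corMinCoverage=0",
        "corMhapSensitivity=high",
        f"correctedErrorRate={corrected_error_rate}",
        # PacBio and nanopore files are both listed after -pacbio-raw.
        "-pacbio-raw", *read_files,
    ]


# Hypothetical names; replace with the real prefix, directory and read files.
cmd = canu_hybrid_command(
    "asm", "hybrid-run", "2.4g",
    ["pacbio.fastq.gz", "nanopore.fastq.gz"],
)
print(shlex.join(cmd))
```

For the PacBio-only run, the same helper could be called with a single read file and corrected_error_rate=0.105.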
Hey Sergey,
Thank you so much for the suggestions. I'll try running the Pacbio-only assembly and hybrid assembly with modified parameters.
I'll post here as soon as I have the stats of the new runs.
With Regards, Harsh
Hey Sergey,
Hi again. I am running out of disk space very rapidly and the admin is not happy at all. As in issue #1039, because of the low depth and higher sensitivity, my overlap store is growing like crazy while it is being built.
The current size of 1-overlapper/results/ is 2.8 TB.
I am currently running the bucketizer. My question is: once a particular bucket is created, can I delete the corresponding .ovb and .counts files from 1-overlapper/results/? For example, once the bucket0001/ folder is created (the corresponding job is done), can I delete 000001.counts and 000001.ovb? Also, should I delete the 1-overlapper/blocks/ folder right away?
Can I bucketize and sort a few at a time and delete each .ovb file once it is sorted? Is there any way to do that?
Regards, Harsh
It is definitely safe to erase the blocks folder in the 1-overlapper directory. It is mostly safe to erase the results files too. However, if you have any disk corruption during store construction, you wouldn't be able to re-run a bucketizing step, which is why Canu usually doesn't erase these files until the end of store building.
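A cautious way to act on this is to list the candidate files first and delete nothing automatically. The sketch below is only an illustration under assumptions taken from the question above: that a finished bucketizer job N leaves a directory named bucketNNNN, and that it corresponds to NNNNNN.ovb and NNNNNN.counts in the results directory. Treating the mere existence of a bucket directory as proof that the job finished is also an assumption. Both directory paths are supplied by the caller.

```python
import re
from pathlib import Path


def removable_overlap_files(results_dir, bucket_root):
    """List .ovb/.counts files whose bucket directory already exists.

    Nothing is deleted here; the caller decides what to do with the list.
    The bucketNNNN <-> NNNNNN.ovb correspondence is an assumption taken
    from the question in this thread, not from Canu documentation.
    """
    results_dir = Path(results_dir)
    bucket_root = Path(bucket_root)
    candidates = []
    for bucket in sorted(bucket_root.iterdir()):
        match = re.fullmatch(r"bucket(\d+)", bucket.name)
        if match is None or not bucket.is_dir():
            continue
        job = int(match.group(1))
        for suffix in (".ovb", ".counts"):
            path = results_dir / f"{job:06d}{suffix}"
            if path.exists():
                candidates.append(path)
    return candidates
```

Printing the returned paths and checking them by hand before removing anything keeps the risk described above (not being able to re-run a bucketizing step) visible.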
Hey Sergey,
Thanks for the quick answer.
One more thing: can I move the entire assembly (the whole folder) from one SGE cluster to another and continue the run? The sysadmin has agreed to mount a hard drive on another SGE cluster. Both have the same version of Canu (1.7.1), built from source, so I guess it should be fine.
Regards, Harsh
The assembly uses all local paths so moving to a different HD/folder is OK.
Hi again Sergey,
So the PacBio-only run finished today. I am getting a much better assembly now: NG50 is almost 1 Mb. The parameters used were as suggested:
corMinCoverage=0 corMhapSensitivity=high correctedErrorRate=0.105
But it seems the error rate for the corrected reads is quite high. From the unitigging/4-*/001thr000.num000.log file:
INITIAL EDGES
-------- ----------------------------------------
5881886 reads are contained
5030208 reads have no best edges (singleton)
16710 reads have only one best edge (spur)
12550 are mutual best
561401 reads have two best edges
5728 have one mutual best edge
553503 have two mutual best edges
ERROR RATES (9578612 samples)
-----------
mean 0.03894592 stddev 0.01776539 -> 0.14553824 fraction error = 14.553824% error
median 0.03670000 mad 0.01200000 -> 0.14344720 fraction error = 14.344720% error
EDGE FILTERING
-------- ------------------------------------------
5035130 reads have a suspicious overlap pattern
0 reads had edges filtered
0 had one
0 had two
7104 reads have length incompatible edges
5858 have one
1246 have two
FINAL EDGES
-------- ----------------------------------------
5881886 reads are contained
5035232 reads have no best edges (singleton)
24280 reads have only one best edge (spur)
9231 are mutual best
548807 reads have two best edges
1749 have one mutual best edge
541937 have two mutual best edges
Also, here is the GenomeScope output from both the trimming and unitigging steps.
From trimming/0-mercounts
GenomeScope version 1.0
k = 22
property min max
Heterozygosity 1.88872% 2.05744%
Genome Haploid Length 1,890,660,623 bp 1,909,029,015 bp
Genome Repeat Length 253,531,721 bp 255,994,865 bp
Genome Unique Length 1,637,128,902 bp 1,653,034,150 bp
Model Fit 94.4821% 95.6423%
Read Error Rate 3.67823% 3.67823%
----------------------------------------------------------------------
From unitigging/0-mercounts
GenomeScope version 1.0
k = 22
property min max
Heterozygosity 1.91341% 2.07094%
Genome Haploid Length 1,785,642,436 bp 1,801,601,988 bp
Genome Repeat Length 198,541,059 bp 200,315,561 bp
Genome Unique Length 1,587,101,377 bp 1,601,286,428 bp
Model Fit 93.9694% 94.9344%
Read Error Rate 1.68581% 1.68581%
Should I run the trimming and assembly steps again with an increased correctedErrorRate (~0.14), or do one more round of correction? Does increasing correctedErrorRate increase the chance of mis-assemblies?
The trimming step reduces the coverage to ~19.5X. Will that be enough to cover the entire genome?
Also attached is the .report file
[CORRECTION/READS]
--
-- In gatekeeper store './Mammalian_Pacbio.gkpStore':
-- Found 11490205 reads.
-- Found 78182336134 bases (32.57 times coverage).
--
-- Read length histogram (one '*' equals 27920.97 reads):
-- 0 999 0
-- 1000 1999 1954468 **********************************************************************
-- 2000 2999 1742867 **************************************************************
-- 3000 3999 1124831 ****************************************
-- 4000 4999 943905 *********************************
-- 5000 5999 796367 ****************************
-- 6000 6999 683089 ************************
-- 7000 7999 592761 *********************
-- 8000 8999 514032 ******************
-- 9000 9999 447533 ****************
-- 10000 10999 390795 *************
-- 11000 11999 342092 ************
-- 12000 12999 304283 **********
-- 13000 13999 280427 **********
-- 14000 14999 260830 *********
-- 15000 15999 222907 *******
-- 16000 16999 180278 ******
-- 17000 17999 142807 *****
-- 18000 18999 113807 ****
-- 19000 19999 90561 ***
-- 20000 20999 72916 **
-- 21000 21999 57992 **
-- 22000 22999 45916 *
-- 23000 23999 36962 *
-- 24000 24999 29471 *
-- 25000 25999 23644
-- 26000 26999 19142
-- 27000 27999 15273
-- 28000 28999 12253
-- 29000 29999 9603
-- 30000 30999 8006
-- 31000 31999 6266
-- 32000 32999 4961
-- 33000 33999 4009
-- 34000 34999 3172
-- 35000 35999 2457
-- 36000 36999 2093
-- 37000 37999 1609
-- 38000 38999 1151
-- 39000 39999 1008
-- 40000 40999 748
-- 41000 41999 546
-- 42000 42999 491
-- 43000 43999 375
-- 44000 44999 303
-- 45000 45999 237
-- 46000 46999 204
-- 47000 47999 153
-- 48000 48999 119
-- 49000 49999 90
-- 50000 50999 76
-- 51000 51999 65
-- 52000 52999 36
-- 53000 53999 47
-- 54000 54999 39
-- 55000 55999 21
-- 56000 56999 16
-- 57000 57999 17
-- 58000 58999 15
-- 59000 59999 9
-- 60000 60999 9
-- 61000 61999 7
-- 62000 62999 5
-- 63000 63999 4
-- 64000 64999 6
-- 65000 65999 0
-- 66000 66999 2
-- 67000 67999 2
-- 68000 68999 0
-- 69000 69999 4
-- 70000 70999 2
-- 71000 71999 1
-- 72000 72999 3
-- 73000 73999 1
-- 74000 74999 1
-- 75000 75999 0
-- 76000 76999 3
-- 77000 77999 0
-- 78000 78999 1
-- 79000 79999 0
-- 80000 80999 1
-- 81000 81999 0
-- 82000 82999 2
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 18717930 **** 0.0087 0.0002
-- 2- 2 35006604 ******** 0.0251 0.0011
-- 3- 4 110744625 *************************** 0.0484 0.0031
-- 5- 7 211775545 **************************************************** 0.1084 0.0105
-- 8- 11 279430070 ******************************************************************** 0.2095 0.0300
-- 12- 16 284235143 ********************************************************************** 0.3358 0.0662
-- 17- 22 242137163 *********************************************************** 0.4608 0.1172
-- 23- 29 199663035 ************************************************* 0.5672 0.1766
-- 30- 37 169596963 ***************************************** 0.6563 0.2423
-- 38- 46 142408818 *********************************** 0.7327 0.3143
-- 47- 56 114018159 **************************** 0.7970 0.3897
-- 57- 67 86731089 ********************* 0.8484 0.4634
-- 68- 79 63641811 *************** 0.8875 0.5307
-- 80- 92 45944253 *********** 0.9162 0.5891
-- 93- 106 33110321 ******** 0.9370 0.6385
-- 107- 121 24008875 ***** 0.9520 0.6798
-- 122- 137 17574997 **** 0.9630 0.7142
-- 138- 154 13027826 *** 0.9710 0.7428
-- 155- 172 9791825 ** 0.9769 0.7668
-- 173- 191 7471723 * 0.9814 0.7870
-- 192- 211 5788827 * 0.9849 0.8042
-- 212- 232 4535908 * 0.9875 0.8189
-- 233- 254 3589193 0.9896 0.8317
-- 255- 277 2869251 0.9913 0.8428
-- 278- 301 2307350 0.9926 0.8525
-- 302- 326 1870718 0.9937 0.8610
-- 327- 352 1531909 0.9945 0.8684
-- 353- 379 1267164 0.9952 0.8750
-- 380- 407 1059297 0.9958 0.8810
-- 408- 436 895836 0.9963 0.8863
-- 437- 466 764283 0.9967 0.8911
-- 467- 497 657849 0.9971 0.8955
-- 498- 529 570953 0.9974 0.8995
-- 530- 562 499040 0.9976 0.9033
-- 563- 596 438403 0.9979 0.9068
-- 597- 631 385738 0.9981 0.9100
-- 632- 667 340814 0.9983 0.9130
-- 668- 704 300442 0.9984 0.9158
-- 705- 742 264660 0.9986 0.9185
-- 743- 781 233762 0.9987 0.9209
-- 782- 821 207489 0.9988 0.9232
--
-- 41247321 (max occurrences)
-- 77991265129 (total mers, non-unique)
-- 2123085280 (distinct mers, non-unique)
-- 18717930 (unique mers)
[CORRECTION/LAYOUT]
-- original original
-- raw reads raw reads
-- category w/overlaps w/o/overlaps
-- -------------------- ------------- -------------
-- Number of Reads 11436831 53374
-- Number of Bases 77921492781 260591354
-- Coverage 32.467 0.109
-- Median 4986 1784
-- Mean 6813 4882
-- N50 10464 10484
-- Minimum 1000 0
-- Maximum 82685 82287
--
-- --------corrected--------- ----------rescued----------
-- evidence expected expected
-- category reads raw corrected raw corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads 11475662 11436831 11436831 0 0
-- Number of Bases 78078052338 77921492781 73711028550 0 0
-- Coverage 32.533 32.467 30.713 0.000 0.000
-- Median 4973 4986 4555 0 0
-- Mean 6803 6813 6445 0 0
-- N50 10465 10464 10457 0 0
-- Minimum 1000 1000 1 0 0
-- Maximum 82685 82685 82684 0 0
--
-- --------uncorrected--------
-- expected
-- category raw corrected
-- -------------------- ------------- -------------
-- Number of Reads 53374 53374
-- Number of Bases 260591354 0
-- Coverage 0.109 0.000
-- Median 1784 0
-- Mean 4882 0
-- N50 10484 0
-- Minimum 0 0
-- Maximum 82287 0
--
-- Maximum Memory 1564656452
[TRIMMING/READS]
--
-- In gatekeeper store './Mammalian_Pacbio.gkpStore':
-- Found 11436065 reads.
-- Found 76371559322 bases (31.82 times coverage).
--
-- Read length histogram (one '*' equals 31545.25 reads):
-- 0 999 6124
-- 1000 1999 2208168 **********************************************************************
-- 2000 2999 1480468 **********************************************
-- 3000 3999 1143606 ************************************
-- 4000 4999 957214 ******************************
-- 5000 5999 807647 *************************
-- 6000 6999 690896 *********************
-- 7000 7999 596407 ******************
-- 8000 8999 514573 ****************
-- 9000 9999 446643 **************
-- 10000 10999 386965 ************
-- 11000 11999 338600 **********
-- 12000 12999 300727 *********
-- 13000 13999 279649 ********
-- 14000 14999 255368 ********
-- 15000 15999 211644 ******
-- 16000 16999 167363 *****
-- 17000 17999 131919 ****
-- 18000 18999 103875 ***
-- 19000 19999 83435 **
-- 20000 20999 66434 **
-- 21000 21999 52610 *
-- 22000 22999 41688 *
-- 23000 23999 33282 *
-- 24000 24999 26462
-- 25000 25999 21151
-- 26000 26999 17082
-- 27000 27999 13600
-- 28000 28999 10662
-- 29000 29999 8597
-- 30000 30999 6929
-- 31000 31999 5518
-- 32000 32999 4260
-- 33000 33999 3478
-- 34000 34999 2661
-- 35000 35999 2230
-- 36000 36999 1811
-- 37000 37999 1295
-- 38000 38999 1022
-- 39000 39999 846
-- 40000 40999 631
-- 41000 41999 489
-- 42000 42999 396
-- 43000 43999 332
-- 44000 44999 274
-- 45000 45999 205
-- 46000 46999 173
-- 47000 47999 118
-- 48000 48999 119
-- 49000 49999 74
-- 50000 50999 72
-- 51000 51999 53
-- 52000 52999 31
-- 53000 53999 34
-- 54000 54999 33
-- 55000 55999 22
-- 56000 56999 12
-- 57000 57999 18
-- 58000 58999 12
-- 59000 59999 9
-- 60000 60999 10
-- 61000 61999 6
-- 62000 62999 5
-- 63000 63999 2
-- 64000 64999 4
-- 65000 65999 1
-- 66000 66999 1
-- 67000 67999 3
-- 68000 68999 1
-- 69000 69999 2
-- 70000 70999 2
-- 71000 71999 2
-- 72000 72999 2
-- 73000 73999 1
-- 74000 74999 1
-- 75000 75999 0
-- 76000 76999 3
-- 77000 77999 0
-- 78000 78999 1
-- 79000 79999 0
-- 80000 80999 1
-- 81000 81999 1
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 34540413185 *******************************************************************--> 0.8788 0.4537
-- 2- 2 1736029254 ********************************************************************** 0.9230 0.4993
-- 3- 4 834586412 ********************************* 0.9369 0.5209
-- 5- 7 486650119 ******************* 0.9491 0.5486
-- 8- 11 511641343 ******************** 0.9599 0.5874
-- 12- 16 558145949 ********************** 0.9727 0.6565
-- 17- 22 407568324 **************** 0.9861 0.7593
-- 23- 29 159227228 ****** 0.9951 0.8532
-- 30- 37 36142600 * 0.9984 0.8975
-- 38- 46 10956320 0.9992 0.9105
-- 47- 56 5775306 0.9994 0.9160
-- 57- 67 3625218 0.9996 0.9197
-- 68- 79 2526916 0.9997 0.9226
-- 80- 92 1880777 0.9997 0.9249
-- 93- 106 1429240 0.9998 0.9270
-- 107- 121 1113387 0.9998 0.9289
-- 122- 137 890347 0.9998 0.9305
-- 138- 154 717895 0.9998 0.9320
-- 155- 172 591125 0.9999 0.9334
-- 173- 191 499011 0.9999 0.9346
-- 192- 211 426924 0.9999 0.9358
-- 212- 232 360919 0.9999 0.9369
-- 233- 254 314198 0.9999 0.9380
-- 255- 277 278484 0.9999 0.9390
-- 278- 301 258414 0.9999 0.9399
-- 302- 326 244988 0.9999 0.9409
-- 327- 352 228070 0.9999 0.9419
-- 353- 379 202048 0.9999 0.9429
-- 380- 407 175623 1.0000 0.9439
-- 408- 436 150391 1.0000 0.9448
-- 437- 466 126848 1.0000 0.9456
-- 467- 497 103873 1.0000 0.9464
-- 498- 529 88723 1.0000 0.9470
-- 530- 562 76500 1.0000 0.9476
-- 563- 596 68606 1.0000 0.9482
-- 597- 631 61966 1.0000 0.9487
-- 632- 667 57131 1.0000 0.9492
-- 668- 704 53044 1.0000 0.9497
-- 705- 742 49517 1.0000 0.9502
-- 743- 781 45892 1.0000 0.9506
-- 782- 821 42050 1.0000 0.9511
--
-- 14412852 (max occurrences)
-- 41590988772 (total mers, non-unique)
-- 4764133149 (distinct mers, non-unique)
-- 34540413185 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1050 (use overlaps at or below this fraction error)
-- 1 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 1 (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 11490205 reads 76371559322 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 6086089 reads 45271880153 bases (trimmed reads output)
-- 385372 reads 1120547846 bases (reads with no change, kept as is)
-- 4718311 reads 20065105740 bases (reads with no overlaps, deleted)
-- 300433 reads 1237223274 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 5525550 reads 4882431543 bases (bases trimmed from the 5' end of a read)
-- 5741203 reads 3794370766 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1050 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 6471461 reads 55069230308 bases (reads processed)
-- 5018744 reads 21302329014 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 313 reads 2668318 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 6471148 reads 55066561990 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 10964 reads 11054 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 11054 reads 3057332 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 5652 reads 22580085 bases (trimmed from the 5' end of the read)
-- 5312 reads 20270473 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In gatekeeper store './Mammalian_Pacbio.gkpStore':
-- Found 6471405 reads.
-- Found 46349528979 bases (19.31 times coverage).
--
-- Read length histogram (one '*' equals 16408.67 reads):
-- 0 999 0
-- 1000 1999 1148607 **********************************************************************
-- 2000 2999 618052 *************************************
-- 3000 3999 579389 ***********************************
-- 4000 4999 530249 ********************************
-- 5000 5999 478191 *****************************
-- 6000 6999 429045 **************************
-- 7000 7999 381771 ***********************
-- 8000 8999 337290 ********************
-- 9000 9999 295867 ******************
-- 10000 10999 259201 ***************
-- 11000 11999 230005 **************
-- 12000 12999 207897 ************
-- 13000 13999 195370 ***********
-- 14000 14999 173163 **********
-- 15000 15999 139692 ********
-- 16000 16999 106313 ******
-- 17000 17999 80866 ****
-- 18000 18999 62883 ***
-- 19000 19999 49095 **
-- 20000 20999 37911 **
-- 21000 21999 29614 *
-- 22000 22999 22874 *
-- 23000 23999 17684 *
-- 24000 24999 13658
-- 25000 25999 10832
-- 26000 26999 8240
-- 27000 27999 6368
-- 28000 28999 4963
-- 29000 29999 3923
-- 30000 30999 2969
-- 31000 31999 2284
-- 32000 32999 1711
-- 33000 33999 1368
-- 34000 34999 989
-- 35000 35999 770
-- 36000 36999 569
-- 37000 37999 443
-- 38000 38999 296
-- 39000 39999 244
-- 40000 40999 171
-- 41000 41999 137
-- 42000 42999 113
-- 43000 43999 82
-- 44000 44999 78
-- 45000 45999 36
-- 46000 46999 28
-- 47000 47999 29
-- 48000 48999 17
-- 49000 49999 21
-- 50000 50999 14
-- 51000 51999 9
-- 52000 52999 2
-- 53000 53999 3
-- 54000 54999 3
-- 55000 55999 1
-- 56000 56999 2
-- 57000 57999 1
-- 58000 58999 0
-- 59000 59999 0
-- 60000 60999 1
-- 61000 61999 0
-- 62000 62999 0
-- 63000 63999 0
-- 64000 64999 0
-- 65000 65999 0
-- 66000 66999 0
-- 67000 67999 0
-- 68000 68999 1
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 9303977321 *******************************************************************--> 0.7091 0.2013
-- 2- 2 1103080911 ********************************************************************** 0.7932 0.2491
-- 3- 4 652465959 ***************************************** 0.8250 0.2762
-- 5- 7 441262704 **************************** 0.8557 0.3148
-- 8- 11 495849652 ******************************* 0.8861 0.3748
-- 12- 16 539549626 ********************************** 0.9234 0.4856
-- 17- 22 383623386 ************************ 0.9620 0.6486
-- 23- 29 143405603 ********* 0.9874 0.7931
-- 30- 37 30688049 * 0.9962 0.8583
-- 38- 46 8944266 0.9980 0.8763
-- 47- 56 4585097 0.9987 0.8837
-- 57- 67 2823931 0.9990 0.8885
-- 68- 79 1951045 0.9992 0.8921
-- 80- 92 1455250 0.9993 0.8952
-- 93- 106 1104407 0.9994 0.8978
-- 107- 121 861315 0.9995 0.9002
-- 122- 137 689514 0.9996 0.9022
-- 138- 154 555965 0.9996 0.9042
-- 155- 172 459241 0.9997 0.9059
-- 173- 191 393055 0.9997 0.9075
-- 192- 211 335654 0.9997 0.9090
-- 212- 232 284798 0.9998 0.9105
-- 233- 254 251336 0.9998 0.9119
-- 255- 277 226470 0.9998 0.9132
-- 278- 301 218647 0.9998 0.9145
-- 302- 326 210327 0.9998 0.9158
-- 327- 352 191030 0.9999 0.9173
-- 353- 379 167893 0.9999 0.9187
-- 380- 407 144379 0.9999 0.9200
-- 408- 436 122588 0.9999 0.9212
-- 437- 466 100208 0.9999 0.9223
-- 467- 497 81439 0.9999 0.9233
-- 498- 529 69683 0.9999 0.9241
-- 530- 562 61274 0.9999 0.9249
-- 563- 596 55148 0.9999 0.9256
-- 597- 631 50220 0.9999 0.9263
-- 632- 667 46584 0.9999 0.9270
-- 668- 704 42830 0.9999 0.9276
-- 705- 742 39435 0.9999 0.9283
-- 743- 781 36679 0.9999 0.9289
-- 782- 821 33593 1.0000 0.9295
--
-- 5392091 (max occurrences)
-- 36909652153 (total mers, non-unique)
-- 3817122533 (distinct mers, non-unique)
-- 9303977321 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 686 0.01 8406.37 +- 5405.71 432.12 +- 567.36 (bad trimming)
-- middle-hump 426 0.01 4605.99 +- 3689.24 342.64 +- 602.39 (bad trimming)
-- no-5-prime 4801 0.07 9663.32 +- 5720.04 137.64 +- 369.39 (bad trimming)
-- no-3-prime 5277 0.08 9961.37 +- 5688.19 124.12 +- 356.42 (bad trimming)
--
-- low-coverage 80372 1.24 2384.70 +- 1592.21 4.22 +- 1.41 (easy to assemble, potential for lower quality consensus)
-- unique 4305320 66.53 7031.51 +- 4933.27 18.80 +- 4.96 (easy to assemble, perfect, yay)
-- repeat-cont 670091 10.35 2934.34 +- 3029.78 1910.17 +- 2444.87 (potential for consensus errors, no impact on assembly)
-- repeat-dove 1278 0.02 20670.38 +- 8747.33 383.43 +- 850.28 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 770822 11.91 10992.10 +- 6185.89 2775.83 +- 3184.81 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 515450 7.97 7130.03 +- 4317.42 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 96884 1.50 15585.48 +- 5598.40 (will end contigs, potential to misassemble)
-- uniq-anchor 11097 0.17 9962.44 +- 5382.08 3552.85 +- 4105.11 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 6632 sequences, total length 2379969862 bp (including 270 repeats of total length 4432054 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 1401240 sequences, total length 6010030445 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2801520 63 240924077
-- 20 2089702 162 481245252
-- 30 1500254 301 721287251
-- 40 1198112 481 960368515
-- 50 923668 712 1200715387
-- 60 705818 1007 1440136006
-- 70 530581 1397 1680235960
-- 80 357585 1946 1920280140
-- 90 192896 2842 2160006989
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 6632 sequences, total length 2381351738 bp (including 270 repeats of total length 4428786 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 1401240 sequences, total length 6010008168 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2804872 63 241152016
-- 20 2090941 162 481683023
-- 30 1504549 300 720413349
-- 40 1199942 481 961175766
-- 50 924879 711 1200771306
-- 60 706762 1006 1440558211
-- 70 533202 1395 1680444157
-- 80 359672 1942 1920238374
-- 90 195117 2835 2160081719
--
Also, regarding the hybrid run (using 33X PacBio and 7X nanopore): my correction step has just completed. Looking at the data above, should I change correctedErrorRate=0.12 to something higher?
Thank you,
With regards and yours sincerely, Harsh
For 33X, I think you're doing OK with a 1 Mb NG50. You end up with 15x of total bases after trimming, which is low but probably the minimum to get a decent assembly. For the hybrid, I'd say using 0.12 is OK; it should hopefully leave you with 20x+ coverage after trimming, which would improve the assembly.
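The coverage figures in the report's READS sections are total bases divided by genome size, so they can be rechecked per stage. The sketch below uses the base counts from the PacBio-only report and assumes a genome size of 2.4 Gb; that value is inferred from the report (78182336134 bases reported as 32.57x), not given explicitly in the thread.

```python
GENOME_SIZE = 2_400_000_000  # assumption: inferred from bases / coverage

# Base counts copied from the [*/READS] sections of the PacBio-only report.
stage_bases = {
    "correction": 78_182_336_134,
    "trimming": 76_371_559_322,
    "unitigging": 46_349_528_979,
}


def coverage(bases, genome_size=GENOME_SIZE):
    """Fold coverage: total bases divided by genome size."""
    return bases / genome_size


stage_coverage = {stage: coverage(b) for stage, b in stage_bases.items()}
for stage, cov in stage_coverage.items():
    print(f"{stage:<11} {cov:.3f}x")
```

The same helper can be applied to the hybrid run once its [UNITIGGING/READS] section is available, to see how much coverage survives trimming at correctedErrorRate=0.12.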
Thanks, Sergey, for all the help. I'll post the stats of the hybrid run as soon as it is over.
Hey Sergey,
So the hybrid run finished yesterday. It is a much improved assembly over PacBio-only. The NG50 is ~3.27 Mb, which is quite good. After the trimming step around 25X coverage was left, which I hope is enough to cover the entire genome. The entire 2.4 Gb is covered in 2766 contigs, which can most likely be scaffolded to chromosome level. Let's see what happens.
I am attaching the report file as well as the BUSCO stats for the hybrid assembly.
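For readers comparing the NG50 values quoted in this thread: NG50 is measured against the genome size rather than the assembly size, which is what the "Contig sizes based on genome size" tables in the Canu report show. Here is a minimal sketch of the calculation, using made-up toy contig lengths rather than data from this assembly.

```python
def ng50(contig_lengths, genome_size):
    """Length of the contig at which the running total of contigs,
    sorted longest first, reaches half of the genome size.

    Returns None if the contigs never add up to half the genome.
    """
    target = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return None


# Toy example (hypothetical lengths, not from this assembly).
toy_contigs = [50, 40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, genome_size=200)
print(toy_ng50)
```

Passing the assembly's own total length as genome_size gives the ordinary N50 instead, which is why the two numbers differ when the assembly is larger or smaller than the estimated genome.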
[CORRECTION/READS]
--
-- In gatekeeper store './MammalianHybrid.gkpStore':
-- Found 14672377 reads.
-- Found 95236354580 bases (39.68 times coverage).
--
-- Read length histogram (one '*' equals 41786.5 reads):
-- 0 999 0
-- 1000 1999 2925055 **********************************************************************
-- 2000 2999 2277601 ******************************************************
-- 3000 3999 1380668 *********************************
-- 4000 4999 1194684 ****************************
-- 5000 5999 989800 ***********************
-- 6000 6999 838688 ********************
-- 7000 7999 722707 *****************
-- 8000 8999 624790 **************
-- 9000 9999 542690 ************
-- 10000 10999 472620 ***********
-- 11000 11999 411580 *********
-- 12000 12999 362712 ********
-- 13000 13999 328994 *******
-- 14000 14999 301592 *******
-- 15000 15999 256455 ******
-- 16000 16999 207704 ****
-- 17000 17999 165323 ***
-- 18000 18999 131907 ***
-- 19000 19999 105471 **
-- 20000 20999 84897 **
-- 21000 21999 67813 *
-- 22000 22999 53805 *
-- 23000 23999 43191 *
-- 24000 24999 34691
-- 25000 25999 27907
-- 26000 26999 22708
-- 27000 27999 18319
-- 28000 28999 14704
-- 29000 29999 11685
-- 30000 30999 9740
-- 31000 31999 7690
-- 32000 32999 6282
-- 33000 33999 5125
-- 34000 34999 4168
-- 35000 35999 3251
-- 36000 36999 2823
-- 37000 37999 2224
-- 38000 38999 1723
-- 39000 39999 1526
-- 40000 40999 1133
-- 41000 41999 915
-- 42000 42999 842
-- 43000 43999 646
-- 44000 44999 603
-- 45000 45999 472
-- 46000 46999 372
-- 47000 47999 308
-- 48000 48999 266
-- 49000 49999 229
-- 50000 50999 208
-- 51000 51999 168
-- 52000 52999 122
-- 53000 53999 114
-- 54000 54999 107
-- 55000 55999 80
-- 56000 56999 60
-- 57000 57999 72
-- 58000 58999 49
-- 59000 59999 42
-- 60000 60999 35
-- 61000 61999 32
-- 62000 62999 16
-- 63000 63999 22
-- 64000 64999 23
-- 65000 65999 17
-- 66000 66999 20
-- 67000 67999 11
-- 68000 68999 7
-- 69000 69999 11
-- 70000 70999 11
-- 71000 71999 6
-- 72000 72999 10
-- 73000 73999 6
-- 74000 74999 3
-- 75000 75999 3
-- 76000 76999 5
-- 77000 77999 1
-- 78000 78999 2
-- 79000 79999 2
-- 80000 80999 1
-- 81000 81999 0
-- 82000 82999 3
-- 83000 83999 1
-- 84000 84999 0
-- 85000 85999 0
-- 86000 86999 1
-- 87000 87999 1
-- 88000 88999 1
-- 89000 89999 1
-- 90000 90999 0
-- 91000 91999 0
-- 92000 92999 0
-- 93000 93999 0
-- 94000 94999 1
-- 95000 95999 0
-- 96000 96999 0
-- 97000 97999 1
-- 98000 98999 1
-- 99000 99999 0
-- 100000 100999 1
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 14414041 *** 0.0067 0.0002
-- 2- 2 28076792 ******* 0.0198 0.0007
-- 3- 4 93107255 ************************ 0.0391 0.0020
-- 5- 7 185834678 ************************************************ 0.0907 0.0073
-- 8- 11 253441217 ***************************************************************** 0.1802 0.0215
-- 12- 16 269315286 ********************************************************************** 0.2958 0.0488
-- 17- 22 235321302 ************************************************************* 0.4151 0.0888
-- 23- 29 190222088 ************************************************* 0.5184 0.1362
-- 30- 37 159558664 ***************************************** 0.6029 0.1873
-- 38- 46 139461582 ************************************ 0.6749 0.2431
-- 47- 56 120887820 ******************************* 0.7382 0.3042
-- 57- 67 100885180 ************************** 0.7932 0.3691
-- 68- 79 80792550 ******************** 0.8390 0.4339
-- 80- 92 62673074 **************** 0.8757 0.4953
-- 93- 106 47651817 ************ 0.9042 0.5510
-- 107- 121 35871557 ********* 0.9259 0.5999
-- 122- 137 26947996 ******* 0.9422 0.6421
-- 138- 154 20313617 ***** 0.9545 0.6782
-- 155- 172 15438313 **** 0.9638 0.7089
-- 173- 191 11830713 *** 0.9709 0.7351
-- 192- 211 9164680 ** 0.9763 0.7575
-- 212- 232 7184839 * 0.9805 0.7767
-- 233- 254 5693151 * 0.9838 0.7933
-- 255- 277 4559159 * 0.9865 0.8077
-- 278- 301 3679806 0.9886 0.8204
-- 302- 326 2995938 0.9903 0.8315
-- 327- 352 2448942 0.9917 0.8413
-- 353- 379 2019631 0.9928 0.8500
-- 380- 407 1675761 0.9937 0.8577
-- 408- 436 1402500 0.9945 0.8646
-- 437- 466 1183979 0.9951 0.8708
-- 467- 497 1008926 0.9957 0.8764
-- 498- 529 865336 0.9962 0.8815
-- 530- 562 747902 0.9966 0.8861
-- 563- 596 649837 0.9969 0.8904
-- 597- 631 569865 0.9972 0.8943
-- 632- 667 502245 0.9975 0.8980
-- 668- 704 443762 0.9977 0.9014
-- 705- 742 392517 0.9979 0.9046
-- 743- 781 347755 0.9981 0.9076
-- 782- 821 309730 0.9983 0.9104
--
-- 45894597 (max occurrences)
-- 95001854884 (total mers, non-unique)
-- 2128910838 (distinct mers, non-unique)
-- 14414041 (unique mers)
[CORRECTION/LAYOUT]
-- original original
-- raw reads raw reads
-- category w/overlaps w/o/overlaps
-- -------------------- ------------- -------------
-- Number of Reads 14581003 91374
-- Number of Bases 94868798469 367392325
-- Coverage 39.529 0.153
-- Median 4624 1709
-- Mean 6506 4020
-- N50 10172 8614
-- Minimum 1000 0
-- Maximum 100428 82287
--
-- --------corrected--------- ----------rescued----------
-- evidence expected expected
-- category reads raw corrected raw corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads 14656147 14581003 14581003 0 0
-- Number of Bases 95139191113 94868798469 90047929203 0 0
-- Coverage 39.641 39.529 37.520 0.000 0.000
-- Median 4606 4624 4244 0 0
-- Mean 6491 6506 6175 0 0
-- N50 10166 10172 10190 0 0
-- Minimum 1000 1000 1 0 0
-- Maximum 100428 100428 100427 0 0
--
-- --------uncorrected--------
-- expected
-- category raw corrected
-- -------------------- ------------- -------------
-- Number of Reads 91374 91374
-- Number of Bases 367392325 0
-- Coverage 0.153 0.000
-- Median 1709 0
-- Mean 4020 0
-- N50 8614 0
-- Minimum 0 0
-- Maximum 82287 0
--
-- Maximum Memory 1787445848
[TRIMMING/READS]
--
-- In gatekeeper store './MammalianHybrid.gkpStore':
-- Found 14580025 reads.
-- Found 92999998465 bases (38.74 times coverage).
--
-- Read length histogram (one '*' equals 45209.94 reads):
-- 0 999 11328
-- 1000 1999 3164696 **********************************************************************
-- 2000 2999 2008466 ********************************************
-- 3000 3999 1406712 *******************************
-- 4000 4999 1206674 **************************
-- 5000 5999 999568 **********************
-- 6000 6999 845168 ******************
-- 7000 7999 724516 ****************
-- 8000 8999 623840 *************
-- 9000 9999 540053 ***********
-- 10000 10999 466946 **********
-- 11000 11999 406263 ********
-- 12000 12999 356651 *******
-- 13000 13999 325901 *******
-- 14000 14999 294421 ******
-- 15000 15999 243614 *****
-- 16000 16999 193897 ****
-- 17000 17999 153071 ***
-- 18000 18999 121149 **
-- 19000 19999 97220 **
-- 20000 20999 77867 *
-- 21000 21999 61873 *
-- 22000 22999 48871 *
-- 23000 23999 39168
-- 24000 24999 31363
-- 25000 25999 25027
-- 26000 26999 20534
-- 27000 27999 16405
-- 28000 28999 12896
-- 29000 29999 10484
-- 30000 30999 8578
-- 31000 31999 6866
-- 32000 32999 5490
-- 33000 33999 4492
-- 34000 34999 3585
-- 35000 35999 2974
-- 36000 36999 2491
-- 37000 37999 1855
-- 38000 38999 1546
-- 39000 39999 1300
-- 40000 40999 992
-- 41000 41999 831
-- 42000 42999 701
-- 43000 43999 630
-- 44000 44999 499
-- 45000 45999 393
-- 46000 46999 321
-- 47000 47999 278
-- 48000 48999 249
-- 49000 49999 213
-- 50000 50999 169
-- 51000 51999 133
-- 52000 52999 121
-- 53000 53999 93
-- 54000 54999 87
-- 55000 55999 67
-- 56000 56999 59
-- 57000 57999 60
-- 58000 58999 48
-- 59000 59999 42
-- 60000 60999 24
-- 61000 61999 28
-- 62000 62999 19
-- 63000 63999 18
-- 64000 64999 17
-- 65000 65999 21
-- 66000 66999 14
-- 67000 67999 11
-- 68000 68999 6
-- 69000 69999 10
-- 70000 70999 6
-- 71000 71999 7
-- 72000 72999 8
-- 73000 73999 5
-- 74000 74999 3
-- 75000 75999 3
-- 76000 76999 3
-- 77000 77999 1
-- 78000 78999 3
-- 79000 79999 1
-- 80000 80999 1
-- 81000 81999 2
-- 82000 82999 1
-- 83000 83999 0
-- 84000 84999 0
-- 85000 85999 0
-- 86000 86999 3
-- 87000 87999 0
-- 88000 88999 0
-- 89000 89999 1
-- 90000 90999 0
-- 91000 91999 0
-- 92000 92999 0
-- 93000 93999 0
-- 94000 94999 1
-- 95000 95999 1
-- 96000 96999 1
-- 97000 97999 0
-- 98000 98999 0
-- 99000 99999 0
-- 100000 100999 1
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 37295971775 *******************************************************************--> 0.8800 0.4024
-- 2- 2 1851812923 ********************************************************************** 0.9237 0.4423
-- 3- 4 811296117 ****************************** 0.9366 0.4600
-- 5- 7 381101587 ************** 0.9466 0.4802
-- 8- 11 356914325 ************* 0.9539 0.5032
-- 12- 16 486788188 ****************** 0.9624 0.5440
-- 17- 22 568327909 ********************* 0.9741 0.6242
-- 23- 29 414740968 *************** 0.9870 0.7444
-- 30- 37 151525399 ***** 0.9957 0.8498
-- 38- 46 30788820 * 0.9986 0.8948
-- 47- 56 10025838 0.9992 0.9064
-- 57- 67 5546408 0.9994 0.9116
-- 68- 79 3602331 0.9996 0.9151
-- 80- 92 2572241 0.9997 0.9179
-- 93- 106 1949976 0.9997 0.9202
-- 107- 121 1528054 0.9998 0.9223
-- 122- 137 1213195 0.9998 0.9241
-- 138- 154 976805 0.9998 0.9258
-- 155- 172 805010 0.9998 0.9273
-- 173- 191 657225 0.9999 0.9287
-- 192- 211 553177 0.9999 0.9300
-- 212- 232 474531 0.9999 0.9312
-- 233- 254 411152 0.9999 0.9323
-- 255- 277 356597 0.9999 0.9334
-- 278- 301 313552 0.9999 0.9344
-- 302- 326 286003 0.9999 0.9354
-- 327- 352 270811 0.9999 0.9364
-- 353- 379 256903 0.9999 0.9374
-- 380- 407 230971 0.9999 0.9384
-- 408- 436 201744 0.9999 0.9394
-- 437- 466 171887 1.0000 0.9403
-- 467- 497 146859 1.0000 0.9411
-- 498- 529 126316 1.0000 0.9419
-- 530- 562 108979 1.0000 0.9426
-- 563- 596 95295 1.0000 0.9432
-- 597- 631 84597 1.0000 0.9438
-- 632- 667 76264 1.0000 0.9443
-- 668- 704 68831 1.0000 0.9449
-- 705- 742 63890 1.0000 0.9454
-- 743- 781 58910 1.0000 0.9459
-- 782- 821 55397 1.0000 0.9464
--
-- 15730053 (max occurrences)
-- 55397846165 (total mers, non-unique)
-- 5087535457 (distinct mers, non-unique)
-- 37295971775 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1200 (use overlaps at or below this fraction error)
-- 1 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 1 (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 14672377 reads 92999998465 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 8658829 reads 59789743255 bases (trimmed reads output)
-- 459530 reads 1724060622 bases (reads with no change, kept as is)
-- 4935192 reads 19161413842 bases (reads with no overlaps, deleted)
-- 618826 reads 1839318641 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 7735778 reads 5583870621 bases (bases trimmed from the 5' end of a read)
-- 8135760 reads 4901591484 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1200 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 9118359 reads 71999265982 bases (reads processed)
-- 5554018 reads 21000732483 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 54 reads 408293 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 9118305 reads 71998857689 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 13711 reads 13816 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 13816 reads 3934620 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 6709 reads 26127706 bases (trimmed from the 5' end of the read)
-- 7004 reads 25449523 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In gatekeeper store './MammalianHybrid.gkpStore':
-- Found 9118253 reads.
-- Found 61462138505 bases (25.6 times coverage).
--
-- Read length histogram (one '*' equals 26078.58 reads):
-- 0 999 0
-- 1000 1999 1825501 **********************************************************************
-- 2000 2999 1031110 ***************************************
-- 3000 3999 845287 ********************************
-- 4000 4999 750765 ****************************
-- 5000 5999 652836 *************************
-- 6000 6999 571086 *********************
-- 7000 7999 501520 *******************
-- 8000 8999 438592 ****************
-- 9000 9999 383856 **************
-- 10000 10999 334543 ************
-- 11000 11999 292978 ***********
-- 12000 12999 260678 *********
-- 13000 13999 240018 *********
-- 14000 14999 212488 ********
-- 15000 15999 171268 ******
-- 16000 16999 132424 *****
-- 17000 17999 102116 ***
-- 18000 18999 79973 ***
-- 19000 19999 62603 **
-- 20000 20999 49254 *
-- 21000 21999 38672 *
-- 22000 22999 29962 *
-- 23000 23999 23484
-- 24000 24999 18509
-- 25000 25999 14537
-- 26000 26999 11489
-- 27000 27999 8979
-- 28000 28999 6910
-- 29000 29999 5652
-- 30000 30999 4368
-- 31000 31999 3488
-- 32000 32999 2767
-- 33000 33999 2176
-- 34000 34999 1689
-- 35000 35999 1362
-- 36000 36999 1024
-- 37000 37999 839
-- 38000 38999 662
-- 39000 39999 515
-- 40000 40999 407
-- 41000 41999 341
-- 42000 42999 300
-- 43000 43999 239
-- 44000 44999 200
-- 45000 45999 133
-- 46000 46999 117
-- 47000 47999 90
-- 48000 48999 77
-- 49000 49999 84
-- 50000 50999 70
-- 51000 51999 39
-- 52000 52999 39
-- 53000 53999 26
-- 54000 54999 26
-- 55000 55999 12
-- 56000 56999 11
-- 57000 57999 10
-- 58000 58999 13
-- 59000 59999 12
-- 60000 60999 2
-- 61000 61999 5
-- 62000 62999 3
-- 63000 63999 1
-- 64000 64999 3
-- 65000 65999 5
-- 66000 66999 1
-- 67000 67999 1
-- 68000 68999 3
-- 69000 69999 1
-- 70000 70999 1
-- 71000 71999 1
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 11172620458 *******************************************************************--> 0.7329 0.1823
-- 2- 2 1161431148 ********************************************************************** 0.8091 0.2203
-- 3- 4 611482732 ************************************ 0.8356 0.2400
-- 5- 7 331129379 ******************* 0.8581 0.2646
-- 8- 11 348290804 ******************** 0.8766 0.2965
-- 12- 16 485649965 ***************************** 0.8998 0.3575
-- 17- 22 555861335 ********************************* 0.9322 0.4784
-- 23- 29 389551197 *********************** 0.9671 0.6551
-- 30- 37 134377400 ******** 0.9896 0.8037
-- 38- 46 25832043 * 0.9968 0.8635
-- 47- 56 8376982 0.9982 0.8781
-- 57- 67 4537876 0.9987 0.8847
-- 68- 79 2909032 0.9990 0.8890
-- 80- 92 2075794 0.9992 0.8924
-- 93- 106 1579598 0.9993 0.8953
-- 107- 121 1242306 0.9994 0.8978
-- 122- 137 982702 0.9995 0.9001
-- 138- 154 793829 0.9996 0.9021
-- 155- 172 651477 0.9996 0.9040
-- 173- 191 533060 0.9997 0.9057
-- 192- 211 450823 0.9997 0.9073
-- 212- 232 392695 0.9997 0.9088
-- 233- 254 338687 0.9998 0.9102
-- 255- 277 294275 0.9998 0.9115
-- 278- 301 267350 0.9998 0.9128
-- 302- 326 249373 0.9998 0.9140
-- 327- 352 241247 0.9998 0.9153
-- 353- 379 221961 0.9999 0.9167
-- 380- 407 195295 0.9999 0.9180
-- 408- 436 167103 0.9999 0.9192
-- 437- 466 140931 0.9999 0.9204
-- 467- 497 121076 0.9999 0.9214
-- 498- 529 102669 0.9999 0.9224
-- 530- 562 88612 0.9999 0.9232
-- 563- 596 78609 0.9999 0.9240
-- 597- 631 69406 0.9999 0.9247
-- 632- 667 63064 0.9999 0.9254
-- 668- 704 57632 0.9999 0.9261
-- 705- 742 53189 0.9999 0.9267
-- 743- 781 49814 0.9999 0.9274
-- 782- 821 45924 0.9999 0.9280
--
-- 8346914 (max occurrences)
-- 50098034734 (total mers, non-unique)
-- 4071788061 (distinct mers, non-unique)
-- 11172620458 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 744 0.01 6104.32 +- 4854.60 388.99 +- 476.76 (bad trimming)
-- middle-hump 101 0.00 3675.01 +- 2490.41 234.88 +- 449.37 (bad trimming)
-- no-5-prime 1376 0.02 7220.38 +- 5605.10 261.33 +- 457.43 (bad trimming)
-- no-3-prime 1812 0.02 7850.46 +- 5756.87 225.72 +- 423.40 (bad trimming)
--
-- low-coverage 60763 0.67 2063.02 +- 1321.15 5.73 +- 1.78 (easy to assemble, potential for lower quality consensus)
-- unique 6978821 76.54 6620.95 +- 5050.92 25.01 +- 6.16 (easy to assemble, perfect, yay)
-- repeat-cont 743748 8.16 3023.64 +- 3170.97 2627.92 +- 3359.47 (potential for consensus errors, no impact on assembly)
-- repeat-dove 1243 0.01 22114.86 +- 9498.41 482.93 +- 1094.61 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 769221 8.44 10966.59 +- 6532.41 2437.63 +- 2907.39 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 480812 5.27 6745.51 +- 4395.12 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 62895 0.69 16390.36 +- 6010.52 (will end contigs, potential to misassemble)
-- uniq-anchor 9599 0.11 10580.37 +- 5931.30 3728.62 +- 4269.05 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 2766 sequences, total length 2403212721 bp (including 272 repeats of total length 6709814 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 1884490 sequences, total length 7526628627 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 9994699 20 247895044
-- 20 7498391 48 485356104
-- 30 5694133 84 721398530
-- 40 4262615 133 960671602
-- 50 3272049 199 1200394793
-- 60 2425774 285 1441105224
-- 70 1723458 403 1680264870
-- 80 1159844 573 1921127241
-- 90 619550 853 2160289294
-- 100 13629 2283 2400002451
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 2766 sequences, total length 2400638939 bp (including 272 repeats of total length 6672776 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 1884490 sequences, total length 7526613120 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 9980068 20 247471354
-- 20 7491969 48 484637275
-- 30 5689728 84 720428199
-- 40 4183826 134 963670056
-- 50 3267225 200 1202267975
-- 60 2412360 286 1441926762
-- 70 1721309 404 1680217489
-- 80 1157649 574 1920294653
-- 90 616254 857 2160539298
-- 100 5330 2560 2400004284
--
BUSCO stats.
# BUSCO version is: 3.0.2
# The lineage dataset is: mammalia_odb9 (Creation date: 2016-02-13, number of species: 50, number of BUSCOs: 4104)
# BUSCO was run in mode: genome
C:89.1%[S:88.6%,D:0.5%],F:6.6%,M:4.3%,n:4104
3656 Complete BUSCOs (C)
3637 Complete and single-copy BUSCOs (S)
19 Complete and duplicated BUSCOs (D)
269 Fragmented BUSCOs (F)
179 Missing BUSCOs (M)
4104 Total BUSCO groups searched
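To compare BUSCO results between runs, the one-line summary above can be parsed programmatically. This is only a minimal sketch: the regular expression is written against the summary format shown here (BUSCO 3.0.2), and the function name `parse_busco_summary` is hypothetical, not part of BUSCO.

```python
import re

# Pattern matches the short summary line as printed above, e.g.
# C:89.1%[S:88.6%,D:0.5%],F:6.6%,M:4.3%,n:4104
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages (C, S, D, F, M) and group count n from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a BUSCO summary line: %r" % line)
    fields = match.groupdict()
    result = {key: float(value) for key, value in fields.items() if key != "n"}
    result["n"] = int(fields["n"])
    return result


if __name__ == "__main__":
    summary = parse_busco_summary("C:89.1%[S:88.6%,D:0.5%],F:6.6%,M:4.3%,n:4104")
    print(summary)
```

Parsing the summary line from two runs this way makes it easy to put the fragmented (F) and missing (M) percentages side by side.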
Thank you so much for all the help Sergey.
With Regards, Harsh
Yes, the report looks good. There is a k-mer peak at around 20x coverage after correction/trimming, which is consistent with your genome size. Most overlaps are also unique, which is good for the assembly. The BUSCO report also looks reasonable and might improve if you run Arrow with the PacBio data on the assembly.
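The coverage figures in the report can be sanity-checked by dividing the base counts by the genome size. A minimal sketch follows, assuming a genome size of 2.4 Gbp (the ~2.4g quoted in this thread; the exact `genomeSize` value passed to Canu is not shown). The base counts are copied from the report; the helper name `coverage` is only for illustration.

```python
# Assumption: genome size of 2.4 Gbp, taken from the "~2.4g" mentioned in the thread.
GENOME_SIZE = 2_400_000_000


def coverage(total_bases, genome_size=GENOME_SIZE):
    """Fold coverage = total bases in the reads / genome size."""
    return total_bases / genome_size


# Base counts copied from the Canu report in this thread.
STAGES = {
    "raw reads with overlaps (correction)": 94_868_798_469,
    "reads loaded for trimming": 92_999_998_465,
    "reads loaded for unitigging": 61_462_138_505,
}

if __name__ == "__main__":
    for name, bases in STAGES.items():
        print(f"{name}: {coverage(bases):.2f}x")
```

Under that assumption the numbers agree with the coverage values printed in the report to within rounding.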
Hi, First of all, thank you for the lovely tool. I am trying to assemble a mammalian genome (~2.4g haploid genome size). I believe the individual (and the species) we are assembling is not very heterozygous (< 0.5%).
We generated ~33X of PacBio Sequel data (N50: 10.3 kb) and I did a run with more or less default parameters. Since it was Sequel data, the only things I changed (as discussed in the FAQ) were corMhapSensitivity=normal correctedErrorRate=0.085 (since my depth was a little low, I increased it a little).
After running the error correction stage, roughly half of my coverage (~15X) was gone: I was left with only 15.6X of coverage, and after trimming only 15.31X remained. Even though I knew that this would give me a fragmented assembly that might not even cover the entire genome, I decided to go ahead with it.
As expected, the NG50 was quite low (~215 kb), but we got the expected assembly size (~2.3g).
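For reference, NG50 is the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the genome size (rather than half of the assembly size, as for N50). A minimal sketch of that calculation, with a hypothetical function name and toy contig lengths that are not taken from this assembly:

```python
def ng_stats(contig_lengths, genome_size, fraction=0.5):
    """Return (NGx length, LGx contig count), or None if the contigs
    never add up to `fraction` of the genome size."""
    target = genome_size * fraction
    running_total = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= target:
            return length, count
    return None


if __name__ == "__main__":
    # Toy example: five contigs and an assumed genome size of 200 bases.
    toy_contigs = [50, 40, 30, 20, 10]
    print(ng_stats(toy_contigs, genome_size=200))
```

Because the target depends on the genome size, an assembly that covers less than half of the genome has no NG50 at all, which is why the sketch returns `None` in that case.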
Running BUSCO on it confirmed my hunch that the assembly is fragmented and does not cover the entire genome: ~13% of BUSCOs are fragmented and ~10% are missing.
So what could I do to make my assembly a little better? Since my initial coverage was slightly > 30X, corMinCoverage=4 was chosen by default. Should I change it to corMinCoverage=0 and run it again? Is there anything else I can change?
Also, in reference to an earlier post (#848): when I look at unitigging/4-/001*thr000.num000.log, the error rate seems to be quite high, and I can't seem to wrap my head around that part. Should I also increase the correctedErrorRate from 0.085 to 0.105?
001thr000.num000.log file
Also, I am attaching my GenomeScope output from the unitigging step and the final report generated by Canu.
Canu Final Report
Canu version : Canu 1.7.1 OS : CentOS 6.5 SGE
P.S.: To make matters more complex, I have gotten ~7.1X of nanopore data and now I have to run a hybrid assembly. Any suggestions on that?
Thanking You,
Regards, Harsh