ml3958 closed this issue 6 years ago
Your second attempt did not work: it ended up with only 3x coverage. This would imply that the longest reads aren't very high quality, but that's not necessarily surprising in itself. The k-mer distribution from the corrected reads is consistent with a 2-3 Mb genome. However, the 22 Mb assembly when you increase the coverage, and the lack of any assembly at 90x coverage, are suspicious.
Are you able to share the data? See the FAQ for instructions on sending it to us. Otherwise, you could try running mash screen (https://github.com/marbl/Mash) on it, as well as running GenomeScope (http://qb.cshl.edu/genomescope/) on the unitigging/0-*/*.histogram outputs. Have you tried mapping the data to a close reference to estimate identity and coverage? That should tell us more about what's going on with the data.
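GenomeScope fits a full model to the k-mer histogram. As a rough cross-check, you can also estimate genome size by dividing the number of k-mers outside the error peak by the multiplicity of the main coverage peak. The sketch below assumes that the histogram's first two whitespace-separated columns are multiplicity and count, which matches the columns fed to GenomeScope in the reply below. It also assumes a haploid genome with a single clear peak. `read_histogram` and `estimate_genome_size` are hypothetical helpers, not part of Canu or GenomeScope.

```python
from typing import Iterable, List, Tuple


def read_histogram(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Keep only the first two columns (multiplicity, count) of each data line."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank lines and headers
        rows.append((int(fields[0]), int(fields[1])))
    return rows


def estimate_genome_size(hist: List[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (peak multiplicity, estimated genome size) using the simple peak method."""
    hist = sorted(hist)
    counts = [count for _, count in hist]
    # Error trough: walk down the low-multiplicity error peak until counts stop falling.
    trough = 0
    while trough + 1 < len(counts) and counts[trough + 1] < counts[trough]:
        trough += 1
    # Main peak: the most frequent multiplicity at or beyond the trough.
    peak = max(range(trough, len(counts)), key=lambda i: counts[i])
    peak_mult = hist[peak][0]
    # Total k-mer instances outside the error peak, divided by the per-copy coverage.
    total_kmers = sum(mult * count for mult, count in hist[trough:])
    return peak_mult, total_kmers / peak_mult
```

This ignores heterozygosity and repeats, which GenomeScope does model, so treat the result as an order-of-magnitude check only.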
Thanks for the reply. I will try to send the data.
Meanwhile, I ran GenomeScope with default settings on the first two columns of my unitigging/0-*/*.histogram file. That file was generated by the first attempt, which set corOutCoverage=100 and produced 90X coverage. The result is here: http://qb.cshl.edu/genomescope/analysis.php?code=qHGceJKZyhvPegWXXvWt
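Canu's corOutCoverage option limits correction to the longest raw reads, up to that coverage. The Canu documentation lists a default of 40. The sketch below shows that kind of longest-first selection. `select_longest` is a hypothetical helper that illustrates the idea; it is not Canu's implementation.

```python
from typing import Iterable, List, Tuple


def select_longest(read_lengths: Iterable[int], genome_size: int,
                   target_coverage: float) -> Tuple[int, List[int]]:
    """Pick reads longest-first until their total reaches target_coverage * genome_size.

    Returns (length of the shortest selected read, selected lengths).
    """
    budget = target_coverage * genome_size
    selected: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break  # enough bases selected
        selected.append(length)
        total += length
    min_length = selected[-1] if selected else 0
    return min_length, selected
```

Raising the target (for example corOutCoverage=100, as in the attempt above) enlarges the budget and lowers the length cutoff, so more and shorter reads are sent to correction.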
I cannot upload the data to the FTP server. I connected successfully, but when I try to cd to the incoming/sergek folder, it seems that the folder does not exist.
Can you please double-check? Thanks.
Yep, the FTP is working properly:
% ftp ftp.cbcb.umd.edu
Trying 128.8.132.69...
Connected to ftp.cbcb.umd.edu (128.8.132.69).
220-
220-Welcome to the CBCB FTP Server
220-
220-Please visit, http://www.cbcb.umd.edu
220-for more information.
220-
220
Name (ftp.cbcb.umd.edu:skoren): anonymous
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd incoming/sergek
250 Directory successfully changed.
ftp> put test
local: test remote: test
227 Entering Passive Mode (128,8,132,69,31,96).
150 Ok to send data.
226 Transfer complete.
ftp> ls
227 Entering Passive Mode (128,8,132,69,31,88).
150 Here comes the directory listing.
226 Transfer done (but failed to open directory).
You can't ls or read the directory, but you can still put files into it.
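The same upload can be scripted with Python's standard ftplib. This sketch assumes anonymous login and the incoming/sergek drop box from the session above. Because that directory cannot be listed, the script changes into it and stores the file without checking for it first. `upload_anonymous` is a hypothetical helper. The `ftp_factory` parameter exists only so the function can be exercised without a network connection.

```python
import ftplib
import os


def upload_anonymous(local_path: str, host: str = "ftp.cbcb.umd.edu",
                     remote_dir: str = "incoming/sergek", ftp_factory=ftplib.FTP) -> None:
    """Upload one file to a write-only FTP drop box using anonymous login."""
    ftp = ftp_factory(host)
    try:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(remote_dir)  # works even though the directory cannot be listed
        with open(local_path, "rb") as handle:
            # STOR sends the file in binary mode under its base name.
            ftp.storbinary("STOR " + os.path.basename(local_path), handle)
    finally:
        ftp.quit()
```

Usage: `upload_anonymous("SRR6331514.fastq.gz")`. Don't try an `ls` afterwards to verify; as the transcript shows, listing fails even when the transfer succeeded.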
Hi Sergey, thanks! I successfully put the data on the GitHub page, named SRR6331514.fastq.gz. But I think I've found the problem: when I downloaded the data from the FTP, the process was interrupted, so only 90% of the data downloaded successfully.
Now I have a much more contiguous assembly!
On to the next step: polishing it. Thank you!
Ah, I didn't realize this is a public dataset; in that case I don't need the FTP upload. The raw data in the SRA (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR6331514) contains the following files:
m160920_005603_42154_c101069572550000001823244402101752_s1_p0.1.bax.h5
m160920_005603_42154_c101069572550000001823244402101752_s1_p0.2.bax.h5
m160920_005603_42154_c101069572550000001823244402101752_s1_p0.3.bax.h5
m160921_141610_42154_c101069832550000001823244402101707_s1_p0.1.bax.h5
m160921_141610_42154_c101069832550000001823244402101707_s1_p0.2.bax.h5
m160921_141610_42154_c101069832550000001823244402101707_s1_p0.3.bax.h5
Each cell consists of 3 bax.h5 files and an index bas.h5 file. Here, we have two cells mixed together. That would explain the very high coverage and the large number of reads for an RS II run, which this appears to be. Given the coverage per cell here, I can't imagine a single genome being sequenced on more than one cell, so it is possible that this SRA sample contains two different organisms. That might explain the assembly issues. I launched an assembly of each cell separately and will let you know when I have results.
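You can tell which files belong to which cell from the names. In the listing above, one cell's three files share everything before the trailing .1/.2/.3.bax.h5. Here is a small sketch, assuming that naming pattern holds. The regex and `group_by_cell` are hypothetical and not part of any PacBio tool.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed pattern: <movie>.<part>.bax.h5, where <movie> identifies one SMRT cell
# and <part> is 1, 2 or 3. Other files (e.g. the .bas.h5 index) are ignored.
BAX_PATTERN = re.compile(r"^(?P<movie>.+)\.(?P<part>[1-3])\.bax\.h5$")


def group_by_cell(filenames: Iterable[str]) -> Dict[str, List[int]]:
    """Map each movie (cell) name to the sorted bax.h5 part numbers found for it."""
    cells: Dict[str, List[int]] = defaultdict(list)
    for name in filenames:
        match = BAX_PATTERN.match(name)
        if match:
            cells[match.group("movie")].append(int(match.group("part")))
    return {movie: sorted(parts) for movie, parts in cells.items()}
```

Run on the six file names above, this yields two movie names, each with parts [1, 2, 3], consistent with two complete cells.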
As general advice, I always prefer to download the raw data from the SRA and convert it to fastq myself rather than rely on the SRA fastq download. In the past I've had issues with the SRA fastq not matching what I get by dumping the raw files. In this case, I dumped each cell separately, keeping reads >= 500 bp with a minimum read quality of 0.75. Each cell gave about 1 Gbp of sequence in 170k reads, with an average read length of 6 kbp and about 350X coverage of a 3 Mb genome. These stats are in line with an RS II cell. The fastq you uploaded from the SRA has 4.8 Gbp in 330k reads, with an average read length of 14.7 kb and about 1600X coverage of a 3 Mb genome. Even combining my two per-cell extractions, I don't come close to those stats. The SRA fastq has no minimum read length, and its longest reads are much longer than those in my extraction, so I'd guess no quality filter was applied either.
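The per-cell figures above are simple ratios: coverage is total bases divided by genome size, and mean read length is total bases divided by read count. The sketch below uses a hypothetical `summarize` helper that applies only the minimum-length cut. It does not model the 0.75 read-quality filter.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReadSetStats:
    reads: int
    bases: int
    mean_length: float
    coverage: float


def summarize(read_lengths: Iterable[int], genome_size: int, min_length: int = 0) -> ReadSetStats:
    """Drop reads shorter than min_length, then derive count, bases, mean length and coverage."""
    kept = [n for n in read_lengths if n >= min_length]
    bases = sum(kept)
    mean_length = bases / len(kept) if kept else 0.0
    return ReadSetStats(reads=len(kept), bases=bases, mean_length=mean_length,
                        coverage=bases / genome_size)
```

For example, 4.8 Gbp over a 3 Mb genome gives 4.8e9 / 3e6 = 1600X, the figure quoted for the SRA fastq.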
As expected, the assemblies of the combined cells and of each cell individually all completed, and each produced a single circular contig of 2.5 Mb. The individual cell assemblies were almost identical, so this is the same genome sequenced to extremely high coverage. Here is the Bandage plot for the combined assembly:
The short sequence is the PacBio control sequence. Happy to share either per-cell assembly or the assembly of both cells combined. Here is the asm report for the combined assembly:
[CORRECTION/READS]
--
-- In gatekeeper store './asm.gkpStore':
-- Found 318231 reads.
-- Found 2115476345 bases (705.15 times coverage).
--
-- Read length histogram (one '*' equals 528.14 reads):
-- 0 999 0
-- 1000 1999 28753 ******************************************************
-- 2000 2999 32394 *************************************************************
-- 3000 3999 28830 ******************************************************
-- 4000 4999 29109 *******************************************************
-- 5000 5999 27310 ***************************************************
-- 6000 6999 24941 ***********************************************
-- 7000 7999 23603 ********************************************
-- 8000 8999 35121 ******************************************************************
-- 9000 9999 36970 **********************************************************************
-- 10000 10999 21456 ****************************************
-- 11000 11999 14074 **************************
-- 12000 12999 5919 ***********
-- 13000 13999 2553 ****
-- 14000 14999 1668 ***
-- 15000 15999 1224 **
-- 16000 16999 1016 *
-- 17000 17999 843 *
-- 18000 18999 592 *
-- 19000 19999 442
-- 20000 20999 361
-- 21000 21999 277
-- 22000 22999 217
-- 23000 23999 133
-- 24000 24999 112
-- 25000 25999 74
-- 26000 26999 49
-- 27000 27999 41
-- 28000 28999 44
-- 29000 29999 30
-- 30000 30999 20
-- 31000 31999 12
-- 32000 32999 10
-- 33000 33999 9
-- 34000 34999 7
-- 35000 35999 5
-- 36000 36999 3
-- 37000 37999 1
-- 38000 38999 5
-- 39000 39999 2
-- 40000 40999 1
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 479286332 *******************************************************************--> 0.5940 0.2271
-- 2- 2 163039707 ********************************************************************** 0.7961 0.3816
-- 3- 4 96157434 ***************************************** 0.8768 0.4741
-- 5- 7 35128270 *************** 0.9365 0.5736
-- 8- 11 15260987 ****** 0.9654 0.6483
-- 12- 16 7779987 *** 0.9804 0.7070
-- 17- 22 3950821 * 0.9885 0.7524
-- 23- 29 2020160 0.9927 0.7851
-- 30- 37 987264 0.9949 0.8077
-- 38- 46 460018 0.9960 0.8219
-- 47- 56 201389 0.9966 0.8303
-- 57- 67 90278 0.9968 0.8348
-- 68- 79 62275 0.9969 0.8372
-- 80- 92 108264 0.9970 0.8395
-- 93- 106 245246 0.9971 0.8442
-- 107- 121 445389 0.9974 0.8566
-- 122- 137 581498 0.9980 0.8815
-- 138- 154 526363 0.9987 0.9175
-- 155- 172 320747 0.9994 0.9533
-- 173- 191 131465 0.9997 0.9771
-- 192- 211 43003 0.9999 0.9877
-- 212- 232 16234 0.9999 0.9915
-- 233- 254 9989 1.0000 0.9932
-- 255- 277 7522 1.0000 0.9943
-- 278- 301 5586 1.0000 0.9953
-- 302- 326 3642 1.0000 0.9960
-- 327- 352 1969 1.0000 0.9965
-- 353- 379 970 1.0000 0.9969
-- 380- 407 607 1.0000 0.9970
-- 408- 436 463 1.0000 0.9971
-- 437- 466 337 1.0000 0.9972
-- 467- 497 235 1.0000 0.9973
-- 498- 529 220 1.0000 0.9973
-- 530- 562 131 1.0000 0.9974
-- 563- 596 125 1.0000 0.9974
-- 597- 631 141 1.0000 0.9975
-- 632- 667 131 1.0000 0.9975
-- 668- 704 163 1.0000 0.9975
-- 705- 742 159 1.0000 0.9976
-- 743- 781 142 1.0000 0.9977
-- 782- 821 213 1.0000 0.9977
--
-- 825624 (max occurrences)
-- 1631416548 (total mers, non-unique)
-- 327591833 (distinct mers, non-unique)
-- 479286332 (unique mers)
[CORRECTION/LAYOUT]
-- original original
-- raw reads raw reads
-- category w/overlaps w/o/overlaps
-- -------------------- ------------- -------------
-- Number of Reads 249087 69144
-- Number of Bases 1710355675 334805902
-- Coverage 570.119 111.602
-- Median 6952 4081
-- Mean 6866 4842
-- N50 8873 7820
-- Minimum 1000 0
-- Maximum 40147 32203
--
-- --------corrected--------- ----------rescued----------
-- evidence expected expected
-- category reads raw corrected raw corrected
-- -------------------- ------------- ------------- ------------- ------------- -------------
-- Number of Reads 183746 9486 9486 38172 38172
-- Number of Bases 1448344402 123115677 120004745 176499721 126411875
-- Coverage 482.781 41.039 40.002 58.833 42.137
-- Median 8462 11926 11823 4358 2927
-- Mean 7882 12978 12650 4623 3311
-- N50 9180 12026 11891 5497 4445
-- Minimum 1000 11276 11275 1001 501
-- Maximum 40147 40147 40134 27227 11259
--
-- --------uncorrected--------
-- expected
-- category raw corrected
-- -------------------- ------------- -------------
-- Number of Reads 270573 270573
-- Number of Bases 1745546179 927247764
-- Coverage 581.849 309.083
-- Median 6635 221
-- Mean 6451 3426
-- N50 8790 8918
-- Minimum 0 0
-- Maximum 36176 11274
--
-- Maximum Memory 942604106
[TRIMMING/READS]
--
-- In gatekeeper store './asm.gkpStore':
-- Found 47576 reads.
-- Found 224759402 bases (74.91 times coverage).
--
-- Read length histogram (one '*' equals 142.95 reads):
-- 0 999 4330 ******************************
-- 1000 1999 10007 **********************************************************************
-- 2000 2999 6393 ********************************************
-- 3000 3999 5729 ****************************************
-- 4000 4999 4948 **********************************
-- 5000 5999 3791 **************************
-- 6000 6999 2239 ***************
-- 7000 7999 833 *****
-- 8000 8999 385 **
-- 9000 9999 385 **
-- 10000 10999 2714 ******************
-- 11000 11999 4857 *********************************
-- 12000 12999 264 *
-- 13000 13999 83
-- 14000 14999 79
-- 15000 15999 96
-- 16000 16999 126
-- 17000 17999 107
-- 18000 18999 71
-- 19000 19999 56
-- 20000 20999 41
-- 21000 21999 25
-- 22000 22999 11
-- 23000 23999 5
-- 24000 24999 1
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 17815705 *******************************************************************--> 0.7426 0.0796
-- 2- 2 2102826 ********************************************************************** 0.8303 0.0984
-- 3- 4 1101765 ************************************ 0.8615 0.1084
-- 5- 7 366393 ************ 0.8841 0.1190
-- 8- 11 106475 *** 0.8933 0.1256
-- 12- 16 30450 * 0.8964 0.1289
-- 17- 22 8417 0.8973 0.1304
-- 23- 29 4520 0.8976 0.1310
-- 30- 37 17760 0.8978 0.1316
-- 38- 46 74433 ** 0.8987 0.1350
-- 47- 56 222943 ******* 0.9022 0.1513
-- 57- 67 501607 **************** 0.9123 0.2092
-- 68- 79 627801 ******************** 0.9340 0.3565
-- 80- 92 530937 ***************** 0.9600 0.5637
-- 93- 106 286412 ********* 0.9814 0.7618
-- 107- 121 112303 *** 0.9925 0.8809
-- 122- 137 45638 * 0.9969 0.9344
-- 138- 154 17735 0.9987 0.9592
-- 155- 172 6172 0.9994 0.9698
-- 173- 191 3312 0.9996 0.9742
-- 192- 211 1396 0.9998 0.9767
-- 212- 232 1884 0.9998 0.9780
-- 233- 254 175 0.9999 0.9798
-- 255- 277 74 0.9999 0.9800
-- 278- 301 55 0.9999 0.9801
-- 302- 326 86 0.9999 0.9801
-- 327- 352 57 0.9999 0.9803
-- 353- 379 46 0.9999 0.9803
-- 380- 407 51 0.9999 0.9804
-- 408- 436 27 0.9999 0.9805
-- 437- 466 25 0.9999 0.9806
-- 467- 497 6 0.9999 0.9806
-- 498- 529 23 0.9999 0.9806
-- 530- 562 26 0.9999 0.9807
-- 563- 596 27 0.9999 0.9807
-- 597- 631 16 0.9999 0.9808
-- 632- 667 7 0.9999 0.9808
-- 668- 704 3 0.9999 0.9809
-- 705- 742 12 0.9999 0.9809
-- 743- 781 2 0.9999 0.9809
-- 782- 821 3 0.9999 0.9809
--
-- 117322 (max occurrences)
-- 205944908 (total mers, non-unique)
-- 6173892 (distinct mers, non-unique)
-- 17815705 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.0450 (use overlaps at or below this fraction error)
-- 1 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 1 (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 318231 reads 224759402 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 31881 reads 165018864 bases (trimmed reads output)
-- 10683 reads 47638374 bases (reads with no change, kept as is)
-- 271926 reads 973148 bases (reads with no overlaps, deleted)
-- 3741 reads 3161434 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 21961 reads 4327605 bases (bases trimmed from the 5' end of a read)
-- 21528 reads 3639977 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.0450 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 42564 reads 220624820 bases (reads processed)
-- 275667 reads 4134582 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 0 reads 0 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 42564 reads 220624820 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 195 reads 195 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 195 reads 85910 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 83 reads 656196 bases (trimmed from the 5' end of the read)
-- 112 reads 704033 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In gatekeeper store './asm.gkpStore':
-- Found 42556 reads.
-- Found 211290684 bases (70.43 times coverage).
--
-- Read length histogram (one '*' equals 139.67 reads):
-- 0 999 0
-- 1000 1999 9777 **********************************************************************
-- 2000 2999 6371 *********************************************
-- 3000 3999 5707 ****************************************
-- 4000 4999 4922 ***********************************
-- 5000 5999 3746 **************************
-- 6000 6999 2213 ***************
-- 7000 7999 880 ******
-- 8000 8999 732 *****
-- 9000 9999 603 ****
-- 10000 10999 2799 ********************
-- 11000 11999 4642 *********************************
-- 12000 12999 163 *
-- 13000 13999 1
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 14160028 *******************************************************************--> 0.7124 0.0673
-- 2- 2 1848956 ********************************************************************** 0.8055 0.0849
-- 3- 4 970035 ************************************ 0.8386 0.0943
-- 5- 7 317568 ************ 0.8625 0.1041
-- 8- 11 90747 *** 0.8721 0.1101
-- 12- 16 25144 0.8752 0.1131
-- 17- 22 7108 0.8762 0.1144
-- 23- 29 7213 0.8765 0.1150
-- 30- 37 31711 * 0.8769 0.1161
-- 38- 46 107391 **** 0.8788 0.1224
-- 47- 56 288498 ********** 0.8848 0.1473
-- 57- 67 568259 ********************* 0.9005 0.2260
-- 68- 79 600209 ********************** 0.9296 0.3995
-- 80- 92 461777 ***************** 0.9594 0.6086
-- 93- 106 233136 ******** 0.9816 0.7897
-- 107- 121 91361 *** 0.9925 0.8924
-- 122- 137 38726 * 0.9968 0.9389
-- 138- 154 13993 0.9987 0.9615
-- 155- 172 5705 0.9993 0.9704
-- 173- 191 2660 0.9996 0.9747
-- 192- 211 1530 0.9997 0.9768
-- 212- 232 1437 0.9998 0.9784
-- 233- 254 84 0.9999 0.9798
-- 255- 277 63 0.9999 0.9799
-- 278- 301 63 0.9999 0.9800
-- 302- 326 89 0.9999 0.9800
-- 327- 352 32 0.9999 0.9802
-- 353- 379 66 0.9999 0.9802
-- 380- 407 49 0.9999 0.9803
-- 408- 436 31 0.9999 0.9804
-- 437- 466 9 0.9999 0.9805
-- 467- 497 11 0.9999 0.9805
-- 498- 529 34 0.9999 0.9805
-- 530- 562 11 0.9999 0.9806
-- 563- 596 4 0.9999 0.9806
-- 597- 631 0 0.0000 0.0000
-- 632- 667 2 0.9999 0.9807
-- 668- 704 6 0.9999 0.9807
-- 705- 742 5 0.9999 0.9807
-- 743- 781 4 0.9999 0.9807
-- 782- 821 4 0.9999 0.9807
--
-- 50203 (max occurrences)
-- 196236980 (total mers, non-unique)
-- 5715698 (distinct mers, non-unique)
-- 14160028 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 26 0.06 8271.54 +- 2579.25 422.54 +- 923.55 (bad trimming)
-- middle-hump 2 0.00 1975.50 +- 3.54 0.00 +- 0.00 (bad trimming)
-- no-5-prime 6 0.01 2845.00 +- 2344.22 947.50 +- 2033.27 (bad trimming)
-- no-3-prime 5 0.01 1432.80 +- 439.93 278.20 +- 230.95 (bad trimming)
--
-- low-coverage 37 0.09 2139.19 +- 1134.27 10.85 +- 6.62 (easy to assemble, potential for lower quality consensus)
-- unique 30960 72.75 4767.19 +- 3321.57 76.60 +- 16.21 (easy to assemble, perfect, yay)
-- repeat-cont 2800 6.58 2042.03 +- 664.30 1020.81 +- 586.01 (potential for consensus errors, no impact on assembly)
-- repeat-dove 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 6009 14.12 6910.93 +- 3358.38 2084.38 +- 1844.16 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 2314 5.44 5211.05 +- 2700.87 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 341 0.80 11057.65 +- 1266.38 (will end contigs, potential to misassemble)
-- uniq-anchor 32 0.08 7993.94 +- 3724.38 3956.31 +- 3319.07 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 2 sequences, total length 2496655 bp (including 0 repeats of total length 0 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 10957 sequences, total length 28698963 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2494605 1 2494605
-- 20 2494605 1 2494605
-- 30 2494605 1 2494605
-- 40 2494605 1 2494605
-- 50 2494605 1 2494605
-- 60 2494605 1 2494605
-- 70 2494605 1 2494605
-- 80 2494605 1 2494605
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 2 sequences, total length 2481524 bp (including 0 repeats of total length 0 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 10957 sequences, total length 28698963 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2479478 1 2479478
-- 20 2479478 1 2479478
-- 30 2479478 1 2479478
-- 40 2479478 1 2479478
-- 50 2479478 1 2479478
-- 60 2479478 1 2479478
-- 70 2479478 1 2479478
-- 80 2479478 1 2479478
--
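The "Contig sizes based on genome size" tables in the report above list NG values. The N50 rows in the read tables apply the same idea, measured against the summed read length instead of a genome size. Below is a minimal sketch of both, following the usual definitions. `ng` and `n50` are hypothetical helpers, not Canu code.

```python
from typing import Iterable, List, Optional


def ng(lengths: Iterable[int], reference_total: float, percent: float) -> Optional[int]:
    """Length L such that sequences of length >= L cover percent% of reference_total.

    Returns None if all sequences together never reach that fraction.
    """
    target = reference_total * percent / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def n50(lengths: Iterable[int]) -> Optional[int]:
    """Classic N50: ng() measured against the sequences' own summed length."""
    values: List[int] = list(lengths)
    return ng(values, sum(values), 50)
```

The report's read counts divided by its coverage figures imply a 3 Mb genome size. With that size and the two contigs from the unitig construction step (2494605 bp, plus 2496655 - 2494605 = 2050 bp), `ng(..., 3_000_000, 80)` returns 2494605, matching the NG80 row.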
Thank you so much! This is very helpful. You're definitely right: I used fastq-dump to download the .fastq files directly and did not do any QC. I was looking for a QC tool for PacBio data, similar to FastQC for Illumina data, but had no luck.
Can you please share the results for the combined cells? Many thanks.
You don't need QC or a FastQC-like tool, just the SMRT Portal or SMRT Link commands to extract fastq files (see issue #34 for an example). The problem is that fastq-dump reports everything in the run, including noise where no real data was sequenced and reads that run through the adapter. These could probably still be assembled with tuned parameters, but the PacBio SMRT Link software automatically filters out the noise. I was able to get close to the expected read set with this fastq-dump command:
fastq-dump --qual-filter-1 -W --readids --read-filter pass --dumpbase -M 500 --readids --gzip --split-spot --skip-technical SRR6331514
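The -M 500 option in that command discards reads shorter than 500 bp. If you already have an unfiltered fastq, you can reapply the length part of that filter with a short standard-library script. This sketch assumes unwrapped four-line records. It does not reproduce the --read-filter pass, --qual-filter-1 or --split-spot behaviour, and it is no substitute for the filtering SMRT Link does. `filter_min_length` and the other helpers are hypothetical.

```python
import gzip
import io
import sys
from typing import Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality), assuming unwrapped 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def filter_min_length(handle: TextIO, out: TextIO, min_length: int = 500) -> int:
    """Copy records with len(sequence) >= min_length to out; return how many were kept."""
    kept = 0
    for header, sequence, quality in fastq_records(handle):
        if len(sequence) >= min_length:
            out.write(f"{header}\n{sequence}\n+\n{quality}\n")
            kept += 1
    return kept


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage: python filter_fastq.py reads.fastq.gz > filtered.fastq
        with open_fastq(sys.argv[1]) as reads:
            filter_min_length(reads, sys.stdout)
    else:
        # Tiny in-memory demo: one 4 bp read (dropped) and one 600 bp read (kept).
        demo = io.StringIO("@short\nACGT\n+\nIIII\n@long\n" + "A" * 600 + "\n+\n" + "I" * 600 + "\n")
        print(filter_min_length(demo, io.StringIO()))  # 1
```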
The official PacBio advice also seems to be to download the raw files instead of fastq: https://github.com/pb-jlandolin/PacbioToSRA/issues/2. You'll need the h5 files and the SMRT Link software anyway if you want to run Arrow consensus polishing.
I am having a similar issue with an extremely high-coverage bacterial genome dataset that combines two RS II SMRT cells' worth of data. I get 3 contigs at the end of the assembly. The smallest must be the PacBio control sequence, as mentioned previously, but I find it odd that I cannot close the genome. The bacterium is relatively AT-rich and the data are extremely high coverage, so I adjusted the following parameters accordingly: --correctedErrorRate=0.035 --corMaxEvidenceErate=0.15
The report is as follows:
[CORRECTION/READS]
--
-- In gatekeeper store 'correction/Fc_MSFC4.gkpStore':
-- Found 237020 reads.
-- Found 2373922116 bases (719.37 times coverage).
--
-- Read length histogram (one '*' equals 254.35 reads):
-- 0 999 0
-- 1000 1999 13318 ****************************************************
-- 2000 2999 13164 ***************************************************
-- 3000 3999 13213 ***************************************************
-- 4000 4999 13640 *****************************************************
-- 5000 5999 13961 ******************************************************
-- 6000 6999 13981 ******************************************************
-- 7000 7999 13809 ******************************************************
-- 8000 8999 13626 *****************************************************
-- 9000 9999 14376 ********************************************************
-- 10000 10999 15979 **************************************************************
-- 11000 11999 17805 **********************************************************************
-- 12000 12999 16696 *****************************************************************
-- 13000 13999 13374 ****************************************************
-- 14000 14999 10522 *****************************************
-- 15000 15999 8063 *******************************
-- 16000 16999 6345 ************************
-- 17000 17999 5006 *******************
-- 18000 18999 3983 ***************
-- 19000 19999 3150 ************
-- 20000 20999 2582 **********
-- 21000 21999 1996 *******
-- 22000 22999 1559 ******
-- 23000 23999 1284 *****
-- 24000 24999 1041 ****
-- 25000 25999 803 ***
-- 26000 26999 718 **
-- 27000 27999 541 **
-- 28000 28999 426 *
-- 29000 29999 355 *
-- 30000 30999 246
-- 31000 31999 266 *
-- 32000 32999 208
-- 33000 33999 167
-- 34000 34999 139
-- 35000 35999 105
-- 36000 36999 85
-- 37000 37999 77
-- 38000 38999 69
-- 39000 39999 61
-- 40000 40999 49
-- 41000 41999 35
-- 42000 42999 30
-- 43000 43999 25
-- 44000 44999 22
-- 45000 45999 22
-- 46000 46999 16
-- 47000 47999 12
-- 48000 48999 12
-- 49000 49999 9
-- 50000 50999 9
-- 51000 51999 10
-- 52000 52999 5
-- 53000 53999 3
-- 54000 54999 4
-- 55000 55999 5
-- 56000 56999 2
-- 57000 57999 3
-- 58000 58999 1
-- 59000 59999 0
-- 60000 60999 2
-- 61000 61999 2
-- 62000 62999 1
-- 63000 63999 1
-- 64000 64999 0
-- 65000 65999 0
-- 66000 66999 0
-- 67000 67999 0
-- 68000 68999 0
-- 69000 69999 0
-- 70000 70999 0
-- 71000 71999 0
-- 72000 72999 0
-- 73000 73999 1
[CORRECTION/MERS]
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 361226113 *******************************************************************--> 0.5316 0.1524
-- 2- 2 126309308 ********************************************************************** 0.7174 0.2590
-- 3- 4 92795577 *************************************************** 0.8048 0.3341
-- 5- 7 46205992 ************************* 0.8851 0.4351
-- 8- 11 23168514 ************ 0.9337 0.5299
-- 12- 16 12069732 ****** 0.9610 0.6099
-- 17- 22 6425902 *** 0.9760 0.6732
-- 23- 29 3446795 * 0.9843 0.7210
-- 30- 37 1894115 * 0.9888 0.7557
-- 38- 46 1117348 0.9914 0.7806
-- 47- 56 785180 0.9929 0.7995
-- 57- 67 709910 0.9941 0.8163
-- 68- 79 708910 0.9951 0.8349
-- 80- 92 709798 0.9961 0.8571
-- 93- 106 654418 0.9972 0.8829
-- 107- 121 486100 0.9981 0.9101
-- 122- 137 290064 0.9988 0.9328
-- 138- 154 163415 0.9992 0.9480
-- 155- 172 100086 0.9994 0.9577
-- 173- 191 67381 0.9996 0.9645
-- 192- 211 47294 0.9997 0.9695
-- 212- 232 34281 0.9998 0.9735
-- 233- 254 24914 0.9998 0.9766
-- 255- 277 18763 0.9998 0.9792
-- 278- 301 14228 0.9999 0.9812
-- 302- 326 11324 0.9999 0.9830
-- 327- 352 9174 0.9999 0.9845
-- 353- 379 7542 0.9999 0.9858
-- 380- 407 6125 0.9999 0.9869
-- 408- 436 5279 0.9999 0.9879
-- 437- 466 4399 0.9999 0.9889
-- 467- 497 3826 1.0000 0.9897
-- 498- 529 3479 1.0000 0.9905
-- 530- 562 3147 1.0000 0.9912
-- 563- 596 3039 1.0000 0.9919
-- 597- 631 2697 1.0000 0.9927
-- 632- 667 2329 1.0000 0.9934
-- 668- 704 1835 1.0000 0.9940
-- 705- 742 1501 1.0000 0.9945
-- 743- 781 1260 1.0000 0.9950
-- 782- 821 1051 1.0000 0.9954
--
-- 803871 (max occurrences)
-- 2009140703 (total mers, non-unique)
-- 318322643 (distinct mers, non-unique)
-- 361226113 (unique mers)
[CORRECTION/CORRECTIONS]
--
-- Reads to be corrected:
-- 7002 reads longer than 24051 bp
-- 138140871 bp
-- Expected corrected reads:
-- 7002 reads
-- 132000308 bp
-- 15581 bp minimum length
-- 18852 bp mean length
-- 33768 bp n50 length
[TRIMMING/READS]
--
-- In gatekeeper store 'trimming/Fc_MSFC4.gkpStore':
-- Found 7084 reads.
-- Found 126409899 bases (38.3 times coverage).
--
-- Read length histogram (one '*' equals 24.01 reads):
-- 0 999 0
-- 1000 1999 27 *
-- 2000 2999 16
-- 3000 3999 13
-- 4000 4999 17
-- 5000 5999 12
-- 6000 6999 13
-- 7000 7999 11
-- 8000 8999 11
-- 9000 9999 10
-- 10000 10999 11
-- 11000 11999 26 *
-- 12000 12999 29 *
-- 13000 13999 45 *
-- 14000 14999 212 ********
-- 15000 15999 1681 **********************************************************************
-- 16000 16999 1385 *********************************************************
-- 17000 17999 898 *************************************
-- 18000 18999 770 ********************************
-- 19000 19999 495 ********************
-- 20000 20999 376 ***************
-- 21000 21999 265 ***********
-- 22000 22999 192 *******
-- 23000 23999 158 ******
-- 24000 24999 97 ****
-- 25000 25999 78 ***
-- 26000 26999 54 **
-- 27000 27999 39 *
-- 28000 28999 29 *
-- 29000 29999 28 *
-- 30000 30999 22
-- 31000 31999 16
-- 32000 32999 11
-- 33000 33999 10
-- 34000 34999 4
-- 35000 35999 6
-- 36000 36999 4
-- 37000 37999 4
-- 38000 38999 2
-- 39000 39999 1
-- 40000 40999 0
-- 41000 41999 1
-- 42000 42999 1
-- 43000 43999 1
-- 44000 44999 2
-- 45000 45999 1
[TRIMMING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 3970878 *******************************************************************--> 0.5250 0.0314
-- 2- 2 300550 ************************* 0.5648 0.0362
-- 3- 4 109215 ********* 0.5753 0.0381
-- 5- 7 25959 ** 0.5810 0.0396
-- 8- 11 33567 ** 0.5832 0.0405
-- 12- 16 75113 ****** 0.5886 0.0440
-- 17- 22 241383 ******************** 0.6002 0.0546
-- 23- 29 568182 *********************************************** 0.6369 0.1006
-- 30- 37 837010 ********************************************************************** 0.7172 0.2316
-- 38- 46 829562 ********************************************************************* 0.8267 0.4574
-- 47- 56 381827 ******************************* 0.9336 0.7311
-- 57- 67 106774 ******** 0.9773 0.8656
-- 68- 79 35793 ** 0.9897 0.9110
-- 80- 92 7906 0.9940 0.9299
-- 93- 106 3581 0.9949 0.9344
-- 107- 121 7457 0.9953 0.9373
-- 122- 137 3675 0.9964 0.9446
-- 138- 154 2271 0.9968 0.9477
-- 155- 172 3484 0.9971 0.9505
-- 173- 191 2591 0.9976 0.9550
-- 192- 211 2296 0.9979 0.9588
-- 212- 232 1039 0.9982 0.9621
-- 233- 254 1191 0.9983 0.9640
-- 255- 277 1037 0.9985 0.9663
-- 278- 301 1414 0.9986 0.9685
-- 302- 326 4807 0.9988 0.9717
-- 327- 352 1252 0.9995 0.9843
-- 353- 379 1451 0.9996 0.9873
-- 380- 407 393 0.9998 0.9913
-- 408- 436 240 0.9998 0.9925
-- 437- 466 51 0.9999 0.9933
-- 467- 497 90 0.9999 0.9935
-- 498- 529 71 0.9999 0.9938
-- 530- 562 54 0.9999 0.9941
-- 563- 596 185 0.9999 0.9943
-- 597- 631 60 0.9999 0.9952
-- 632- 667 79 0.9999 0.9955
-- 668- 704 103 0.9999 0.9959
-- 705- 742 29 1.0000 0.9964
-- 743- 781 8 1.0000 0.9966
-- 782- 821 4 1.0000 0.9967
--
-- 56304 (max occurrences)
-- 122290257 (total mers, non-unique)
-- 3592020 (distinct mers, non-unique)
-- 3970878 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.0350 (use overlaps at or below this fraction error)
-- 1 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 1 (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--
-- INPUT READS:
-- -----------
-- 7084 reads 126409899 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- ------------
-- 6550 reads 115265524 bases (trimmed reads output)
-- 512 reads 8975930 bases (reads with no change, kept as is)
-- 18 reads 73419 bases (reads with no overlaps, deleted)
-- 4 reads 5424 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- ----------------
-- 4966 reads 1170217 bases (bases trimmed from the 5' end of a read)
-- 5436 reads 919385 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- ----------
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.0350 (use overlaps at or below this fraction error)
-- INPUT READS:
-- -----------
-- 7062 reads 126331056 bases (reads processed)
-- 22 reads 78843 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- --------
-- 0 reads 0 bases (no overlaps)
-- 0 reads 0 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 7062 reads 126331056 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- ------------------
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 6 reads 6 signals (number of subread signal)
--
-- SIGNALS:
-- -------
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 6 reads 1864 bases (size of subread signal)
--
-- TRIMMING:
-- --------
-- 3 reads 25137 bases (trimmed from the 5' end of the read)
-- 3 reads 23053 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
--
-- In gatekeeper store 'unitigging/Fc_MSFC4.gkpStore':
-- Found 7062 reads.
-- Found 124193264 bases (37.63 times coverage).
--
-- Read length histogram (one '*' equals 23.95 reads):
-- 0 999 0
-- 1000 1999 21
-- 2000 2999 15
-- 3000 3999 11
-- 4000 4999 15
-- 5000 5999 10
-- 6000 6999 9
-- 7000 7999 13
-- 8000 8999 27 *
-- 9000 9999 43 *
-- 10000 10999 39 *
-- 11000 11999 58 **
-- 12000 12999 64 **
-- 13000 13999 88 ***
-- 14000 14999 274 ***********
-- 15000 15999 1677 **********************************************************************
-- 16000 16999 1351 ********************************************************
-- 17000 17999 864 ************************************
-- 18000 18999 716 *****************************
-- 19000 19999 469 *******************
-- 20000 20999 358 **************
-- 21000 21999 240 **********
-- 22000 22999 172 *******
-- 23000 23999 154 ******
-- 24000 24999 84 ***
-- 25000 25999 75 ***
-- 26000 26999 47 *
-- 27000 27999 36 *
-- 28000 28999 26 *
-- 29000 29999 31 *
-- 30000 30999 16
-- 31000 31999 15
-- 32000 32999 10
-- 33000 33999 8
-- 34000 34999 4
-- 35000 35999 7
-- 36000 36999 3
-- 37000 37999 5
-- 38000 38999 1
-- 39000 39999 0
-- 40000 40999 0
-- 41000 41999 1
-- 42000 42999 1
-- 43000 43999 1
-- 44000 44999 2
-- 45000 45999 1
[UNITIGGING/MERS]
--
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 3171948 *******************************************************************--> 0.4714 0.0256
-- 2- 2 277944 *********************** 0.5127 0.0301
-- 3- 4 101105 ******** 0.5236 0.0318
-- 5- 7 23724 * 0.5295 0.0332
-- 8- 11 34847 ** 0.5319 0.0341
-- 12- 16 78306 ****** 0.5382 0.0378
-- 17- 22 246084 ******************** 0.5517 0.0490
-- 23- 29 589075 ************************************************* 0.5938 0.0968
-- 30- 37 838615 ********************************************************************** 0.6873 0.2349
-- 38- 46 828917 ********************************************************************* 0.8101 0.4642
-- 47- 56 361600 ****************************** 0.9297 0.7409
-- 57- 67 97194 ******** 0.9762 0.8701
-- 68- 79 34549 ** 0.9888 0.9124
-- 80- 92 6602 0.9934 0.9307
-- 93- 106 3871 0.9942 0.9346
-- 107- 121 8360 0.9948 0.9377
-- 122- 137 2452 0.9961 0.9458
-- 138- 154 2575 0.9964 0.9479
-- 155- 172 3545 0.9968 0.9512
-- 173- 191 2505 0.9973 0.9557
-- 192- 211 1959 0.9977 0.9596
-- 212- 232 1243 0.9980 0.9624
-- 233- 254 1124 0.9982 0.9647
-- 255- 277 1196 0.9983 0.9669
-- 278- 301 1106 0.9985 0.9696
-- 302- 326 5084 0.9987 0.9720
-- 327- 352 1181 0.9994 0.9853
-- 353- 379 1203 0.9996 0.9884
-- 380- 407 401 0.9998 0.9917
-- 408- 436 199 0.9998 0.9931
-- 437- 466 61 0.9999 0.9937
-- 467- 497 103 0.9999 0.9939
-- 498- 529 34 0.9999 0.9943
-- 530- 562 223 0.9999 0.9944
-- 563- 596 3 0.9999 0.9954
-- 597- 631 66 0.9999 0.9954
-- 632- 667 140 0.9999 0.9958
-- 668- 704 62 0.9999 0.9965
-- 705- 742 3 1.0000 0.9968
-- 743- 781 9 1.0000 0.9969
-- 782- 821 3 1.0000 0.9969
--
-- 29139 (max occurrences)
-- 120873014 (total mers, non-unique)
-- 3557539 (distinct mers, non-unique)
-- 3171948 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- ---------------- ------- ------- ---------------------- ------------------------ --------------------
-- middle-missing 4 0.06 10693.75 +- 10181.02 1078.50 +- 1086.73 (bad trimming)
-- middle-hump 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (bad trimming)
-- no-5-prime 1 0.01 5086.00 +- 0.00 1682.00 +- 0.00 (bad trimming)
-- no-3-prime 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (bad trimming)
--
-- low-coverage 3 0.04 2963.33 +- 1357.28 3.31 +- 1.93 (easy to assemble, potential for lower quality consensus)
-- unique 4581 64.87 17538.28 +- 3851.70 34.01 +- 7.79 (easy to assemble, perfect, yay)
-- repeat-cont 138 1.95 15523.88 +- 3433.92 67.99 +- 9.62 (potential for consensus errors, no impact on assembly)
-- repeat-dove 8 0.11 21575.62 +- 940.59 67.49 +- 7.30 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 898 12.72 17989.95 +- 3737.95 5671.21 +- 5685.10 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 1166 16.51 16898.89 +- 2729.78 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 163 2.31 23052.36 +- 4658.13 (will end contigs, potential to misassemble)
-- uniq-anchor 100 1.42 18621.54 +- 4340.77 7597.53 +- 4925.88 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 3 sequences, total length 3455553 bp (including 1 repeats of total length 24019 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 335 sequences, total length 5014160 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2712622 1 2712622
-- 20 2712622 1 2712622
-- 30 2712622 1 2712622
-- 40 2712622 1 2712622
-- 50 2712622 1 2712622
-- 60 2712622 1 2712622
-- 70 2712622 1 2712622
-- 80 2712622 1 2712622
-- 90 718912 2 3431534
-- 100 718912 2 3431534
--
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 3 sequences, total length 3449272 bp (including 1 repeats of total length 23378 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 335 sequences, total length 5014154 bp.
--
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- ---------- ------------ ----------
-- 10 2709159 1 2709159
-- 20 2709159 1 2709159
-- 30 2709159 1 2709159
-- 40 2709159 1 2709159
-- 50 2709159 1 2709159
-- 60 2709159 1 2709159
-- 70 2709159 1 2709159
-- 80 2709159 1 2709159
-- 90 716735 2 3425894
-- 100 716735 2 3425894
--
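The "Contig sizes based on genome size" table reports NGx. To get it, sort the contigs longest-first and take the contig at which their running total first reaches x% of the genome size; LGx is how many contigs that took. A minimal Python sketch of that definition follows. The 3.3 Mb genome size is an assumption, inferred from the report's 124193264 bases at 37.63 times coverage. Canu's internal implementation may differ in details.

```python
def ng(lengths, genome_size, pct):
    """Return (NGx, LGx) for x = pct, or None if the contigs never reach pct% of genome_size.

    NGx is the length of the contig at which contigs sorted longest-first first
    cover pct% of genome_size; LGx is the number of contigs needed to get there.
    """
    target = genome_size * pct / 100
    total = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        total += length
        if total >= target:
            return length, count
    return None


# Contig lengths from the [UNITIGGING/CONSENSUS] step above (two contigs plus the
# 23378 bp repeat contig); the 3.3 Mb genome size is an assumption (bases / coverage).
contigs = [2709159, 716735, 23378]
for pct in (50, 90, 100):
    print(pct, ng(contigs, 3_300_000, pct))
```

With that assumed genome size, the sketch reproduces the NG50, NG90 and NG100 rows of the consensus table.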
Any advice as to parameters to change in the assembly?
The previous issue was due to poor input data (the SRA dump command was outputting data that the sequencer's default processing does not output). No parameter changes were necessary.
Assembly contiguity is limited by the repeats present in the genome. From your assembly output, it seems there is a large (>22 kbp) repeat that is most likely not spanned by any read. The PacBio control sequence is only 2 kb, while your shortest contig, which is a repeat, is >20 kb, so that contig is not the control. You have few reads longer than this. Unless the repeat copies are sufficiently diverged, the repeat won't be resolved no matter how much sequencing coverage you add; you would need longer reads. The GFA output should give more information on the ambiguity caused by the repeat. You could try reducing the error rate used for unitigging to see whether the repeat can be resolved (see the heterozygous genome parameters on the wiki). If you share your data we could try it locally, but the repeat may be too large to resolve with your data.
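One way to look at that ambiguity is to list which segments each segment links to in the GFA file. The sketch below assumes GFA 1 tab-separated 'S' (segment) and 'L' (link) records. It flags segments with three or more distinct neighbours as candidate collapsed repeats. The degree threshold is a heuristic, not something Canu reports, and the file name in the usage comment is hypothetical.

```python
from collections import defaultdict


def read_gfa(lines):
    """Collect segment lengths and undirected neighbours from GFA 1 'S' and 'L' records."""
    lengths = {}
    neighbours = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            name, seq = fields[1], fields[2]
            # Sequences may be written as '*' with the length in an optional LN:i: tag.
            tagged = [int(f[5:]) for f in fields[3:] if f.startswith("LN:i:")]
            lengths[name] = tagged[0] if tagged else (0 if seq == "*" else len(seq))
        elif fields[0] == "L":
            # L <from> <fromOrient> <to> <toOrient> <overlap>; orientation is ignored here.
            a, b = fields[1], fields[3]
            neighbours[a].add(b)
            neighbours[b].add(a)
    return lengths, neighbours


def branching_segments(neighbours, min_degree=3):
    """Segments joined to at least min_degree others: candidate collapsed repeats."""
    return sorted(s for s, nbrs in neighbours.items() if len(nbrs) >= min_degree)


# Usage (hypothetical file name):
# with open("asm.unitigs.gfa") as fh:
#     lengths, neighbours = read_gfa(fh)
# for seg in branching_segments(neighbours):
#     print(seg, lengths.get(seg), sorted(neighbours[seg]))
```

A two-copy repeat collapsed into one segment typically shows up with two unique neighbours on each side. A graph viewer gives the same picture visually.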
Just out of curiosity: how did you find out that there is a large repeat, and its length? I do see some span-repeat and unique-repeat reads, but I am not sure exactly how to interpret them.
-- span-repeat 898 12.72 17989.95 +- 3737.95 5671.21 +- 5685.10 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 1166 16.51 16898.89 +- 2729.78 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
This is also a general problem for me when I use Canu: I can't fully interpret the report, and I am not sure which parts of it I should primarily look at (I usually focus on the "X times coverage" after each step and the [UNITIGGING/OVERLAPS] stats). What do you recommend looking at in the report as a general check for data/assembly evaluation, and are there good reference papers?
Sorry for interrupting this thread; I am new to assembly. Any information is appreciated.
The report is primarily there so we can diagnose issues with users' assemblies. Typically, check that the corrected/trimmed coverage is sufficient for assembly (e.g. 30x or higher) and that the post-correction k-mer counts have a peak close to that coverage. For the repeat, the repeat stats:
-- repeat-cont 138 1.95 15523.88 +- 3433.92 67.99 +- 9.62 (potential for consensus errors, no impact on assembly)
are reads that are contained in a repeat. They average over 15 kb, so the repeat must be at least that long. There is also:
-- uniq-repeat-cont 1166 16.51 16898.89 +- 2729.78 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
which are reads where one end is unique and the other is repetitive (e.g. a junction read), but which are also contained in another, longer sequence. There is also the unitigging output:
-- contigs: 3 sequences, total length 3455553 bp (including 1 repeats of total length 24019 bp).
so there is a repeat contig (by coverage) of >20 kb, and the overlap stats above indicate the presence of a large repeat. That is why I would guess this genome has a repeat >22 kbp.
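The checks described in this answer can be roughed out in Python. This is a sketch under stated assumptions:

* The 3.3 Mb genome size is inferred from the first report (124193264 bases at 37.63 times coverage).
* The 25% tolerance between the k-mer peak and the coverage is an arbitrary choice.
* Reading the ratio of repeat-cont to unique coverage as an approximate copy number is a rough heuristic, not how Canu classifies reads.

```python
import re

# One row of the [UNITIGGING/OVERLAPS] table as pasted in this thread: category, reads, %,
# mean read length +- sd, optional mean coverage +- sd, then the analysis in parentheses.
ROW = re.compile(
    r"--\s+(?P<cat>[\w-]+)\s+(?P<reads>\d+)\s+(?P<pct>[\d.]+)\s+"
    r"(?P<len>[\d.]+)\s+\+-\s+[\d.]+"
    r"(?:\s+(?P<cov>[\d.]+)\s+\+-\s+[\d.]+)?\s+\("
)


def parse_overlap_rows(text):
    """Map each overlap category to its read count, mean length and mean coverage (if any)."""
    rows = {}
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows[m["cat"]] = {
                "reads": int(m["reads"]),
                "mean_len": float(m["len"]),
                "cov": float(m["cov"]) if m["cov"] else None,
            }
    return rows


def kmer_peak(hist):
    """hist: (lo, hi, count) bins in ascending order. Skip the low-multiplicity error
    spike by starting at the first valley, then return the bin with the most k-mers.
    Bins widen with multiplicity, so comparing raw bin counts is coarse."""
    start = 0
    for i in range(1, len(hist) - 1):
        if hist[i][2] < hist[i + 1][2]:
            start = i
            break
    return max(hist[start:], key=lambda b: b[2])


def coverage_looks_ok(total_bases, genome_size, peak, min_cov=30, tol=0.25):
    """Coverage is at least min_cov and the k-mer peak sits within tol of it."""
    coverage = total_bases / genome_size
    mid = (peak[0] + peak[1]) / 2
    return coverage >= min_cov and abs(mid - coverage) / coverage <= tol


# First 12 bins of the first report's [UNITIGGING/MERS] histogram.
hist = [(1, 1, 3171948), (2, 2, 277944), (3, 4, 101105), (5, 7, 23724),
        (8, 11, 34847), (12, 16, 78306), (17, 22, 246084), (23, 29, 589075),
        (30, 37, 838615), (38, 46, 828917), (47, 56, 361600), (57, 67, 97194)]
print(kmer_peak(hist), coverage_looks_ok(124193264, 3_300_000, kmer_peak(hist)))

report = """\
-- unique 4581 64.87 17538.28 +- 3851.70 34.01 +- 7.79 (easy to assemble, perfect, yay)
-- repeat-cont 138 1.95 15523.88 +- 3433.92 67.99 +- 9.62 (potential for consensus errors, no impact on assembly)
"""
rows = parse_overlap_rows(report)
print(round(rows["repeat-cont"]["cov"] / rows["unique"]["cov"], 2))
```

Repeat-contained reads sit at roughly twice the coverage of unique reads (67.99 vs 34.01). That is consistent with a repeat present in about two copies and collapsed into one contig.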
I just BLASTed the 22 kb contig, and it matches another strain of the same organism. The genome very likely contains repetitive segments, since this genus of bacteria is known for secreting many proteins and often has repeated secretion-system elements throughout the genome. These organisms also often carry multiple rRNA copies.
Is there an empirical way to determine how to adjust the utgErrorRate for this particular dataset?
@skoren This is really helpful. Thank you!
I have been trying to assemble a 2.5 Mb bacterial genome from high-coverage (1800X) PacBio data, but have had no luck. I tried a few of the suggested parameters, but none of them gave me a contiguous assembly.
Attempts:
1. corOutCoverage=100: gave me sufficient corrected reads (91X), but fragmented contigs (67 sequences, total length 2696826 bp including 6 repeats of total length 198541 bp).
2. corMhapSensitivity=high corMinCoverage=0: coverage is also OK (40X), but the contigs (total length 246496 bp) cover only 10% of the genome size (2.5 Mb).
3. corOutCoverage=500 ovlErrorRate=0.15 obtErrorRate=0.15, to smash haplotypes: this gave me a huge number of contigs (774 sequences, total length 28825721 bp including 180 repeats of total length 10087773 bp) and ran for over two days.

I am rerunning Canu with just corOutCoverage=200 now. In the meantime, I am wondering whether my sample is contaminated. Any suggestions?
Thank you!!
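With 1800X of raw data, corOutCoverage has a large effect. Canu's documentation describes it as correcting only the longest reads up to that coverage. The sketch below assumes a plain longest-first cut. Canu's actual selection involves more detail, so its reported numbers need not match. The sketch shows why raising the value pulls in progressively shorter reads.

```python
def longest_reads_for_coverage(lengths, genome_size, coverage):
    """Take reads longest-first until they add up to coverage * genome_size.

    Returns (reads selected, shortest selected read length, bases selected).
    """
    target = coverage * genome_size
    chosen, shortest, total = 0, None, 0
    for length in sorted(lengths, reverse=True):
        if total >= target:
            break
        total += length
        chosen += 1
        shortest = length
    return chosen, shortest, total


# Toy example: a 10 bp "genome" and a 10x target needs 100 bp of the longest reads.
print(longest_reads_for_coverage([10, 50, 20, 40, 30], genome_size=10, coverage=10))
# prints (3, 30, 120)
```

Under this simplification, corOutCoverage=100 on a 2.5 Mb genome targets about 250 Mb of the longest reads, and corOutCoverage=500 targets about 1.25 Gb, reaching correspondingly shorter reads.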
The reports for attempts 1, 2 and 3 are below.
[CORRECTION/MERS]
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 590490024 0.3880 0.1283
-- 2- 2 384381114 0.6406 0.2953
-- 3- 4 344538198 0.7861 0.4397
-- 5- 7 132698608 0.9122 0.6215
-- 8- 11 39138405 0.9643 0.7368
-- 12- 16 14968864 0.9827 0.7986
-- 17- 22 6729681 0.9907 0.8374
-- 23- 29 3222241 0.9945 0.8625
-- 30- 37 1586497 0.9964 0.8789
-- 38- 46 778504 0.9973 0.8895
-- 47- 56 376394 0.9978 0.8960
-- 57- 67 189576 0.9980 0.8999
-- 68- 79 117437 0.9982 0.9023
-- 80- 92 131456 0.9982 0.9042
-- 93- 106 236023 0.9983 0.9068
-- 107- 121 411456 0.9985 0.9121
-- 122- 137 568627 0.9988 0.9227
-- 138- 154 570369 0.9991 0.9390
-- 155- 172 398398 0.9995 0.9569
-- 173- 191 193149 0.9998 0.9706
-- 192- 211 72915 0.9999 0.9778
-- 212- 232 28822 0.9999 0.9808
-- 233- 254 16500 0.9999 0.9822
-- 255- 277 12514 1.0000 0.9830
-- 278- 301 9921 1.0000 0.9838
-- 302- 326 7569 1.0000 0.9844
-- 327- 352 5747 1.0000 0.9849
-- 353- 379 4418 1.0000 0.9853
-- 380- 407 3747 1.0000 0.9857
-- 408- 436 3033 1.0000 0.9860
-- 437- 466 2442 1.0000 0.9862
-- 467- 497 1792 1.0000 0.9865
-- 498- 529 1390 1.0000 0.9867
-- 530- 562 1082 1.0000 0.9868
-- 563- 596 862 1.0000 0.9870
-- 597- 631 680 1.0000 0.9871
-- 632- 667 618 1.0000 0.9871
-- 668- 704 527 1.0000 0.9872
-- 705- 742 530 1.0000 0.9873
-- 743- 781 571 1.0000 0.9874
-- 782- 821 627 1.0000 0.9875
-- 6187952 (max occurrences)
-- 4012007789 (total mers, non-unique)
-- 931423950 (distinct mers, non-unique)
-- 590490024 (unique mers)
[CORRECTION/CORRECTIONS]
-- Reads to be corrected:
-- 7742 reads longer than 43630 bp
-- 280804813 bp
-- Expected corrected reads:
-- 7742 reads
-- 249007801 bp
-- 26861 bp minimum length
-- 32163 bp mean length
-- 45987 bp n50 length
[TRIMMING/READS]
-- In gatekeeper store 'trimming/hc1_hybrid_cov100.gkpStore':
-- Found 7838 reads.
-- Found 227633046 bases (91.41 times coverage).
-- Read length histogram (one '*' equals 10.9 reads):
-- 0 999 0
-- 1000 1999 92
-- 2000 2999 42
-- 3000 3999 31
-- 4000 4999 19
-- 5000 5999 12
-- 6000 6999 8
-- 7000 7999 4
-- 8000 8999 7
-- 9000 9999 5
-- 10000 10999 5
-- 11000 11999 7
-- 12000 12999 3
-- 13000 13999 5
-- 14000 14999 6
-- 15000 15999 7
-- 16000 16999 11
-- 17000 17999 14
-- 18000 18999 28
-- 19000 19999 20
-- 20000 20999 23
-- 21000 21999 52
-- 22000 22999 73
-- 23000 23999 117
-- 24000 24999 223
-- 25000 25999 457
-- 26000 26999 746
-- 27000 27999 722
-- 28000 28999 763
-- 29000 29999 746
-- 30000 30999 658
-- 31000 31999 680
-- 32000 32999 606
-- 33000 33999 581
-- 34000 34999 432
-- 35000 35999 294
-- 36000 36999 195
-- 37000 37999 67
-- 38000 38999 38
-- 39000 39999 16
-- 40000 40999 6
-- 41000 41999 4
-- 42000 42999 3
-- 43000 43999 0
-- 44000 44999 1
-- 45000 45999 2
-- 46000 46999 4
-- 47000 47999 2
-- 48000 48999 0
-- 49000 49999 0
-- 50000 50999 0
-- 51000 51999 1
[TRIMMING/MERS]
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 21250603 0.8027 0.0934
-- 2- 2 1605147 0.8633 0.1075
-- 3- 4 793634 0.8838 0.1147
-- 5- 7 257946 0.8983 0.1220
-- 8- 11 77877 0.9042 0.1266
-- 12- 16 23408 0.9063 0.1290
-- 17- 22 10086 0.9069 0.1302
-- 23- 29 28710 0.9073 0.1311
-- 30- 37 65284 0.9085 0.1349
-- 38- 46 143680 0.9113 0.1461
-- 47- 56 241228 0.9169 0.1747
-- 57- 67 405156 0.9265 0.2335
-- 68- 79 496940 0.9423 0.3499
-- 80- 92 447531 0.9609 0.5112
-- 93- 106 328006 0.9774 0.6773
-- 107- 121 176200 0.9893 0.8164
-- 122- 137 87187 0.9957 0.9004
-- 138- 154 22102 0.9988 0.9467
-- 155- 172 5221 0.9995 0.9593
-- 173- 191 3770 0.9997 0.9630
-- 192- 211 1365 0.9998 0.9658
-- 212- 232 198 0.9999 0.9669
-- 233- 254 100 0.9999 0.9671
-- 255- 277 81 0.9999 0.9672
-- 278- 301 52 0.9999 0.9673
-- 302- 326 50 0.9999 0.9674
-- 327- 352 42 0.9999 0.9675
-- 353- 379 96 0.9999 0.9675
-- 380- 407 46 0.9999 0.9677
-- 408- 436 96 0.9999 0.9678
-- 437- 466 27 0.9999 0.9679
-- 467- 497 19 0.9999 0.9680
-- 498- 529 19 0.9999 0.9680
-- 530- 562 30 0.9999 0.9681
-- 563- 596 34 0.9999 0.9681
-- 597- 631 8 0.9999 0.9682
-- 632- 667 4 0.9999 0.9683
-- 668- 704 4 0.9999 0.9683
-- 705- 742 4 0.9999 0.9683
-- 743- 781 5 0.9999 0.9683
-- 782- 821 1 0.9999 0.9683
-- 874625 (max occurrences)
-- 206217845 (total mers, non-unique)
-- 5223583 (distinct mers, non-unique)
-- 21250603 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1440 (use overlaps at or below this fraction error)
-- 1 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 1 (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--
-- INPUT READS:
-- 7838 reads 227633046 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- 7165 reads 174112663 bases (trimmed reads output)
-- 661 reads 17126484 bases (reads with no change, kept as is)
-- 10 reads 208473 bases (reads with no overlaps, deleted)
-- 2 reads 24621 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- 5356 reads 18353687 bases (bases trimmed from the 5' end of a read)
-- 6178 reads 17807118 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1440 (use overlaps at or below this fraction error)
-- INPUT READS:
-- 7826 reads 227399952 bases (reads processed)
-- 12 reads 233094 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- 0 reads 0 bases (no overlaps)
-- 0 reads 0 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 7826 reads 227399952 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 4405 reads 5109 signals (number of subread signal)
--
-- SIGNALS:
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 5109 reads 419488 bases (size of subread signal)
--
-- TRIMMING:
-- 2258 reads 23632933 bases (trimmed from the 5' end of the read)
-- 2515 reads 26453072 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
-- In gatekeeper store 'unitigging/hc1_hybrid_cov100.gkpStore':
-- Found 7826 reads.
-- Found 141153142 bases (56.68 times coverage).
-- Read length histogram (one '*' equals 11.61 reads):
-- 0 999 0
-- 1000 1999 100
-- 2000 2999 42
-- 3000 3999 30
-- 4000 4999 33
-- 5000 5999 31
-- 6000 6999 26
-- 7000 7999 36
-- 8000 8999 441
-- 9000 9999 413
-- 10000 10999 387
-- 11000 11999 327
-- 12000 12999 122
-- 13000 13999 199
-- 14000 14999 331
-- 15000 15999 443
-- 16000 16999 763
-- 17000 17999 813
-- 18000 18999 529
-- 19000 19999 302
-- 20000 20999 246
-- 21000 21999 194
-- 22000 22999 127
-- 23000 23999 123
-- 24000 24999 184
-- 25000 25999 281
-- 26000 26999 240
-- 27000 27999 194
-- 28000 28999 134
-- 29000 29999 139
-- 30000 30999 93
-- 31000 31999 108
-- 32000 32999 104
-- 33000 33999 88
-- 34000 34999 72
-- 35000 35999 48
-- 36000 36999 34
-- 37000 37999 14
-- 38000 38999 13
-- 39000 39999 9
-- 40000 40999 4
-- 41000 41999 3
-- 42000 42999 1
-- 43000 43999 1
-- 44000 44999 0
-- 45000 45999 2
-- 46000 46999 1
-- 47000 47999 1
[UNITIGGING/MERS]
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 10093919 0.7205 0.0716
-- 2- 2 898914 0.7847 0.0843
-- 3- 4 406953 0.8050 0.0904
-- 5- 7 113768 0.8181 0.0961
-- 8- 11 30595 0.8227 0.0991
-- 12- 16 34857 0.8244 0.1008
-- 17- 22 61921 0.8271 0.1048
-- 23- 29 179269 0.8323 0.1157
-- 30- 37 368195 0.8461 0.1532
-- 38- 46 562333 0.8743 0.2506
-- 47- 56 542176 0.9143 0.4217
-- 57- 67 401443 0.9522 0.6185
-- 68- 79 209573 0.9793 0.7865
-- 80- 92 75165 0.9934 0.8897
-- 93- 106 17688 0.9980 0.9288
-- 107- 121 5882 0.9992 0.9405
-- 122- 137 2967 0.9996 0.9449
-- 138- 154 393 0.9998 0.9473
-- 155- 172 145 0.9998 0.9477
-- 173- 191 98 0.9998 0.9479
-- 192- 211 94 0.9998 0.9480
-- 212- 232 56 0.9998 0.9481
-- 233- 254 29 0.9998 0.9482
-- 255- 277 66 0.9998 0.9483
-- 278- 301 42 0.9998 0.9484
-- 302- 326 31 0.9998 0.9485
-- 327- 352 43 0.9998 0.9485
-- 353- 379 79 0.9998 0.9486
-- 380- 407 52 0.9998 0.9488
-- 408- 436 57 0.9998 0.9490
-- 437- 466 18 0.9998 0.9492
-- 467- 497 19 0.9998 0.9492
-- 498- 529 11 0.9998 0.9493
-- 530- 562 30 0.9998 0.9493
-- 563- 596 29 0.9998 0.9494
-- 597- 631 15 0.9998 0.9496
-- 632- 667 3 0.9998 0.9496
-- 668- 704 5 0.9998 0.9496
-- 705- 742 9 0.9998 0.9497
-- 743- 781 12 0.9998 0.9497
-- 782- 821 6 0.9998 0.9498
-- 843946 (max occurrences)
-- 130894877 (total mers, non-unique)
-- 3915192 (distinct mers, non-unique)
-- 10093919 (unique mers)
[UNITIGGING/OVERLAPS]
-- category reads % read length feature size or coverage analysis
-- middle-missing 489 6.25 29070.37 +- 5639.63 3275.87 +- 3732.77 (bad trimming)
-- middle-hump 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (bad trimming)
-- no-5-prime 1 0.01 39957.00 +- 0.00 16246.00 +- 0.00 (bad trimming)
-- no-3-prime 1 0.01 27989.00 +- 0.00 1467.00 +- 0.00 (bad trimming)
--
-- low-coverage 11 0.14 18916.27 +- 9142.20 4.71 +- 2.11 (easy to assemble, potential for lower quality consensus)
-- unique 2968 37.92 13454.04 +- 4834.91 23.83 +- 8.89 (easy to assemble, perfect, yay)
-- repeat-cont 192 2.45 18707.28 +- 12170.97 248.06 +- 97.64 (potential for consensus errors, no impact on assembly)
-- repeat-dove 25 0.32 37392.24 +- 2983.29 221.19 +- 80.86 (hard to assemble, likely won't assemble correctly or even at all)
--
-- span-repeat 3590 45.87 20157.09 +- 5795.09 7406.85 +- 5937.53 (read spans a large repeat, usually easy to assemble)
-- uniq-repeat-cont 276 3.53 12146.35 +- 6426.16 (should be uniquely placed, low potential for consensus errors, no impact on assembly)
-- uniq-repeat-dove 252 3.22 24134.07 +- 7879.99 (will end contigs, potential to misassemble)
-- uniq-anchor 21 0.27 19304.67 +- 11838.94 4755.52 +- 3217.80 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT]
-- No report available.
[UNITIGGING/CONTIGS]
-- Found, in version 1, after unitig construction:
-- contigs: 67 sequences, total length 2626599 bp (including 6 repeats of total length 197551 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 5621 sequences, total length 109001061 bp.
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- 10 75162 3 266606
-- 20 64125 7 533093
-- 30 54400 11 770426
-- 40 49697 16 1024873
-- 50 45684 21 1262405
-- 60 41653 27 1522313
-- 70 35424 33 1749108
-- 80 31548 41 2015962
-- 90 27723 49 2252104
-- 100 19148 60 2505107
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
-- contigs: 67 sequences, total length 2696826 bp (including 6 repeats of total length 198541 bp).
-- bubbles: 0 sequences, total length 0 bp.
-- unassembled: 5621 sequences, total length 109152481 bp.
-- Contig sizes based on genome size --
-- NG (bp) LG (contigs) sum (bp)
-- 10 83110 3 286850
-- 20 70479 6 506173
-- 30 60435 10 763059
-- 40 53599 15 1038577
-- 50 46099 20 1279489
-- 60 42838 25 1501972
-- 70 36509 32 1775771
-- 80 33939 39 2020245
-- 90 27940 47 2258007
-- 100 22568 56 2490408
[CORRECTION/MERS]
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 594082432 0.3924 0.1322
-- 2- 2 383772344 0.6459 0.3031
-- 3- 4 340411799 0.7908 0.4496
-- 5- 7 129208131 0.9152 0.6323
-- 8- 11 37410055 0.9660 0.7466
-- 12- 16 13968007 0.9836 0.8068
-- 17- 22 6329618 0.9911 0.8437
-- 23- 29 3098351 0.9947 0.8681
-- 30- 37 1534876 0.9965 0.8843
-- 38- 46 748727 0.9974 0.8947
-- 47- 56 359020 0.9978 0.9012
-- 57- 67 178536 0.9981 0.9050
-- 68- 79 110558 0.9982 0.9073
-- 80- 92 127503 0.9982 0.9091
-- 93- 106 234077 0.9983 0.9117
-- 107- 121 411794 0.9985 0.9171
-- 122- 137 568842 0.9988 0.9280
-- 138- 154 567957 0.9992 0.9447
-- 155- 172 392933 0.9995 0.9629
-- 173- 191 188465 0.9998 0.9767
-- 192- 211 70309 0.9999 0.9840
-- 212- 232 27625 0.9999 0.9870
-- 233- 254 15821 1.0000 0.9883
-- 255- 277 11747 1.0000 0.9891
-- 278- 301 8990 1.0000 0.9898
-- 302- 326 6600 1.0000 0.9904
-- 327- 352 4922 1.0000 0.9908
-- 353- 379 3846 1.0000 0.9912
-- 380- 407 3402 1.0000 0.9915
-- 408- 436 2734 1.0000 0.9918
-- 437- 466 2168 1.0000 0.9921
-- 467- 497 1596 1.0000 0.9923
-- 498- 529 1202 1.0000 0.9924
-- 530- 562 913 1.0000 0.9926
-- 563- 596 701 1.0000 0.9927
-- 597- 631 555 1.0000 0.9928
-- 632- 667 476 1.0000 0.9929
-- 668- 704 420 1.0000 0.9929
-- 705- 742 448 1.0000 0.9930
-- 743- 781 474 1.0000 0.9931
-- 782- 821 527 1.0000 0.9931
-- 6187952 (max occurrences)
-- 3898198967 (total mers, non-unique)
-- 919794736 (distinct mers, non-unique)
-- 594082432 (unique mers)
[CORRECTION/CORRECTIONS]
-- Reads to be corrected:
-- 57007 reads longer than 30840 bp
-- 1601121265 bp
-- Expected corrected reads:
-- 57007 reads
-- 1245006954 bp
-- 11879 bp minimum length
-- 21840 bp mean length
-- 46654 bp n50 length
[TRIMMING/READS]
-- In gatekeeper store 'trimming/hc1_pacbio_smash.gkpStore':
-- Found 62585 reads.
-- Found 953598509 bases (382.97 times coverage).
-- Read length histogram (one '*' equals 57.91 reads):
-- 0 999 0
-- 1000 1999 1669
-- 2000 2999 1678
-- 3000 3999 1674
-- 4000 4999 1672
-- 5000 5999 1767
-- 6000 6999 1854
-- 7000 7999 2040
-- 8000 8999 2306
-- 9000 9999 2470
-- 10000 10999 2888
-- 11000 11999 4054
-- 12000 12999 3517
-- 13000 13999 3225
-- 14000 14999 3161
-- 15000 15999 2779
-- 16000 16999 2657
-- 17000 17999 2396
-- 18000 18999 2124
-- 19000 19999 1716
-- 20000 20999 1678
-- 21000 21999 1664
-- 22000 22999 1667
-- 23000 23999 1573
-- 24000 24999 1480
-- 25000 25999 1412
-- 26000 26999 1256
-- 27000 27999 1052
-- 28000 28999 902
-- 29000 29999 852
-- 30000 30999 718
-- 31000 31999 683
-- 32000 32999 643
-- 33000 33999 501
-- 34000 34999 367
-- 35000 35999 241
-- 36000 36999 133
-- 37000 37999 52
-- 38000 38999 26
-- 39000 39999 17
-- 40000 40999 8
-- 41000 41999 4
-- 42000 42999 1
-- 43000 43999 0
-- 44000 44999 1
-- 45000 45999 0
-- 46000 46999 5
-- 47000 47999 2
[TRIMMING/MERS]
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 55231268 0.8492 0.0580
-- 2- 2 3685405 0.9059 0.0657
-- 3- 4 2117182 0.9274 0.0701
-- 5- 7 882823 0.9449 0.0754
-- 8- 11 365315 0.9540 0.0796
-- 12- 16 161397 0.9583 0.0826
-- 17- 22 73227 0.9604 0.0846
-- 23- 29 34139 0.9613 0.0859
-- 30- 37 16278 0.9618 0.0868
-- 38- 46 8318 0.9620 0.0873
-- 47- 56 4738 0.9621 0.0876
-- 57- 67 2955 0.9622 0.0879
-- 68- 79 1805 0.9622 0.0881
-- 80- 92 1434 0.9623 0.0882
-- 93- 106 964 0.9623 0.0883
-- 107- 121 926 0.9623 0.0884
-- 122- 137 712 0.9623 0.0885
-- 138- 154 2170 0.9623 0.0886
-- 155- 172 6610 0.9624 0.0890
-- 173- 191 8184 0.9625 0.0902
-- 192- 211 13762 0.9626 0.0918
-- 212- 232 30966 0.9628 0.0947
-- 233- 254 76172 0.9633 0.1024
-- 255- 277 139178 0.9645 0.1226
-- 278- 301 224652 0.9667 0.1628
-- 302- 326 350000 0.9702 0.2328
-- 327- 352 448360 0.9757 0.3503
-- 353- 379 465122 0.9826 0.5111
-- 380- 407 361448 0.9897 0.6896
-- 408- 436 215609 0.9952 0.8367
-- 437- 466 76080 0.9984 0.9294
-- 467- 497 8483 0.9995 0.9635
-- 498- 529 457 0.9996 0.9673
-- 530- 562 716 0.9997 0.9675
-- 563- 596 2053 0.9997 0.9680
-- 597- 631 4324 0.9997 0.9692
-- 632- 667 4047 0.9998 0.9721
-- 668- 704 2073 0.9998 0.9748
-- 705- 742 2174 0.9999 0.9763
-- 743- 781 3307 0.9999 0.9780
-- 782- 821 1127 0.9999 0.9806
-- 5458608 (max occurrences)
-- 897052951 (total mers, non-unique)
-- 9807648 (distinct mers, non-unique)
-- 55231268 (unique mers)
[TRIMMING/TRIMMING]
-- PARAMETERS:
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1500 (use overlaps at or below this fraction error)
-- 1 (break region if overlap is less than this long, for 'largest covered' algorithm)
-- 1 (break region if overlap coverage is less than this many read, for 'largest covered' algorithm)
--
-- INPUT READS:
-- 62585 reads 953598509 bases (reads processed)
-- 0 reads 0 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- OUTPUT READS:
-- 54254 reads 797345148 bases (trimmed reads output)
-- 8195 reads 116372849 bases (reads with no change, kept as is)
-- 89 reads 462733 bases (reads with no overlaps, deleted)
-- 47 reads 79304 bases (reads with short trimmed length, deleted)
--
-- TRIMMING DETAILS:
-- 38763 reads 23269896 bases (bases trimmed from the 5' end of a read)
-- 42441 reads 16068579 bases (bases trimmed from the 3' end of a read)
[TRIMMING/SPLITTING]
-- PARAMETERS:
-- 1000 (reads trimmed below this many bases are deleted)
-- 0.1500 (use overlaps at or below this fraction error)
-- INPUT READS:
-- 62449 reads 953056472 bases (reads processed)
-- 136 reads 542037 bases (reads not processed, previously deleted)
-- 0 reads 0 bases (reads not processed, in a library where trimming isn't allowed)
--
-- PROCESSED:
-- 0 reads 0 bases (no overlaps)
-- 0 reads 0 bases (no coverage after adjusting for trimming done already)
-- 0 reads 0 bases (processed for chimera)
-- 0 reads 0 bases (processed for spur)
-- 62449 reads 953056472 bases (processed for subreads)
--
-- READS WITH SIGNALS:
-- 0 reads 0 signals (number of 5' spur signal)
-- 0 reads 0 signals (number of 3' spur signal)
-- 0 reads 0 signals (number of chimera signal)
-- 19659 reads 21848 signals (number of subread signal)
--
-- SIGNALS:
-- 0 reads 0 bases (size of 5' spur signal)
-- 0 reads 0 bases (size of 3' spur signal)
-- 0 reads 0 bases (size of chimera signal)
-- 21848 reads 6289853 bases (size of subread signal)
--
-- TRIMMING:
-- 8938 reads 53461841 bases (trimmed from the 5' end of the read)
-- 12082 reads 64183872 bases (trimmed from the 3' end of the read)
[UNITIGGING/READS]
-- In gatekeeper store 'unitigging/hc1_pacbio_smash.gkpStore':
-- Found 62449 reads.
-- Found 796072284 bases (319.7 times coverage).
-- Read length histogram (one '*' equals 81.4 reads):
-- 0 999 0
-- 1000 1999 1775
-- 2000 2999 1881
-- 3000 3999 2033
-- 4000 4999 2412
-- 5000 5999 2855
-- 6000 6999 3166
-- 7000 7999 3519
-- 8000 8999 5698
-- 9000 9999 3949
-- 10000 10999 3475
-- 11000 11999 3025
-- 12000 12999 2020
-- 13000 13999 2243
-- 14000 14999 2434
-- 15000 15999 2546
-- 16000 16999 2939
-- 17000 17999 2778
-- 18000 18999 2051
-- 19000 19999 1491
-- 20000 20999 1391
-- 21000 21999 1278
-- 22000 22999 1054
-- 23000 23999 941
-- 24000 24999 900
-- 25000 25999 911
-- 26000 26999 733
-- 27000 27999 595
-- 28000 28999 464
-- 29000 29999 437
-- 30000 30999 342
-- 31000 31999 319
-- 32000 32999 258
-- 33000 33999 206
-- 34000 34999 132
-- 35000 35999 90
-- 36000 36999 46
-- 37000 37999 19
-- 38000 38999 14
-- 39000 39999 12
-- 40000 40999 6
-- 41000 41999 5
-- 42000 42999 1
-- 43000 43999 1
-- 44000 44999 0
-- 45000 45999 2
-- 46000 46999 1
-- 47000 47999 1
[UNITIGGING/MERS]
-- 22-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 30278566 0.7935 0.0381
-- 2- 2 2785278 0.8665 0.0451
-- 3- 4 1568706 0.8940 0.0491
-- 5- 7 627265 0.9156 0.0536
-- 8- 11 248535 0.9265 0.0571
-- 12- 16 106525 0.9314 0.0595
-- 17- 22 46308 0.9337 0.0611
-- 23- 29 20091 0.9347 0.0621
-- 30- 37 10535 0.9352 0.0627
-- 38- 46 5740 0.9354 0.0631
-- 47- 56 3196 0.9356 0.0634
-- 57- 67 1654 0.9357 0.0636
-- 68- 79 1311 0.9357 0.0637
-- 80- 92 1275 0.9357 0.0638
-- 93- 106 1533 0.9358 0.0640
-- 107- 121 6774 0.9358 0.0642
-- 122- 137 10652 0.9360 0.0652
-- 138- 154 10920 0.9363 0.0670
-- 155- 172 23410 0.9366 0.0690
-- 173- 191 56187 0.9372 0.0742
-- 192- 211 91811 0.9387 0.0875
-- 212- 232 150802 0.9412 0.1118
-- 233- 254 227477 0.9452 0.1550
-- 255- 277 325041 0.9512 0.2258
-- 278- 301 383536 0.9599 0.3367
-- 302- 326 395184 0.9700 0.4773
-- 327- 352 337569 0.9803 0.6329
-- 353- 379 243903 0.9890 0.7757
-- 380- 407 121281 0.9953 0.8859
-- 408- 436 39520 0.9984 0.9439
-- 437- 466 3670 0.9993 0.9636
-- 467- 497 1727 0.9994 0.9655
-- 498- 529 2184 0.9995 0.9665
-- 530- 562 4971 0.9995 0.9680
-- 563- 596 2437 0.9997 0.9714
-- 597- 631 1985 0.9997 0.9731
-- 632- 667 2579 0.9998 0.9747
-- 668- 704 2665 0.9998 0.9768
-- 705- 742 650 0.9999 0.9791
-- 743- 781 182 0.9999 0.9796
-- 782- 821 118 0.9999 0.9798
-- 4625539 (max occurrences)
-- 764482287 (total mers, non-unique)
-- 7877581 (distinct mers, non-unique)
-- 30278566 (unique mers)
[UNITIGGING/OVERLAPS] -- category reads % read length feature size or coverage analysis
-- middle-missing 46 0.07 29493.83 +- 7307.73 1491.52 +- 1824.50 (bad trimming) -- middle-hump 0 0.00 0.00 +- 0.00 0.00 +- 0.00 (bad trimming) -- no-5-prime 1 0.00 41272.00 +- 0.00 17545.00 +- 0.00 (bad trimming) -- no-3-prime 2 0.00 18903.00 +- 7237.95 14211.00 +- 9907.98 (bad trimming) --
-- low-coverage 1 0.00 41272.00 +- 0.00 6.16 +- 1.70 (easy to assemble, potential for lower quality consensus) -- unique 8 0.01 2984.62 +- 1054.95 30.17 +- 9.86 (easy to assemble, perfect, yay) -- repeat-cont 48546 77.74 10090.60 +- 5030.80 173.61 +- 79.98 (potential for consensus errors, no impact on assembly) -- repeat-dove 2835 4.54 19077.12 +- 3669.65 123.21 +- 51.63 (hard to assemble, likely won't assemble correctly or even at all) --
-- span-repeat 31 0.05 21499.13 +- 11354.69 12121.97 +- 7868.00 (read spans a large repeat, usually easy to assemble) -- uniq-repeat-cont 277 0.44 15073.83 +- 7756.64 (should be uniquely placed, low potential for consensus errors, no impact on assembly) -- uniq-repeat-dove 788 1.26 23463.74 +- 6199.11 (will end contigs, potential to misassemble) -- uniq-anchor 9914 15.88 22927.00 +- 6416.16 9132.49 +- 6470.79 (repeat read, with unique section, probable bad read)
[UNITIGGING/ADJUSTMENT] -- No report available.
[UNITIGGING/CONTIGS] -- Found, in version 1, after unitig construction: -- contigs: 774 sequences, total length 28825721 bp (including 180 repeats of total length 10087773 bp). -- bubbles: 0 sequences, total length 0 bp. -- unassembled: 27967 sequences, total length 467652040 bp.
-- Contig sizes based on genome size
--
-- NG (bp) LG (contigs) sum (bp)
--    10  154794    2    338981
--    20  148422    4    637556
--    30  145749    5    783305
--    40  138518    7   1062580
--    50  130357    9   1328482
--    60  129479   11   1588102
--    70  126396   13   1841286
--    80  124727   15   2090918
--    90  122469   17   2336194
--   100  113677   19   2564092
--   110  108082   21   2783983
--   120  104973   23   2994243
--   130  101000   26   3301631
--   140  100034   28   3502455
--   150   97748   31   3798249
--   160   94684   33   3990519
--   170   92653   36   4269074
--   180   90201   39   4542453
--   190   88536   42   4810047
--   200   87275   44   4985097
--   210   86877   47   5245906
--   220   85192   50   5501815
--   230   82650   53   5750599
--   240   80918   56   5995944
--   250   79379   59   6234829
--   260   77458   63   6546969
--   270   74500   66   6772693
--   280   73254   69   6994323
--   290   71418   73   7283288
--   300   70495   76   7495508
--   310   69738   80   7776030
--   320   68511   83   7982794
--   330   67627   87   8254692
--   340   65652   91   8521122
--   350   64665   94   8716320
--   360   64072   98   8973481
--   370   62565  102   9226959
--   380   61543  106   9474586
--   390   60067  110   9717579
--   400   59586  115  10016434
--   410   57157  119  10250579
--   420   56041  123  10476983
--   430   55078  128  10754430
--   440   53690  132  10970991
--   450   52979  137  11237718
--   460   52262  142  11500811
--   470   51627  146  11707805
--   480   50988  151  11963946
--   490   50277  156  12216841
--   500   49237  161  12464805
--   510   48306  166  12707304
--   520   47671  172  12995191
--   530   46893  177  13231443
--   540   46407  182  13464206
--   550   45829  188  13740551
--   560   45420  193  13968772
--   570   44801  198  14193995
--   580   44345  204  14461249
--   590   43386  210  14723495
--   600   41842  216  14977929
--   610   41405  222  15227512
--   620   40677  228  15472925
--   630   40161  234  15714745
--   640   39543  240  15953710
--   650   38616  246  16187173
--   660   37911  253  16454724
--   670   37122  260  16717158
--   680   36850  266  16939049
--   690   35966  273  17193483
--   700   35505  280  17443357
--   710   34914  287  17689227
--   720   34494  294  17932188
--   730   33858  302  18204752
--   740   33547  309  18440489
--   750   33049  317  18706818
--   760   32763  324  18936825
--   770   32468  332  19197593
--   780   32108  339  19423522
--   790   31523  347  19677608
--   800   31078  355  19927805
--   810   30665  363  20174634
--   820   30213  371  20418218
--   830   29594  380  20686317
--   840   29183  388  20921491
--   850   28756  397  21182694
--   860   28141  406  21437913
--   870   27777  415  21689494
--   880   27250  424  21936525
--   890   26935  433  22180422
--   900   26516  442  22420465
--   910   26063  452  22683561
--   920   25788  461  22916707
--   930   25301  471  23172223
--   940   24787  481  23423021
--   950   24486  491  23669106
--   960   24030  501  23911583
--   970   23406  512  24173091
--   980   22623  522  24403028
--   990   22053  534  24670704
--  1000   21815  545  24911957
--  1010   21516  556  25150516
--  1020   20860  568  25404468
--  1030   20227  580  25650906
--  1040   19854  593  25911110
--  1050   19684  605  26148260
--  1060   19463  618  26402836
--  1070   18856  631  26651276
--  1080   18384  644  26893024
--  1090   18232  658  27149058
--  1100   17459  672  27397834
--  1110   17102  686  27639666
--  1120   16584  701  27892555
--  1130   16070  717  28152959
--  1140   15404  732  28389103
--  1150   11744  750  28639894
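How to read each NG row: sort the contigs longest first. NG is the length of the contig at which the running total first reaches the given percentage of the genome size. LG is the number of contigs needed to get there. Sum is the running total at that point; for NG10 above, that is two contigs totalling 338981 bp. The table runs far past 100 because the 774 contigs add up to about 28.8 Mb. That is more than ten times the genome size implied by the NG100 row, which puts the genome size at no more than 2564092 bp. The sketch below reproduces the calculation from a plain list of contig lengths. `ng_table` is a hypothetical helper written for illustration, not Canu's own code, and it assumes the genome size is given in bp.

```python
"""NG/LG table from contig lengths (illustrative sketch)."""


def ng_table(contig_lengths, genome_size, step=10):
    """Return (percent, NG length, LG count, cumulative bp) rows.

    Contigs are taken longest first. A row for `percent` is emitted at the
    first contig where the cumulative length reaches percent% of genome_size.
    Rows continue past 100% for as long as the total contig length allows.
    Assumes genome_size > 0 and step > 0.
    """
    rows = []
    percent = step
    cumulative = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        cumulative += length
        # Integer comparison avoids floating-point rounding at the thresholds.
        # A single long contig can satisfy several thresholds at once.
        while cumulative * 100 >= genome_size * percent:
            rows.append((percent, length, count, cumulative))
            percent += step
    return rows


if __name__ == "__main__":
    # Toy example: three contigs totalling exactly one 100 bp genome.
    for row in ng_table([20, 50, 30], genome_size=100):
        print(row)
```

When the contigs total more than the genome size, as in the report above, the same loop simply keeps emitting rows above 100%.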
[UNITIGGING/CONSENSUS]
-- Found, in version 2, after consensus generation:
--   contigs:      774 sequences, total length 28496592 bp (including 180 repeats of total length 9776968 bp).
--   bubbles:      0 sequences, total length 0 bp.
--   unassembled:  27967 sequences, total length 467802133 bp.
-- Contig sizes based on genome size
--
-- NG (bp) LG (contigs) sum (bp)
--    10  168250    2    364776
--    20  159014    3    523790
--    30  148469    5    826603
--    40  141945    7   1116021
--    50  141613    8   1257634
--    60  138364   10   1536890
--    70  129790   12   1802246
--    80  122975   14   2050864
--    90  119871   16   2293387
--   100  117421   18   2529906
--   110  114422   20   2758949
--   120  108874   23   3092649
--   130  106612   25   3306802
--   140  104252   27   3516981
--   150   96660   30   3817342
--   160   95103   32   4008943
--   170   94597   35   4292877
--   180   92707   38   4572494
--   190   90405   40   4754955
--   200   89227   43   5023285
--   210   87188   46   5285676
--   220   85548   49   5543836
--   230   81605   52   5792880
--   240   79794   55   6034323
--   250   78005   58   6271317
--   260   76956   61   6503085
--   270   76390   64   6733181
--   280   74465   68   7035950
--   290   72103   71   7254907
--   300   70339   75   7539006
--   310   68860   78   7746148
--   320   68174   82   8019873
--   330   66658   85   8220335
--   340   65629   89   8484218
--   350   63065   93   8739663
--   360   62757   97   8991038
--   370   61424  101   9238412
--   380   59421  105   9477853
--   390   58960  109   9714284
--   400   57806  114  10004773
--   410   56523  118  10231870
--   420   55112  123  10510643
--   430   54702  127  10729884
--   440   53711  132  10999058
--   450   53146  136  11212430
--   460   52797  141  11477159
--   470   52017  146  11738469
--   480   51786  151  11998103
--   490   51289  155  12203868
--   500   50374  160  12458204
--   510   48871  165  12706892
--   520   47939  170  12948401
--   530   46774  176  13232492
--   540   46291  181  13464642
--   550   45400  187  13738715
--   560   44629  192  13963332
--   570   44051  198  14229562
--   580   43441  203  14448302
--   590   42348  209  14704435
--   600   41952  215  14956968
--   610   40730  221  15204065
--   620   40214  227  15446638
--   630   39983  233  15687168
--   640   39468  240  15965159
--   650   39209  246  16201079
--   660   38447  253  16471122
--   670   37697  259  16698522
--   680   36530  266  16956893
--   690   35997  273  17210759
--   700   35376  280  17460816
--   710   34717  287  17705125
--   720   34249  294  17946699
--   730   33785  301  18184038
--   740   32839  309  18450707
--   750   32456  316  18679070
--   760   31990  324  18937486
--   770   31151  332  19189970
--   780   30739  340  19437541
--   790   30320  348  19681742
--   800   29779  356  19922876
--   810   29507  365  20189728
--   820   29310  373  20424711
--   830   28716  382  20685837
--   840   28212  391  20942015
--   850   27822  399  21166462
--   860   27272  408  21414867
--   870   27069  418  21686349
--   880   26450  427  21926218
--   890   26217  436  22163109
--   900   25775  446  22422589
--   910   25275  456  22678502
--   920   24957  466  22930256
--   930   24377  476  23176038
--   940   23806  486  23416931
--   950   22837  497  23673753
--   960   22101  508  23920590
--   970   21792  519  24162262
--   980   21175  531  24419312
--   990   20691  543  24670202
--  1000   20225  555  24915702
--  1010   19860  567  25155368
--  1020   19691  580  25412156
--  1030   19379  593  25665995
--  1040   18825  606  25913517
--  1050   18436  619  26155142
--  1060   18192  633  26411142
--  1070   17708  646  26644021
--  1080   17307  661  26905976
--  1090   16961  675  27145248
--  1100   16539  690  27397136
--  1110   16150  705  27642026
--  1120   15628  721  27896549
--  1130   14768  737  28141500
--  1140   10555  757  28391723