voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

Low fraction of assembly represented by contigs >= 1000bp #254

Closed franciscozorrilla closed 4 years ago

franciscozorrilla commented 4 years ago

Hi, I assembled some soil samples using the meta-large preset ('--k-min 27 --k-max 127 --k-step 10'). Here are some overall statistics:

SAMPLE N_CONTIGS TOTAL_LEN AVE_LEN MEDIAN_LEN N_CONTIGS>1000bp LEN_CONTIGS>1000bp
ERR671910 1989681 1109354376 557.554 409 158547 277710564
ERR671911 1808158 1015222211 561.468 409 147739 261873801
ERR671912 1688610 917313823 543.236 406 123950 210721932
ERR671913 1527877 833822673 545.739 406 114905 196368559
ERR671914 1662821 872259749 524.566 402 103781 171892678
ERR671915 1500905 790508628 526.688 402 96052 159677157
ERR671916 1714129 874189185 509.99 396 92077 154491161
ERR671917 1533968 788390172 513.955 396 86116 146034021
ERR671918 1706856 928299380 543.865 401 124812 220421885
ERR671919 1715440 933131187 543.96 401 125357 221861628
ERR671920 1604389 795299672 495.703 395 75087 119860778
ERR671921 1613302 799605963 495.633 395 75163 120079636
ERR671922 1888982 952134882 504.047 382 108280 194214723
ERR671923 1917427 967649193 504.66 382 110170 198131269
ERR671924 1498351 760384865 507.481 391 77850 135544219
ERR671925 1506281 764765210 507.717 391 78940 137279305
ERR671926 1059982 468670921 442.15 374 23787 33556451
ERR671927 1069709 472714903 441.91 374 23696 33388754
ERR671928 1204740 508618924 422.181 363 22653 32224416
ERR671930 1604808 821764941 512.064 398 90668 144683277
ERR671931 1696975 875711543 516.043 400 99592 159627741
ERR671932 1564881 769097229 491.473 392 67782 111119756
ERR671933 993928 525580223 528.791 406 59686 101192722
ERR671934 1845946 904756932 490.132 383 95320 157433834
ERR671935 1867734 917327454 491.145 383 97006 160986321
ERR671936 1726338 865334409 501.254 395 85634 137344017
ERR671937 1817841 915790402 503.779 396 92643 148819568
ERR671938 1184784 530063846 447.393 380 25706 35890029
ERR671939 1247541 560092925 448.958 381 27772 39026047
ERR687883 1331566 600343552 450.855 381 31285 44109946
ERR687884 1337806 604168405 451.611 381 31595 44706017
ERR687885 1642116 811016336 493.885 392 73474 121516895
ERR687886 1653892 817113117 494.055 391 74189 122881172
ERR687887 1723471 890899726 516.922 400 102713 164210054
ERR687888 1734035 896784936 517.167 400 103302 165280019
ERR687889 1450818 713768546 491.977 393 65200 103790451
ERR687890 1453784 714729905 491.634 393 65199 103599850
ERR687891 1577004 852374398 540.502 399 112534 199717871
ERR687892 1588336 858694367 540.625 399 113502 201261665
ERR687893 1589153 805987126 507.18 396 83492 139185095
ERR687894 1593919 808561260 507.279 396 83894 139772052
ERR687895 1572261 819958461 521.515 401 95991 158009997
ERR687896 1574142 820895815 521.488 401 95954 158442860
ERR687897 2026096 1132033966 558.727 412 161374 282106768
ERR687898 2030079 1132610824 557.915 NA 160406 280786994
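
For readers who want to reproduce columns like these for their own assemblies, here is a minimal Python sketch. It is not the script used to build the table above, which the post does not show. The function names `fasta_lengths` and `assembly_stats` are hypothetical, and the 1000 bp cutoff is assumed to be inclusive (>= 1000 bp), as in the issue title.

```python
from statistics import mean, median


def fasta_lengths(lines):
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def assembly_stats(lengths, min_len=1000):
    """Summarize contig lengths; min_len is an assumed inclusive cutoff."""
    lengths = list(lengths)
    long_contigs = [n for n in lengths if n >= min_len]
    return {
        "N_CONTIGS": len(lengths),
        "TOTAL_LEN": sum(lengths),
        "AVE_LEN": mean(lengths),
        "MEDIAN_LEN": median(lengths),
        "N_CONTIGS_LONG": len(long_contigs),
        "LEN_CONTIGS_LONG": sum(long_contigs),
    }


if __name__ == "__main__":
    # Toy contig lengths, for illustration only (not real assembly data).
    print(assembly_stats([300, 400, 500, 1000, 2500]))
```

To apply this to a real assembly, pass an open file handle for the contigs FASTA to `fasta_lengths` and feed the result to `assembly_stats`.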

Since my goal is to generate species-specific bins, I am a bit concerned about the fraction of the total assembly length that is represented by contigs >= 1000 bp (roughly 6-26% across samples, i.e. LEN_CONTIGS>1000bp divided by TOTAL_LEN), since only these contigs are used by the binning algorithms.
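
The fraction discussed here can be recomputed directly from the table as LEN_CONTIGS>1000bp divided by TOTAL_LEN. The Python sketch below does this for four rows copied from the table; the helper name `long_fraction` is hypothetical, and the rows were picked for illustration only.

```python
# (TOTAL_LEN, LEN_CONTIGS>1000bp) for four samples, copied from the table above.
rows = {
    "ERR671910": (1109354376, 277710564),
    "ERR671911": (1015222211, 261873801),
    "ERR671926": (468670921, 33556451),
    "ERR671928": (508618924, 32224416),
}


def long_fraction(total_len, long_len):
    """Fraction of assembled bases that lie in contigs above the length cutoff."""
    return long_len / total_len


if __name__ == "__main__":
    for sample, (total_len, long_len) in rows.items():
        print(f"{sample}\t{100 * long_fraction(total_len, long_len):.1f}%")
```

Running the same calculation over every row gives the per-sample spread, which makes it easier to see which samples are furthest from a usable binning input.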

Do these assemblies seem normal to you? Is there any strategy you would recommend in order to increase the fraction of contigs >= 1000 bp for my assemblies?

Best, FZ

voutcn commented 4 years ago

Closing this issue; please continue the discussion in thread #259.