kbaseattic / assembly

An extensible framework for genome assembly.
MIT License

improve arast_score #293

Open · levinas opened this issue 9 years ago


Improve arast_score so that it considers predicted genes and includes a penalty on N's.

Here's an example of MEGAHIT output compared with a Velvet k-mer sweep (29-37:4):

Assembly                        velvet_contigs  velvet_contigs 1  velvet_contigs 2  velvet_contigs 3  megahit_contigs  velvet_contigs 4  velvet_contigs 5
# contigs (>= 0 bp)             4728            6975              6918              7707              5599             6182              5363
# contigs (>= 1000 bp)          595             719               700               516               368              229               111
Total length (>= 0 bp)          3088612         2824652           2614542           2201267           2659076          1505354           1145321
Total length (>= 1000 bp)       2465567         1797151           1505078           869480            521505           336212            148188
# contigs                       747             1032              1060              1012              1745             657               427
Largest contig                  27010           20206             10620             7043              7608             4999              2714
Total length                    2576302         2030247           1774823           1235908           1449883          643186            375876
GC (%)                          56.02           55.53             55.01             54.39             55.62            52.99             52.18
N50                             5241            2570              2048              1354              832              1041              894
N75                             2999            1558              1257              926               635              744               707
L50                             148             248               272               302               594              215               154
L75                             306             503               546               576               1097             400               273
# N's per 100 kbp               14446.21        18417.56          22478.35          24798.12          0.00             29970.96          31361.14
# predicted genes (unique)      5510            4818              4342              3188              2859             1655              955
# predicted genes (>= 0 bp)     5510            4818              4342              3188              2859             1655              955
# predicted genes (>= 300 bp)   2109            1346              903               470               1758             144               74
# predicted genes (>= 1500 bp)  2               0                 0                 0                 11               0                 0
# predicted genes (>= 3000 bp)  0               0                 0                 0                 0                0                 0
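As a rough illustration of the proposal, here is a minimal sketch of a score that rewards contiguity and gene content and penalizes N's. This is not the current arast_score formula, which the issue doesn't show. The function names `hypothetical_score` and `rank`, the use of log10 terms, and the `n_penalty` weight are all assumptions made for illustration. The input values are copied from the N50, "# N's per 100 kbp" and "# predicted genes (unique)" rows of the table above.

```python
import math

# Values copied from the report above. Only the rows this sketch needs are included.
ASSEMBLIES = {
    "velvet_contigs":   {"n50": 5241, "ns_per_100kbp": 14446.21, "genes": 5510},
    "velvet_contigs 1": {"n50": 2570, "ns_per_100kbp": 18417.56, "genes": 4818},
    "velvet_contigs 2": {"n50": 2048, "ns_per_100kbp": 22478.35, "genes": 4342},
    "velvet_contigs 3": {"n50": 1354, "ns_per_100kbp": 24798.12, "genes": 3188},
    "megahit_contigs":  {"n50": 832,  "ns_per_100kbp": 0.00,     "genes": 2859},
    "velvet_contigs 4": {"n50": 1041, "ns_per_100kbp": 29970.96, "genes": 1655},
    "velvet_contigs 5": {"n50": 894,  "ns_per_100kbp": 31361.14, "genes": 955},
}


def hypothetical_score(n50, genes, ns_per_100kbp, n_penalty=10.0):
    """Hypothetical score for illustration only, NOT the existing arast_score.

    Rewards contiguity (log10 N50) and gene content (log10 predicted genes),
    then subtracts n_penalty times the fraction of bases that are N.
    Assumes n50 > 0 and genes > 0.
    """
    n_fraction = ns_per_100kbp / 100_000  # N's per 100 kbp -> fraction of bases
    return math.log10(n50) + math.log10(genes) - n_penalty * n_fraction


def rank(assemblies, n_penalty):
    """Return assembly names sorted from best to worst hypothetical score."""
    return sorted(
        assemblies,
        key=lambda name: hypothetical_score(n_penalty=n_penalty, **assemblies[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Compare rankings with no N penalty and with a heavy one.
    for weight in (0.0, 10.0):
        print(f"n_penalty={weight}:", rank(ASSEMBLIES, weight))
```

With `n_penalty=0` this sketch ranks velvet_contigs first, since it has both the highest N50 and the most predicted genes. With `n_penalty=10`, megahit_contigs, which has no N's, moves to the top. The right weight, and whether log scaling is appropriate at all, are open choices that this sketch does not settle.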