ndreey / CONURA_WGS

Metagenomic analysis on whole genome sequencing data from Tephritis conura (IN PROGRESS)

Metagenome Assembly #36

Open ndreey opened 1 month ago

ndreey commented 1 month ago

Assembly comparison

CHST

The hybrid and short-read (sr) metaSPAdes assemblies produce a similar number of contigs below 5,000 bp. Above that length, the hybrid assembly outperforms both short-read assemblies. The megahit and metaSPAdes sr assemblies are otherwise quite similar, although megahit generates longer contigs above 25 kbp.

quast.py -t 4 -l hybrid,sr,megahit --k-mer-stats --no-icarus -o 01-QC/quast-assembly/CHST 06-ASSEMBLY/CHST/contigs.fasta 06-ASSEMBLY/CHST-sr/contigs.fasta 06-ASSEMBLY/CHST-mega/CHST_mega.contigs.fa

[image: QUAST plot for the CHST assemblies]
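To put a number on the comparison above, here is a minimal sketch that computes the share of each CHST assembly held in contigs of at least 5 KB. The totals are copied from the "All" and "5 KB" rows of the three CHST tables in this comment; the dictionary layout and the `share_ge_5kb` helper are hypothetical names used only for illustration.

```python
# Totals copied from the "All" and "5 KB" rows of the CHST tables in this comment.
# The dictionary keys reuse the labels passed to quast.py (-l hybrid,sr,megahit).
CHST = {
    "hybrid": {"total_bp": 18_378_888, "bp_ge_5kb": 4_717_069},
    "sr": {"total_bp": 16_758_681, "bp_ge_5kb": 2_466_956},
    "megahit": {"total_bp": 13_173_747, "bp_ge_5kb": 2_277_852},
}


def share_ge_5kb(stats):
    """Return the percentage of assembled bases in contigs of at least 5 KB."""
    return 100.0 * stats["bp_ge_5kb"] / stats["total_bp"]


shares = {label: share_ge_5kb(stats) for label, stats in CHST.items()}

# Print the assemblies from the largest share to the smallest.
for label, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:8s} {pct:5.2f}% of assembled bases in contigs >= 5 KB")
```

By this measure the hybrid assembly holds roughly a quarter of its bases in contigs of at least 5 KB, well ahead of both short-read assemblies, which matches the reading of the plot.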

CHST: metaSPAdes with long reads

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2902  0.2064  0.2061  0.2972  0.0000  0.0000  0.0000  0.4126  0.0874

Main genome scaffold total:             31008
Main genome contig total:               31008
Main genome scaffold sequence total:    18.379 MB
Main genome contig sequence total:      18.379 MB       0.000% gap
Main genome scaffold N/L50:             4415/696
Main genome contig N/L50:               4415/696
Main genome scaffold N/L90:             22171/254
Main genome contig N/L90:               22171/254
Max scaffold length:                    770.501 KB
Max contig length:                      770.501 KB
Number of scaffolds > 50 KB:            8
% main genome in scaffolds > 50 KB:     8.40%

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig  
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
    All                 31,008          31,008      18,378,888      18,378,888   100.00%
     50                 31,008          31,008      18,378,888      18,378,888   100.00%
    100                 30,485          30,485      18,340,657      18,340,657   100.00%
    250                 22,673          22,673      16,685,113      16,685,113   100.00%
    500                  9,145           9,145      11,956,562      11,956,562   100.00%
   1 KB                  1,750           1,750       7,018,347       7,018,347   100.00%
 2.5 KB                    345             345       5,172,028       5,172,028   100.00%
   5 KB                    218             218       4,717,069       4,717,069   100.00%
  10 KB                    137             137       4,126,218       4,126,218   100.00%
  25 KB                     34              34       2,459,420       2,459,420   100.00%
  50 KB                      8               8       1,544,395       1,544,395   100.00%
 100 KB                      4               4       1,269,603       1,269,603   100.00%
 250 KB                      1               1         770,501         770,501   100.00%
 500 KB                      1               1         770,501         770,501   100.00%
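A note on reading the N/L50 and N/L90 lines: the values appear to be count/length pairs (4415 contigs and 696 bp for N/L50 here), which is the reverse of the naming QUAST uses, where N50 is a length and L50 is a count. The sketch below shows one common way to compute such a pair from a list of contig lengths. The function name `count_and_length_at` is hypothetical, and the tool that produced the table may handle ties or rounding differently.

```python
def count_and_length_at(lengths, fraction=0.5):
    """Return (count, length) for the given fraction of the total assembly size.

    Contigs are taken from longest to shortest until their cumulative length
    reaches `fraction` of the total. `count` is the number of contigs needed
    and `length` is the length of the last contig taken.
    """
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    raise ValueError("no contig lengths given")


# Toy example with made-up lengths (not taken from any assembly in this issue).
toy_lengths = [10, 8, 6, 4, 2]
half = count_and_length_at(toy_lengths, 0.5)
ninety = count_and_length_at(toy_lengths, 0.9)
print("count/length at 50%:", half)
print("count/length at 90%:", ninety)
```

With this convention, a smaller count together with a larger length means the assembly is more contiguous, which is how the hybrid N/L50 of 4415/696 compares with the sr value of 6165/613.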

CHST: metaSPAdes with short reads only

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2972  0.1993  0.1990  0.3045  0.0000  0.0000  0.0000  0.3983  0.0884

Main genome scaffold total:             31933
Main genome contig total:               31933
Main genome scaffold sequence total:    16.759 MB
Main genome contig sequence total:      16.759 MB       0.000% gap
Main genome scaffold N/L50:             6165/613
Main genome contig N/L50:               6165/613
Main genome scaffold N/L90:             23598/248
Main genome contig N/L90:               23598/248
Max scaffold length:                    88.989 KB
Max contig length:                      88.989 KB
Number of scaffolds > 50 KB:            4
% main genome in scaffolds > 50 KB:     1.85%

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig  
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
    All                 31,933          31,933      16,758,681      16,758,681   100.00%
     50                 31,933          31,933      16,758,681      16,758,681   100.00%
    100                 31,288          31,288      16,711,394      16,711,394   100.00%
    250                 23,360          23,360      15,039,602      15,039,602   100.00%
    500                  9,420           9,420      10,175,761      10,175,761   100.00%
   1 KB                  1,854           1,854       5,117,993       5,117,993   100.00%
 2.5 KB                    336             336       3,092,405       3,092,405   100.00%
   5 KB                    159             159       2,466,956       2,466,956   100.00%
  10 KB                     83              83       1,926,939       1,926,939   100.00%
  25 KB                     24              24       1,067,294       1,067,294   100.00%
  50 KB                      4               4         309,524         309,524   100.00%

CHST: megahit

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.3087  0.1959  0.1959  0.2995  0.0000  0.0000  0.0000  0.3918  0.0764

Main genome scaffold total:             23118
Main genome contig total:               23118
Main genome scaffold sequence total:    13.174 MB
Main genome contig sequence total:      13.174 MB       0.000% gap
Main genome scaffold N/L50:             4560/609
Main genome contig N/L50:               4560/609
Main genome scaffold N/L90:             17494/268
Main genome contig N/L90:               17494/268
Max scaffold length:                    101.22 KB
Max contig length:                      101.22 KB
Number of scaffolds > 50 KB:            9
% main genome in scaffolds > 50 KB:     5.12%

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig  
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
    All                 23,118          23,118      13,173,747      13,173,747   100.00%
    100                 23,118          23,118      13,173,747      13,173,747   100.00%
    250                 18,809          18,809      12,209,917      12,209,917   100.00%
    500                  7,257           7,257       8,071,980       8,071,980   100.00%
   1 KB                  1,275           1,275       4,142,993       4,142,993   100.00%
 2.5 KB                    298             298       2,826,644       2,826,644   100.00%
   5 KB                    138             138       2,277,852       2,277,852   100.00%
  10 KB                     64              64       1,768,848       1,768,848   100.00%
  25 KB                     22              22       1,139,203       1,139,203   100.00%
  50 KB                      9               9         674,794         674,794   100.00%
 100 KB                      1               1         101,220         101,220   100.00%

COGE

quast.py -t 4 -l sr,megahit --k-mer-stats --no-icarus -o 01-QC/quast-assembly/COGE 06-ASSEMBLY/COGE/contigs.fasta 06-ASSEMBLY/COGE-mega/COGE_mega.contigs.fa

[image: QUAST plot for the COGE assemblies]

COGE: metaSPAdes

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.3262  0.1700  0.1703  0.3335  0.0000  0.0000  0.0000  0.3403  0.1259

Main genome scaffold total:             151145
Main genome contig total:               151145
Main genome scaffold sequence total:    55.277 MB
Main genome contig sequence total:      55.277 MB       0.000% gap
Main genome scaffold N/L50:             42151/373
Main genome contig N/L50:               42151/373
Main genome scaffold N/L90:             122675/226
Main genome contig N/L90:               122675/226
Max scaffold length:                    126.625 KB
Max contig length:                      126.625 KB
Number of scaffolds > 50 KB:            9
% main genome in scaffolds > 50 KB:     1.45%

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig  
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
    All                151,145         151,145      55,277,497      55,277,497   100.00%
     50                151,145         151,145      55,277,497      55,277,497   100.00%
    100                147,413         147,413      55,017,666      55,017,666   100.00%
    250                 97,779          97,779      44,010,411      44,010,411   100.00%
    500                 21,340          21,340      18,722,472      18,722,472   100.00%
   1 KB                  3,522           3,522       7,180,868       7,180,868   100.00%
 2.5 KB                    332             332       2,815,915       2,815,915   100.00%
   5 KB                    115             115       2,111,903       2,111,903   100.00%
  10 KB                     55              55       1,707,072       1,707,072   100.00%
  25 KB                     20              20       1,154,510       1,154,510   100.00%
  50 KB                      9               9         804,038         804,038   100.00%
 100 KB                      3               3         329,122         329,122   100.00%

COGE: megahit

A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.3215  0.1831  0.1831  0.3124  0.0000  0.0000  0.0000  0.3661  0.1191

Main genome scaffold total:             62751
Main genome contig total:               62751
Main genome scaffold sequence total:    27.509 MB
Main genome contig sequence total:      27.509 MB       0.000% gap
Main genome scaffold N/L50:             16092/460
Main genome contig N/L50:               16092/460
Main genome scaffold N/L90:             50170/239
Main genome contig N/L90:               50170/239
Max scaffold length:                    205.007 KB
Max contig length:                      205.007 KB
Number of scaffolds > 50 KB:            8
% main genome in scaffolds > 50 KB:     2.76%

Minimum         Number          Number          Total           Total           Scaffold
Scaffold        of              of              Scaffold        Contig          Contig  
Length          Scaffolds       Contigs         Length          Length          Coverage
--------        --------------  --------------  --------------  --------------  --------
    All                 62,751          62,751      27,509,028      27,509,028   100.00%
    100                 62,751          62,751      27,509,028      27,509,028   100.00%
    250                 47,041          47,041      23,997,177      23,997,177   100.00%
    500                 13,469          13,469      12,501,523      12,501,523   100.00%
   1 KB                  2,322           2,322       5,167,356       5,167,356   100.00%
 2.5 KB                    235             235       2,341,976       2,341,976   100.00%
   5 KB                     83              83       1,849,450       1,849,450   100.00%
  10 KB                     46              46       1,598,991       1,598,991   100.00%
  25 KB                     22              22       1,199,801       1,199,801   100.00%
  50 KB                      8               8         759,929         759,929   100.00%
 100 KB                      3               3         412,033         412,033   100.00%
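For cross-checking the length-threshold tables in this comment, here is a minimal sketch that rebuilds the count and total-length columns directly from a contigs FASTA file. It assumes the thresholds are inclusive minimum lengths, as the "Minimum Scaffold Length" header suggests. The names `contig_lengths` and `threshold_table` are hypothetical, and the sketch does not try to reproduce the exact row selection of the tool that printed the tables.

```python
import io

# Minimum lengths in bp, mirroring the rows of the tables above (0 stands for "All").
THRESHOLDS = (0, 50, 100, 250, 500, 1_000, 2_500, 5_000,
              10_000, 25_000, 50_000, 100_000, 250_000, 500_000)


def contig_lengths(handle):
    """Yield the sequence length of each FASTA record read from a text handle."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def threshold_table(lengths, thresholds=THRESHOLDS):
    """Return (threshold, contig count, total bp) rows, stopping at the first empty row."""
    lengths = list(lengths)
    rows = []
    for threshold in thresholds:
        kept = [n for n in lengths if n >= threshold]
        if not kept:
            break
        rows.append((threshold, len(kept), sum(kept)))
    return rows


# Toy FASTA with made-up sequences, used only to show the call pattern.
toy_fasta = io.StringIO(">a\nACGT\nAC\n>b\nA\n")
toy_lengths = list(contig_lengths(toy_fasta))
toy_rows = threshold_table(toy_lengths, thresholds=(0, 5, 10))
for threshold, count, total_bp in toy_rows:
    print(f"{threshold:>8} {count:>8} {total_bp:>12}")
```

To run it on a real assembly, open the file and pass the handle in, for example `with open("06-ASSEMBLY/COGE/contigs.fasta") as handle: rows = threshold_table(contig_lengths(handle))`.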