snayfach / IGGsearch

Metagenomic species profiling with enhanced coverage of the human gut microbiome
GNU General Public License v3.0

Comparison to MIDAS #27

Open durrantmm opened 5 years ago

durrantmm commented 5 years ago

I decided to compare the output of IGGsearch with the output of MIDAS using the default database.

I have noticed that the two approaches return quite different results.

Here are the top results for IGGsearch:

species_id  species_name                      marker_length  marker_count  percent_markers_detected  total_mapped_reads  avg_read_depth          species_abund           species_presence
OTU-12111   Lachnospiraceae sp.               121890         117           100.0                     138248              104.2779063089671       10.533942669987223      1
OTU-14137   Dialister invisus                 163377         179           100.0                     173032              97.44565024452646       9.843762014880406       1
OTU-04682   Bacteroides caccae                4266           5             100.0                     3194                71.7487107360525        7.247909286744867       1
OTU-11960   Agathobacter sp.                  2001           3             100.0                     1194                58.47426286856572       5.906951477354946       1
OTU-05577   Alistipes putredinis              143196         138           100.0                     88514               54.772640297215005      5.533021070303937       1
OTU-13127   ER4 sp.                           102381         109           100.0                     51748               41.299879860520996      4.17203012725503        1
OTU-12731   DTU089 HGM12731                   148416         159           100.0                     62457               37.43034443725744       3.7811375043501347      1
OTU-13412   Ruminococcus bicirculans          95007          99            100.0                     35769               36.04483880135148       3.641176534148698       1
OTU-05576   Alistipes finegoldii              33564          28            100.0                     11685               29.58535931355023       2.9886530130027276      1
OTU-04719   Bacteroides HGM04719              26529          47            6.382978723404255         7196                25.67778657318406       2.5939179374454766      0
OTU-04685   Bacteroides cellulosilyticus      1146           2             100.0                     285                 24.345549738219894      2.45933807351172        1
OTU-12936   Ruminococcus bromii               9117           15            100.0                     2082                22.277393879565647      2.250417161893948       1
OTU-12039   Blautia sp.                       63324          71            100.0                     14848               22.040616511907018      2.226498392283477       1
OTU-13276   Faecalibacterium HGM13276         222            1             100.0                     41                  17.45945945945946       1.7637191952241624      1
OTU-13102   Oscillospiraceae sp.              64380          59            100.0                     10968               13.943988816402609      1.408593478542991       1
OTU-13434   Gemmiger formicilis               17439          18            100.0                     2652                12.379150180629622      1.250516652295894       1
OTU-07930   Gastranaerophilaceae sp.          258405         259           100.0                     28100               10.699843269286585      1.0808764729372133      1
OTU-12267   Dorea HGM12267                    43965          52            38.46153846153847         4236                9.338428295234847       0.9433490925520364      1
OTU-12236   Coprococcus eutactus              295593         280           98.92857142857143         24386               7.832042707371285       0.7911770746901357      1
OTU-05580   Alistipes shahii                  3951           6             100.0                     349                 7.63426980511263        0.7711984570921094      1
OTU-13156   Oscillibacter sp.                 29532          29            100.0                     2362                6.685629148042801       0.6753687013010377      1
OTU-04739   Bacteroides HGM04739              30699          54            9.25925925925926          2060                6.543275025245122       0.6609883764415796      0
OTU-12036   Blautia sp.                       78939          83            100.0                     4980                6.040360278189488       0.6101849483627823      1
OTU-12290   Eubacterium sp.                   195324         202           100.0                     11179               5.498131310028466       0.5554100773782144      1
OTU-11574   Christensenellales HGM11574       87681          83            100.0                     5455                5.456233391498729       0.5511776345971294      1
OTU-12374   KLE1615 sp.                       194778         138           100.0                     10718               5.336172463009169       0.5390493230268222      1
OTU-12529   Ruminococcus HGM12529             15996          8             50.0                      891                 5.325143785946486       0.5379352284307926      1
OTU-04283   Collinsella HGM04283              24411          50            2.0                       1352                4.976158289295809       0.5026814211335594      0
OTU-05691   Parabacteroides HGM05691          27993          66            12.121212121212121        1425                4.835280248633587       0.488450207098642       0
OTU-12282   Eubacterium sp.                   40116          45            100.0                     1970                4.783079070694985       0.4831769499421186      1

And the top results for MIDAS:

species_id                                         count_reads  coverage               relative_abundance
Bacteroides_uniformis_57318                        23988        257.0999063012415      0.1946839801334291
Bacteroides_vulgatus_57955                         20528        196.32608695652175     0.14866416935964785
Bacteroides_ovatus_58035                           14940        139.7171686746988      0.10579815015060033
Bacteroides_massiliensis_44749                     11626        116.77742448330683     0.08842746819791954
Bacteroides_caccae_53434                           8368         78.650923402967        0.05955690544405749
Dialister_invisus_61905                            7562         69.83463151587777      0.052880937259979235
Eubacterium_rectale_56927                          6573         60.70756646216769      0.04596964204732738
Bacteroides_rodentium_59708                        6277         64.63794484494794      0.048945845804062686
Alistipes_putredinis_61533                         4499         48.865603644646924     0.037002542498073734
Ruminococcus_bicirculans_59300                     3818         35.90825141495383      0.027190835678112
Bacteroides_cellulosilyticus_58046                 3761         37.344130411770934     0.028278127548855864
Ruminococcus_bromii_62047                          2963         27.604179104477613     0.020902735958522548
Alistipes_onderdonkii_55464                        2934         24.976336816226183     0.018912852713523266
Bacteroides_xylanisolvens_57185                    2828         26.593508190131644     0.020137424764750965
Faecalibacterium_prausnitzii_57453                 1347         11.292214257268022     0.00855081298064937
Blautia_wexlerae_56130                             967          9.002355712603062      0.006816861452500707
Oscillospiraceae_bacterium_54867                   793          7.189020381328073      0.005443748001438043
Alistipes_shahii_62199                             737          6.479423218608567      0.004906419139996312
Coprococcus_sp_62244                               724          6.718218072171967      0.005087241969517808
Eubacterium_hallii_61477                           682          5.574659863945579      0.0042213044174795366
Ruminococcus_torques_62045                         636          5.932215028448106      0.004492056217991877
Alistipes_finegoldii_56071                         608          5.129551820728292      0.003884254876351309
Bacteroidales_bacterium_58650                      525          4.536858475894245      0.003435449192090622
Eubacterium_eligens_61678                          447          4.285558852621167      0.003245157365194698
Ruminococcus_lactaris_55568                        447          4.174266941257232      0.0031608836967492457
Alistipes_sp_59510                                 422          3.650489438433797      0.002764263213992102
Dorea_formicigenerans_56346                        411          3.8803165182987143     0.002938295368628984
Odoribacter_splanchnicus_62174                     398          3.78731982663038       0.0028678754049125477
Sutterella_wadsworthensis_62218                    394          3.020520094562648      0.002287230993361192
Alistipes_indistinctus_62207                       325          2.873587570621469      0.0021759691536217315
Bacteroides_acidifaciens_59693                     323          2.980702309395761      0.0022570797381238424
Roseburia_intestinalis_56239                       288          2.6321938140381116     0.0019931783545619136
Lachnospiraceae_bacterium_51870                    252          2.3518900343642613     0.0017809236857118634
Ruminococcus_obeum_62046                           238          2.190082644628099      0.001658398138728866
Bilophila_wadsworthia_57364                        226          1.8216597998822837     0.0013794169954873938

You can see that the most abundant microbe according to MIDAS is B. uniformis with 257x coverage. IGGsearch does not show B. uniformis anywhere in its top results.
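One way to make this check systematic, rather than comparing the tables by eye, is to normalize the species names and list what MIDAS reports that IGGsearch does not call present. The following is only a rough sketch: the helper `midas_name` is a hypothetical name, the rows are a handful copied from the tables above rather than the full output, and matching on names is crude because the two databases define species differently.

```python
import re

def midas_name(species_id: str) -> str:
    """Turn a MIDAS id such as 'Bacteroides_uniformis_57318' into 'Bacteroides uniformis'."""
    # Drop the trailing numeric id, then replace underscores with spaces.
    return re.sub(r"_\d+$", "", species_id).replace("_", " ")

# A few rows copied from the two tables above (not the full output).
# IGGsearch rows: (species_id, species_name, species_presence)
igg_rows = [
    ("OTU-12111", "Lachnospiraceae sp.", 1),
    ("OTU-14137", "Dialister invisus", 1),
    ("OTU-04682", "Bacteroides caccae", 1),
    ("OTU-04719", "Bacteroides HGM04719", 0),
]
midas_ids = [
    "Bacteroides_uniformis_57318",
    "Bacteroides_vulgatus_57955",
    "Bacteroides_caccae_53434",
    "Dialister_invisus_61905",
]

# Keep only IGGsearch species called present (species_presence == 1).
igg_present = {name for _, name, presence in igg_rows if presence == 1}
midas_names = [midas_name(s) for s in midas_ids]

shared = [n for n in midas_names if n in igg_present]
midas_only = [n for n in midas_names if n not in igg_present]

print("shared:", shared)
print("MIDAS only:", midas_only)
```

Exact name matching will miss species that carry placeholder names in one database (for example `Bacteroides HGM04719`), so a genus-level comparison may be a fairer first pass.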

What do you think explains the difference between the two tools?

snayfach commented 5 years ago

Hi Matt,

I've made a few changes to the default parameters for IGGsearch, which should produce more accurate results out of the box. Please give it another try and let me know how the results look.

However, I think many of the differences are to be expected, since the two methods use different strategies (universal genes in MIDAS versus species-specific genes in IGGsearch), cover different numbers of species (5,900 total in MIDAS versus 23,790 in IGGsearch), and define species slightly differently (96.5% identity across marker genes for MIDAS versus 95% genome-wide ANI over at least 20% of the genome for IGGsearch).

As for B. uniformis (OTU-04728), looking at the database file iggdb_v1.0.0_gut/iggdb_v1.0.0_gut.species, I see that this species has zero marker genes, which explains why it was not reported by IGGsearch. Of the 4,558 gut species in the IGGsearch database, 99% have at least 1 marker gene and 95% have at least 10. Unfortunately, B. uniformis and several other common species are among the 1% with no marker genes.
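For scale, 1% of 4,558 is roughly 45 species. A check like the one described above could be sketched as follows. This assumes a tab-delimited table with `species_id` and `marker_count` columns, which is a guess borrowed from the IGGsearch output header rather than the confirmed layout of the `.species` file, and `marker_summary` is a hypothetical helper. The toy rows reuse marker counts quoted in this thread.

```python
import csv
import io

# Toy stand-in for a species table; the real file's layout is an assumption here.
toy_table = (
    "species_id\tmarker_count\n"
    "OTU-04728\t0\n"     # B. uniformis: zero marker genes, as noted above
    "OTU-04682\t5\n"
    "OTU-14137\t179\n"
    "OTU-12111\t117\n"
)

def marker_summary(handle):
    """Return (ids with zero markers, fraction with >=1 marker, fraction with >=10)."""
    counts = {row["species_id"]: int(row["marker_count"])
              for row in csv.DictReader(handle, delimiter="\t")}
    total = len(counts)
    no_markers = [sid for sid, n in counts.items() if n == 0]
    frac_ge1 = sum(n >= 1 for n in counts.values()) / total
    frac_ge10 = sum(n >= 10 for n in counts.values()) / total
    return no_markers, frac_ge1, frac_ge10

no_markers, frac_ge1, frac_ge10 = marker_summary(io.StringIO(toy_table))
print(no_markers, frac_ge1, frac_ge10)
```

Passing an open file handle instead of the `io.StringIO` wrapper would run the same summary on a real table, provided its columns match the assumed names.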

I will look into why I was unable to identify marker genes for B. uniformis and try to add these genes to the database in the near future.

Thanks, Stephen