Open matthiaskoenig opened 5 years ago
While I tend to agree with you and I'm open to changing how the score is computed, both biomass and exchange identifiers can at least be found in the BiGG database.
In a similar manner, I've only just realized that this is also the case for the metabolic coverage calculation. It will be skewed in favor of models with lots of pseudo-reactions. I've opened a separate issue to track this here:
Currently, the memote report expects annotations to biological databases on the exchange reactions and the biomass reaction. Exchange reactions are purely mathematical constructs that enable an FBA simulation; they have no biological counterpart and should not be counted in the reaction annotation statistics. The same holds for the biomass reaction, which is a modeling construct (with some biological resemblance).
Counting the exchange reactions skews the statistics, especially for small models, which have more exchange reactions per internal reaction and therefore score lower.
E.g. in e_coli_core the metanetx.reaction annotation statistic is reported as 77.9%, with the missing annotations being
["BIOMASS_Ecoli_core_w_GAM","EX_ac_e","EX_acald_e","EX_akg_e","EX_co2_e","EX_etoh_e","EX_for_e","EX_fru_e","EX_fum_e","EX_glc__D_e","EX_gln__L_e","EX_glu__L_e","EX_h_e","EX_h2o_e","EX_lac__D_e","EX_mal__L_e","EX_nh4_e","EX_o2_e","EX_pi_e","EX_pyr_e","EX_succ_e"]
I.e. all reactions which can be annotated are annotated, but the statistic still falls 22.1% short because it expects annotations on things which are not biological and consequently don't exist in any database.
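To make the effect concrete, here is a minimal sketch of how the coverage changes once pseudo-reactions are excluded. It is not memote's implementation: `is_pseudo_reaction` and `annotation_coverage` are hypothetical helpers, pseudo-reactions are detected purely by the BiGG-style `EX_` / `BIOMASS_` ID prefix (a heuristic assumption), and the toy model assumes 95 reactions in total, which is what 21 missing annotations at 77.9% implies. The 74 internal reactions use placeholder IDs and placeholder annotation values.

```python
# Hypothetical helpers, not part of memote. Pseudo-reactions are detected
# by BiGG-style ID prefix only, which is a heuristic assumption.
PSEUDO_PREFIXES = ("EX_", "BIOMASS_")


def is_pseudo_reaction(reaction_id):
    """Return True for exchange/biomass reactions (by ID prefix)."""
    return reaction_id.startswith(PSEUDO_PREFIXES)


def annotation_coverage(annotations, namespace, skip_pseudo=False):
    """Fraction of reactions annotated in ``namespace``.

    ``annotations`` maps reaction ID -> {namespace: identifier}.
    With ``skip_pseudo=True`` exchange and biomass reactions are ignored.
    """
    ids = [
        rid for rid in annotations
        if not (skip_pseudo and is_pseudo_reaction(rid))
    ]
    if not ids:
        return 0.0
    annotated = sum(1 for rid in ids if namespace in annotations[rid])
    return annotated / len(ids)


# The 21 unannotated pseudo-reactions reported for e_coli_core.
missing = [
    "BIOMASS_Ecoli_core_w_GAM", "EX_ac_e", "EX_acald_e", "EX_akg_e",
    "EX_co2_e", "EX_etoh_e", "EX_for_e", "EX_fru_e", "EX_fum_e",
    "EX_glc__D_e", "EX_gln__L_e", "EX_glu__L_e", "EX_h_e", "EX_h2o_e",
    "EX_lac__D_e", "EX_mal__L_e", "EX_nh4_e", "EX_o2_e", "EX_pi_e",
    "EX_pyr_e", "EX_succ_e",
]

# Toy stand-in for the model: 74 annotated internal reactions with
# placeholder IDs and values, plus the 21 pseudo-reactions above.
toy = {f"R{i}": {"metanetx.reaction": "placeholder"} for i in range(74)}
toy.update({rid: {} for rid in missing})

print(f"{annotation_coverage(toy, 'metanetx.reaction'):.1%}")
print(f"{annotation_coverage(toy, 'metanetx.reaction', skip_pseudo=True):.1%}")
```

Under these assumptions the first call reproduces the reported 77.9% (74 of 95), and the second gives 100.0% because only the 74 internal reactions remain in the denominator. A real implementation would probably want a more robust way of identifying exchange and biomass reactions than ID prefixes.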