micom-dev / micom

Python package to study microbial communities using metabolic modeling.
https://micom-dev.github.io/micom
Apache License 2.0

[MICOM 1.0 API] Proposed new format for fluxes #34

Open cdiener opened 3 years ago

cdiener commented 3 years ago

This is a proposal for a new format for fluxes slated for MICOM 1.0. Feel free to comment :smile:

Current state

The current format for fluxes returned by MICOM is a table in wide format:

In [1]: from micom import Community

In [2]: from micom.data import test_taxonomy

In [3]: com = Community(test_taxonomy())
Building ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

In [7]: sol = com.cooperative_tradeoff(fluxes=True)

In [8]: sol.fluxes
Out[8]: 
reaction               ACALD    ACALDt      ACKr    ACONTa    ACONTb     ACt2r          ADK1  ...     SUCDi    SUCOAS      TALA          THD2      TKT1      TKT2       TPI
compartment                                                                                   ...                                                                          
Escherichia_coli_1  0.049190 -0.008897 -0.004224  5.999485  5.999485 -0.004224  3.388665e-11  ...  5.017641 -5.017641  1.489184  1.924736e-10  1.489184  1.173698  7.513137
Escherichia_coli_2 -0.079989 -0.115231  0.072559  6.001066  6.001066  0.072559  4.264225e-11  ...  5.033051 -5.033051  1.491048  1.924125e-10  1.491048  1.175562  7.495742
Escherichia_coli_3  0.102350  0.197394 -0.100513  6.004985  6.004985 -0.100513  3.662292e-11  ...  5.083935 -5.083935  1.506075  1.926208e-10  1.506075  1.190589  7.460396
Escherichia_coli_4 -0.071551 -0.073266  0.032177  6.023463  6.023463  0.032177  4.133342e-11  ...  5.122875 -5.122875  1.501628  1.926284e-10  1.501628  1.186143  7.440253
medium                   NaN       NaN       NaN       NaN       NaN       NaN           NaN  ...       NaN       NaN       NaN           NaN       NaN       NaN       NaN

[5 rows x 115 columns]

This has resulted in some issues:

  1. It is incompatible with cobra.Solution.fluxes, which breaks a lot of cobra functionality, for instance the summary methods.
  2. It can be pretty sparse for very divergent models (many NA entries).
  3. It mixes medium and taxa fluxes.
  4. It does not specify whether exchange fluxes denote import or export, which is one of the most common help requests we receive.
  5. Basically all methods in MICOM that use flux results convert them to a long format first.
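To illustrate points 2 and 5, here is a minimal sketch of the wide-to-long conversion using plain pandas. The table is a toy stand-in with made-up values, not real MICOM output, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the wide table: rows are compartments, columns are reactions.
# NaN marks reactions that do not exist in a given compartment.
wide = pd.DataFrame(
    {
        "ACALD": [0.05, -0.08, np.nan],
        "ACKr": [-0.004, 0.07, np.nan],
        "EX_co2_m": [np.nan, np.nan, 22.8],
    },
    index=pd.Index(
        ["Escherichia_coli_1", "Escherichia_coli_2", "medium"],
        name="compartment",
    ),
)
wide.columns.name = "reaction"

# Melt to one row per (compartment, reaction) pair and drop the NaN padding.
long = (
    wide.reset_index()
    .melt(id_vars="compartment", var_name="reaction", value_name="flux")
    .dropna(subset=["flux"])
    .reset_index(drop=True)
)
print(long)
```

In the long table the sparsity disappears, because only reactions that actually exist in a compartment get a row.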

Proposed new API for fluxes

CommunitySolution.fluxes will retain the cobrapy format and will be superseded by new accessors that all return fluxes in long format:

CommunitySolution.exchange_fluxes

Similar to the previous one but with the taxa annotated.

      reaction                     name               taxon          flux direction                       micom_id
0      EX_ac_m     ac_m medium exchange              medium  1.814984e-11    export                        EX_ac_m
1   EX_acald_m  acald_m medium exchange              medium  1.328645e-11    export                     EX_acald_m
2     EX_akg_m    akg_m medium exchange              medium  3.225128e-12    export                       EX_akg_m
3     EX_co2_m    co2_m medium exchange              medium  2.280983e+01    export                       EX_co2_m
4    EX_etoh_m   etoh_m medium exchange              medium  1.515389e-11    export                      EX_etoh_m
..         ...                      ...                 ...           ...       ...                           
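The direction column could be derived from the sign of the flux. The sketch below assumes the cobrapy convention that a positive exchange flux means export and a negative one means import, which matches the positive "export" rows shown above. The helper name annotate_direction and the toy rows are hypothetical.

```python
import pandas as pd

# Toy exchange fluxes with made-up values (not real MICOM output).
exchanges = pd.DataFrame(
    {
        "reaction": ["EX_co2_m", "EX_glc__D_m", "EX_ac_e"],
        "taxon": ["medium", "medium", "Escherichia_coli_1"],
        "flux": [22.8, -10.0, 0.5],
    }
)


def annotate_direction(df):
    """Add a 'direction' column based on the flux sign (hypothetical helper).

    Assumption: positive flux = export, anything else = import.
    Exact zeros fall into 'import' here purely to keep the sketch short.
    """
    out = df.copy()
    out["direction"] = out["flux"].map(
        lambda v: "export" if v > 0 else "import"
    )
    return out


annotated = annotate_direction(exchanges)
print(annotated)
```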

CommunitySolution.internal_fluxes

    reaction                                               name               taxon          flux                    micom_id
0      ACALD           Acetaldehyde dehydrogenase (acetylating)  Escherichia_coli_1  1.312146e+00   ACALD__Escherichia_coli_1
1     ACALDt                  Acetaldehyde reversible transport  Escherichia_coli_1  3.236132e+00  ACALDt__Escherichia_coli_1
2       ACKr                                     Acetate kinase  Escherichia_coli_1 -1.304078e+00    ACKr__Escherichia_coli_1
3     ACONTa   Aconitase (half-reaction A, Citrate hydro-lyase)  Escherichia_coli_1  5.987675e+00  ACONTa__Escherichia_coli_1
4     ACONTb  Aconitase (half-reaction B, Isocitrate hydro-l...  Escherichia_coli_1  5.987675e+00  ACONTb__Escherichia_coli_1

This will consolidate GrowthResults and CommunitySolution and give a more readable format. All of these properties are generated on the fly when they are accessed.
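A rough sketch of what "generated on the fly" could look like: the solution stores only a flat flux vector, and the long table is built each time the property is read. SolutionSketch is a hypothetical stand-in, not MICOM's CommunitySolution, and it assumes that a micom_id has the form reaction__taxon (as in the table above) and that exchange reactions start with EX_.

```python
import pandas as pd


class SolutionSketch:
    """Hypothetical stand-in for a solution object (not MICOM's class)."""

    def __init__(self, fluxes):
        # Flat, cobrapy-style Series indexed by micom_id.
        self.fluxes = fluxes

    @property
    def internal_fluxes(self):
        # Built on every access, nothing is cached on the object.
        ids = self.fluxes.index.to_series()
        is_internal = ~ids.str.startswith("EX_")
        # Split on the last "__" because reaction IDs may contain "__" too.
        parts = ids[is_internal].str.rsplit("__", n=1, expand=True)
        return pd.DataFrame(
            {
                "reaction": parts[0].values,
                "taxon": parts[1].values,
                "flux": self.fluxes[is_internal].values,
                "micom_id": ids[is_internal].values,
            }
        )


sol = SolutionSketch(
    pd.Series(
        {
            "ACALD__Escherichia_coli_1": 1.31,
            "ACKr__Escherichia_coli_1": -1.30,
            "EX_co2_m": 22.8,
        }
    )
)
print(sol.internal_fluxes)
```

Building the table on access keeps the stored solution small, at the cost of recomputing it on every read.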

Additionally, we may want to save the annotations in the solution, but they may be large, so it might be better to have a property on the model class, like Community.annotations.

Additional context

A similar format change is planned for Community.knockout_taxa. elasticities already uses a long format.

kaisir97 commented 5 months ago

Sorry if this question is outdated. I am a graduate student working with MES scores from Marcelino et al., Nature Communications 2023. I am currently implementing the MES framework for my metagenome data and am confused about res.exchanges['flux'] from the micom.workflow.grow function. If I want to get the total production or consumption flux of a certain metabolite, should I weight each species' flux for that metabolite by the species' relative abundance? Or are the fluxes already weighted by species abundance? I looked at CD_focus/MetModels_summarize_total_produc_consump.py and R_scripts_4_figs/sulfur_stats_He2017.R, and it seemed to me that the former is the case, but I just wanted to be sure.

cdiener commented 5 months ago

Yes, exactly, it would be scaled by relative abundance, though you can also use the production_rates function, which does that for you. Also note that MES scores have been part of MICOM since version 0.35.0. See the MES function and the new visualizations as well.
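For readers who want to do the weighting by hand, a minimal pandas sketch follows. It is not the implementation of production_rates. The column names (taxon, metabolite, flux, abundance) and the values are assumptions for illustration, and positive fluxes are assumed to mean export.

```python
import pandas as pd

# Toy per-taxon exchange fluxes with made-up values and relative abundances.
exchanges = pd.DataFrame(
    {
        "taxon": ["A", "B", "A", "B"],
        "metabolite": ["ac", "ac", "co2", "co2"],
        "flux": [2.0, -1.0, 4.0, 6.0],
        "abundance": [0.25, 0.75, 0.25, 0.75],
    }
)

# Scale each taxon-level flux by the relative abundance of that taxon.
weighted = exchanges.assign(
    weighted_flux=exchanges["flux"] * exchanges["abundance"]
)

# Total production: sum of weighted exports (positive fluxes) per metabolite.
production = (
    weighted[weighted["flux"] > 0].groupby("metabolite")["weighted_flux"].sum()
)

# Total consumption: sum of weighted imports, reported as positive numbers.
consumption = -(
    weighted[weighted["flux"] < 0].groupby("metabolite")["weighted_flux"].sum()
)

print(production)
print(consumption)
```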