opencobra / memote

memote – the genome-scale metabolic model test suite
https://memote.readthedocs.io/
Apache License 2.0
126 stars 26 forks source link

Check for duplicate reactions #240

Closed ChristianLieven closed 6 years ago

ChristianLieven commented 6 years ago

Find identical reactions in the same compartments but with different IDs.

kcorreia commented 6 years ago

Is this a common occurrence? A reaction that has a different ID would likely end up as a blocked reaction. I find this issue more relevant towards comparing different models.

Another related issue is that some models have a reaction specified in more than one way:

cit <=> icit

cit <=> acon acon <=> icit

FBA results in these kinds of reactions have +/-1000 flux even though they don't have an impact on energetics (from my experience).

ChristianLieven commented 6 years ago

Leaving this here for later reference! https://github.com/SBRG/bigg_models/issues/280

ChristianLieven commented 6 years ago

@ChristianLieven: Sure, no problem. As this was just a very quick check, the code is not generic enough i.e. not of sufficient quality for a PR and I would also not know where and how exactly this should be incorporated, so I just leave it here; feel free to use/adapt whatever you consider as useful (I never gave it a try but maybe one could use DD-DeCaF's id-mapper to do this). I guess something similar should be done for metabolites, too (see #274) . Please also keep in mind that this only works for annotated entries; for example for http://bigg.ucsd.edu/universal/metabolites/C02712 and http://bigg.ucsd.edu/universal/metabolites/acmet one could not do it as the latter does not have any annotation; not sure what the best way is to filter those (same could apply to reactions).


import re

"""
This file determines potential duplicate reactions in bigg. Reactions are considered as duplicates if:
i) they have the same MetanetX ID
ii) they show the same compartment information (which is retrieved from the metabolite IDs)
"""

def retrieve_compartment_info(rea_string):

    """
    given a reaction string, it will return compartment information based on the metabolite IDs' suffix

    :param rea_string: reaction string
    :return: compartment string

    Example:
    retrieve_compartment_info('atp_c + coa_c + ppa_c <-> adp_c + pi_c + ppcoa_c') returns 'c'
    retrieve_compartment_info('acgam_e <-> acgam_p') returns 'ep'
    """

    # extract all compounds from the reaction string
    all_compounds = re.findall(r'\w+', rea_string)

    # get sorted set of compartments
    all_compartments = sorted(list(set(mi[-1] for mi in all_compounds if len(mi.split('_')) > 1)))

    return ''.join(all_compartments)

# reaction information from bigg; downloaded from http://bigg.ucsd.edu/data_access
rea_bigg = pd.read_csv('bigg_models_reactions.txt', sep='\t')

# get the metanetx ID for a given bigg reaction
metnetx_pat = 'MetaNetX \(MNX\) Equation\: http\:\/\/identifiers\.org\/metanetx\.reaction\/(MNXR\d+)'
rea_bigg['metanetx_id'] = rea_bigg.loc[:, 'database_links'].str.extract(metnetx_pat, expand=False)

# for now, we are only interested in reactions with a metanetx ID
rea_bigg = rea_bigg.dropna(subset=['metanetx_id'])

# get the compartment info for a given bigg reaction
rea_bigg['comp_info_bigg'] = rea_bigg['reaction_string'].apply(lambda x: retrieve_compartment_info(x))

# for now we only check whether metanetx id and compartment are identical
rea_bigg = rea_bigg[['metanetx_id', 'comp_info_bigg', 'bigg_id']]

# group by metanetx id and compartment and identify groups with more than one member
dupls = rea_bigg.groupby(['metanetx_id', 'comp_info_bigg'])['metanetx_id'].transform('size') > 1

# select all potential duplicate reactions
rea_bigg_identical = rea_bigg[dupls].sort_values(['metanetx_id', 'comp_info_bigg', 'bigg_id']).reset_index(drop=True)

rea_bigg_identical.drop('comp_info_bigg', axis=1).to_csv('potential_identical_reactions_bigg.csv', index=False)```
ChristianLieven commented 6 years ago

I think this could be tackled as similarily to the duplicate metabolites just like you've said @siddC.

If you'd like to continue with this PR then by all means go for it!