resendislab / corda

An implementation of genome-scale model reconstruction using Cost Optimization Reaction Dependency Assessment by Schultz et. al
http://resendislab.github.io/corda
MIT License
23 stars 8 forks source link

Gene Protein Reactions Rules are not consistent in the outputed model #13

Open migp11 opened 6 years ago

migp11 commented 6 years ago

Hi again,

I am creating some context-specific models (CSM) and then doing some in-silico single gene deletion experiments to find context-specific vulnerabilities. When I create a CSM I follow these steps: 1) Transform gene confidences into reactions confidences: reactions_confidences = create_reactions_confidences(model, gene_confidences) 2) Run CORDA corda = CORDA(model, reactions_confidences) corda.build() 3) Finally, create the CSM cs_model = corda.cobra_model()

The problem I've found is that the reactions included in the CSM inherit the Gene-Reaction-Protein (GPR) of the original model and thus include genes with low confidences that should not be in the model. Then, when the single gene KO experiments are conducted, some genes that should be predicted as essential in the CSM, are not. I've found that this happens because the model include other genes that code for the same reactions. For instance, the GPR of the enolase (ENO) is (HGNC:3350 or HGNC:3354 or HGNC:3353) and my gene confidences are {"HGNC:3350": 3, "HGNC:3353": -1, "HGNC:3354": -1}, then ENO should be present in the model but I guess the GPR should only include genes with high confidence, and some genes with low confidences only if they are required to enable flux through ENO. In the above example the ENO GPR in the CSM should be (HGNC:3350). In this way removing HGNC:3350 will imply a zero flux through ENO. What do you think?

I have a python-based implementation of Fastcore (I will put it in github in a near future) and I have the same problem. Thus, I trying to find a way to update the GPR after creating the CSM, so if I find some elegant way to solve this issue, I will share the idea with you. I would like to hear your opinion on this issue.

Best regards, Miguel

cdiener commented 6 years ago

This is a very good point. How would you handle reaction that were only included to maintain a particular metabolic requirement or to enable other high confidence reactions? Those would probably only have genes with confidences between -1 to 2. I could definitely see that as something that could be enabled with an option. For instance using: corda.cobra_model(strip_genes = True) or something like that. You are very much invited to take a shot at that if you want to. Otherwise I could implement something along that line but it would probably take a while...

migp11 commented 5 years ago

Hi again, and sorry for this long delay in my answer. After pointing out the problem of the inconsistency of the GPR treatment I found that this was a general problem present in many other model extraction methods (etc, GIMME, FASTCORE, INIT). So my collaborators and I start to work on this issue. We wrote a small letter that, hopefully, will be published soon since we got good feedback from the reviewers. If you want to have a look to it, there is a first draft is at biorxiv:

https://www.biorxiv.org/content/10.1101/593277v1.abstract

The solution we proposed there is to somehow store the context used to build the model. Then, when one wants to asses the effect of a knockout the idea is to recalculate reactions confidence but first setting the confidence of the knocked gene to -1 (in the case of corda). This will output the set of reactions that become inactive under given the knockout and under the context used to build the model. As you commented, another approach would be to strip genes from the GPR as a final step in the construction of the CS-model. However, I would prefer to have the genes' expression or confidences stored in the CS-model because that would make the model more "transparent".

I think that the current version of cobrapy does not support storing genes' attributes. Nonetheless, the current libSBML FBC extension allows to link genes to species and thus the gene expression could be stored as a standard species attribute such initialConentration or initialAmount. I will bring this discussion into de cobrapy community. What do you think?

Please, let me know your opinion about all these thins Best Miguel

cdiener commented 5 years ago

There is a way of flagging a gene as non-functional in cobrapy. We are in the process of extending the annotations on objects as well and I agree that genes should have its own set of annotations. The challenge would be to find a valid MIRIAM annotation for that, otherwise you can't write them to SBML. Also groups will come to cobrapy soon so you will be able to define sets of reactions or genes that belong to a particular condition.

migp11 commented 5 years ago

I guess the flag you mentioned should be the gene attribute "gene.functional". I've been doing some experiments using this flag, but I did not obtain the expected results. My guess was that flagging genes as non-functional (i.e. gene.functional = False) would have some effect when single_gene_deletion were run. However, I did not find such effect, in fact the single_gene_deletion does no seams to use this gene attribute at all.. I'm probably misunderstanding something. Can you better explain how the gene flagging is used?

Regarding the possibility of storing gene annotations, I've been exchanging ideas in the fbc-sbml mailing list and some people point that in a near by future thesbml fbc extension will support the storage of user custom “KeyValuePair” annotation (see section 4.6 and 4.7 https://sourceforge.net/p/sbml/code/HEAD/tree/trunk/specifications/sbml-level-3/version-1/fbc/spec/harmony2018-proposal-fbc-v3.pdf?format=raw), This feature could be use to store molecular context such as gene expression. What do you think?

cdiener commented 5 years ago

Yes that would definitely work if FBC supports it. You are right about the functional flag. That is a bit weird behavior. I will open an issue.