opencobra / cobratoolbox

The COnstraint-Based Reconstruction and Analysis Toolbox. Documentation:
https://opencobra.github.io/cobratoolbox

GPR Rules with irrelevant genes #1386

Open tpfau opened 5 years ago

tpfau commented 5 years ago

I was trying to optimize the GPR parsing code using existing implementations of Boolean formula/SAT solving processors and found one oddity in some formulas: there are sometimes genes which have no effect whatsoever. An example can be found in the E. coli core model for the PFL reaction. This reaction has the following GPR rule:

(((b0902 and b0903) and b2579) or (b0902 and b0903) or (b0902 and b3114) or (b3951 and b3952))

As can clearly be seen, gene b2579 cannot have any effect on the formula. If either b0902 or b0903 is not present, the first clause evaluates to false, as does the second. If both are present, the first clause evaluates to true only if b2579 is present, but that does not matter, as the second clause evaluates to true and thus the whole formula evaluates to true. I'm not sure why these rules exist, and computationally they don't make any sense to me, particularly as any computation using the formula will necessarily ignore the effect of b2579. The only difference the rule makes is in the gene association (which would no longer include b2579 if it were removed).
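The argument above can be checked by brute force. The following is a minimal Python sketch, not COBRA Toolbox code: `pfl_rule` and `irrelevant_genes` are hypothetical names introduced here for illustration. It enumerates every presence/absence combination of the other genes and reports the genes whose state never changes the value of the rule.

```python
from itertools import product

GENES = ["b0902", "b0903", "b2579", "b3114", "b3951", "b3952"]


def pfl_rule(g):
    """PFL GPR rule as quoted in this issue; g maps gene ID -> bool (present)."""
    return (
        ((g["b0902"] and g["b0903"]) and g["b2579"])
        or (g["b0902"] and g["b0903"])
        or (g["b0902"] and g["b3114"])
        or (g["b3951"] and g["b3952"])
    )


def irrelevant_genes(rule, genes):
    """Return the genes whose presence/absence never changes the rule's value."""
    irrelevant = []
    for gene in genes:
        others = [x for x in genes if x != gene]
        affects = False
        # Try every combination of the remaining genes.
        for values in product([False, True], repeat=len(others)):
            state = dict(zip(others, values))
            with_gene = bool(rule({**state, gene: True}))
            without_gene = bool(rule({**state, gene: False}))
            if with_gene != without_gene:
                affects = True
                break
        if not affects:
            irrelevant.append(gene)
    return irrelevant


print(irrelevant_genes(pfl_rule, GENES))  # -> ['b2579']
```

The enumeration grows exponentially with the number of genes in a rule, so this is only practical for small rules such as this one.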

Now, I'm wondering whether this kind of thing should be 'corrected' during processing or not. One thing I can say is that if we use a more efficient external library for our models, those entries would be removed, so an update here depends on this. Any thoughts?
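To illustrate what such a 'correction' could look like, here is a hedged Python sketch of one possible simplification step. It assumes the rule is already in "or of and-clauses" form and uses only `and`/`or`; the helper name `absorb` is hypothetical, and this is not what any particular external library does. It applies the absorption law (X or (X and Y) = X) by dropping every clause that strictly contains another clause, then shows which genes disappear from the gene association.

```python
def absorb(clauses):
    """Drop each AND-clause that is a strict superset of another clause."""
    # Deduplicate while keeping the original clause order.
    unique = list(dict.fromkeys(frozenset(c) for c in clauses))
    return [c for c in unique if not any(other < c for other in unique)]


# PFL rule from this issue, one set of gene IDs per AND-clause.
pfl_dnf = [
    {"b0902", "b0903", "b2579"},
    {"b0902", "b0903"},
    {"b0902", "b3114"},
    {"b3951", "b3952"},
]

reduced = absorb(pfl_dnf)
genes_before = set().union(*pfl_dnf)
genes_after = set().union(*reduced)

print([sorted(c) for c in reduced])
print(sorted(genes_before - genes_after))  # -> ['b2579']
```

The second line of output is the consequence raised above: after simplification, b2579 is no longer associated with the reaction at all.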


rmtfleming commented 5 years ago

Hi Thomas,

Fig 10. in the attached shows the GPR for PFL. Has it been mis-specified?

Regards,

Ronan

On Thu, 22 Nov 2018 at 09:28, Thomas Pfau notifications@github.com wrote:


--

Mr. Ronan MT Fleming B.V.M.S. Dip. Math. Ph.D.

Assistant Professor, Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University. https://www.universiteitleiden.nl/en/staffmembers/ronan-fleming http://analyticalbiosciences.leidenuniv.nl & H2020 Project Coordinator Systems Medicine of Mitochondrial Parkinson’s Disease http://sysmedpd.eu


tpfau commented 5 years ago

@rmtfleming The attachment seems to have been lost when your mail was converted to a GitHub comment.

tpfau commented 5 years ago

Thanks for the mail. Yes, it has the right GPR rule according to the figure. However, the figure also shows that yfiD can be ignored: (pflA and pflB) is sufficient, and required, so yfiD is useless for the GPR, due to the direct link pflA+pflB -> PFL.

akaviaLab commented 5 years ago

I can give some insight into why these might exist biologically. If a complex needs 2 alpha units and 2 beta units to work, the GPR would probably look like (alpha and beta). If alpha is encoded by two separate genes, alpha1 and alpha2, you might end up with several complexes in the cell:

  • 2 alpha1, 2 beta
  • 2 alpha2, 2 beta
  • alpha1 + alpha2, 2 beta

So the GPR might end up something like (alpha1 and beta) or (alpha2 and beta) or (alpha1 and alpha2 and beta), where the last term is relatively meaningless. Does this help?
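The "relatively meaningless" last term can be made precise with the absorption law of Boolean algebra, X ∨ (X ∧ Y) = X. The following is a worked step for this hypothetical alpha/beta example, taking X = α₁ ∧ β and Y = α₂:

```latex
\begin{aligned}
&(\alpha_1 \land \beta) \lor (\alpha_2 \land \beta) \lor (\alpha_1 \land \alpha_2 \land \beta) \\
&\quad= \bigl[(\alpha_1 \land \beta) \lor \bigl((\alpha_1 \land \beta) \land \alpha_2\bigr)\bigr] \lor (\alpha_2 \land \beta) \\
&\quad= (\alpha_1 \land \beta) \lor (\alpha_2 \land \beta)
\end{aligned}
```

Note that in this example all three genes still appear after simplification, so the gene association is unchanged. The PFL rule differs in that b2579 occurs only in the absorbed clause and would therefore drop out entirely.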