jotech / gapseq

Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks
GNU General Public License v3.0
159 stars 32 forks source link

Comparison of amino acid auxotrophy prediction with GapMind #220

Open mgabriell1 opened 4 months ago

mgabriell1 commented 4 months ago

Hi, First of all (once again) thanks for developing this tool! I'm opening this issue because I've noticed some discrepancies between the results of auxotrophy prediction of gapseq and GapMind and would like to know your opinion. I'm analyzing some genomes of Legionella and I've run both tools on a set of isolated genomes. I see that when GapMind predicts either "High confidence" or "Medium confidence" to a biosynthetic pathway, gapseq predicts no auxotrophy towards that amino acid (tested assessing the in-silico "growth" setting the lower bound of the exchange reaction for that amino acid to zero). So at that level the predictions match very well. However, looking at GapMind's "Low confididence" pathway (i.e., auxotrophic) I see that for serine and isoleucine gapseq predicts in most cases no auxotrophy (so contrarily to GapMind), while for the other amino acids in most cases the predictions are in agreement. Looking more in depth at gapmind models, I have noticed that for example in genome GCF_026689455.1 there are no reactions that produce irreversively L-serine, but, instead, there are several reversible ones which involve it:

"rxn00426_c0"   "L-Serine hydro-lyase"  "[c0] : cpd00054 <==> cpd00001 + cpd01495"  "Reversible"    "c0"    -1000   1000    0   ""  "|PWY-3661|, |PWY-5497|, |SERDEG-PWY|"
"rxn00953_c0"   "L-Serine hydro-lyase (adding homocysteine)"    "[c0] : cpd00135 + cpd00054 <==> cpd00001 + cpd00424"   "Reversible"    "c0"    -1000   1000    0   "EPOCDL_14790"  "|HOMOCYSDEGR-PWY|, |PWY66-426|, |PWY-801|"
"rxn03380_c0"   "L-Serine hydro-lyase (adding homocysteine)"    "[c0] : cpd00054 + cpd03397 <==> cpd00001 + cpd03398"   "Reversible"    "c0"    -1000   1000    0   "EPOCDL_14790"  "|HOMOCYSDEGR-PWY|, |PWY66-426|, |PWY-801|"
"rxn05307_c0"   "TRANS-RXNBWI-115637.ce.maizeexp.SER_SER"   "cpd00067[e0] + cpd00054[e0] <==> cpd00067[c0] + cpd00054[c0]"  "Reversible"    "c0, e0"    -1000   1000    0   "EPOCDL_04945 | EPOCDL_11420"   ""
"rxn05909_c0"   "L-serine hydro-lyase (adding hydrogen sulfide, L-cysteine-forming)"    "[c0] : cpd00067 + cpd00054 + cpd00239 <==> cpd00001 + cpd00084"    "Reversible"    "c0"    -1000   1000    0   "EPOCDL_14790"  "|HOMOCYSDEGR-PWY|, |PWY66-426|, |PWY-801|"
"rxn06446_c0"   "L-Serine:tRNA(Ser) ligase (AMP-forming)"   "[c0] : cpd00002 + cpd00054 + cpd11921 <==> cpd00012 + (2) cpd00067 + cpd00018 + cpd12132"  "Reversible"    "c0"    -1000   1000    0   "EPOCDL_02590"  "|PWY0-901|, |PWY-6281|, |TRNA-CHARGING-PWY|"
"rxn09247_c0"   "RXN0-4083.cp"  "cpd00971[e0] + cpd00054[e0] <==> cpd00054[c0] + cpd00971[c0]"  "Reversible"    "c0, e0"    -1000   1000    0   "EPOCDL_04945 | EPOCDL_11420"   ""
"rxn15167_c0"   "L-serine hydro-lyase (adding homocysteine; L-cystathionine-forming)"   "[c0] : cpd00135 + cpd00054 <==> cpd00001 + cpd19019"   "Reversible"    "c0"    -1000   1000    0   "EPOCDL_14790"  "|HOMOCYSDEGR-PWY|, |PWY66-426|, |PWY-801|"
"EX_cpd00054_e0"    "L-Serine-e0 Exchange"  "[e0] : cpd00054 <==> " "Reversible"    "e0"    -0.1    1000    0   ""  ""
"rxn00425_c0"   "serine racemase"   "[c0] : cpd00054 <==> cpd00550" "Reversible"    "c0"    -1000   1000    0   ""  ""

So I wonder whether this result makes biological sense or it is just something that it is theoretically possible, but not realistic. I know this issue does not necessarily regards gapseq, but maybe you have some experience for cases like these. In any case, thanks!

Waschina commented 4 months ago

Hi Marco,

this is an interesting case. I reconstructed the model for "GCF_026689455.1" and also predicted that serine is not an auxotrophy.

I checked which reactions are involved in serine biosynthesis. Apparently, parts of the folate cycle are used in the reverse direction to produce serine from glycone, where formate is used as a C1 group donor.

           rxn      ec        flux                                                                                                                   equation     status pathway.status
1: rxn00690_c0 6.3.4.3  0.08615402 (1) ATP-c0 + (1) Formate-c0 + (1) Tetrahydrofolate-c0 --> (1) Phosphate-c0 + (1) ADP-c0 + (1) 10-Formyltetrahydrofolate-c0   no_blast      threshold
2: rxn00692_c0 2.1.2.1  0.02345205         (1) H2O-c0 + (1) Glycine-c0 + (1) 5-10-Methylenetetrahydrofolate-c0 <==> (1) L-Serine-c0 + (1) Tetrahydrofolate-c0 good_blast           <NA>
3: rxn00907_c0 1.5.1.5 -0.02565517               (1) NADP-c0 + (1) 5-10-Methylenetetrahydrofolate-c0 <==> (1) NADPH-c0 + (1) 5-10-Methenyltetrahydrofolate-c0 good_blast           <NA>
4: rxn01211_c0 3.5.4.9 -0.02565517                        (1) H2O-c0 + (1) 5-10-Methenyltetrahydrofolate-c0 <==> (1) H+-c0 + (1) 10-Formyltetrahydrofolate-c0 good_blast           <NA>

I am not sure if that path to serine biosynthesis is biologically realistic. In any way, this explains the difference to GapMind as this route is not checked by GapMind as serine biosynthesis pathway.

Side note: the reaction "rxn00690_c0" was not found in the genome but is part of the model because of a pathway completeness threshold.

mgabriell1 commented 4 months ago

Hi Silvio, Thanks a lot for the quick reply! Basically gapseq is saying that this could be "stoichiometrically possible", even though, let's say, not traditional (e.g., this pathway is not included in GapMind). I guess this opens up work to verify this experimentally ;) Thanks again for the help,

Marco

mgabriell1 commented 4 months ago

Sorry for the additional comment, but I was wondering how did you get that nicely-formatted reactions table. I guess it would also be helpful for other gapseq users :) Thanks!

Marco