jotech / gapseq

Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks
GNU General Public License v3.0

Reactions that have good blast but not included in the draft model #119

Closed gmhhope closed 1 year ago

gmhhope commented 2 years ago

Dear Gapseq developer,

I am super excited about this software. It improves a lot on CarveMe and ModelSEED draft models, and it provides a long list of tentative reactions, which makes the procedure transparent and easy to track. So thanks very much!

I recently attempted to translate a genome-scale model (the formalized knowledge base of metabolic reactions in one or more organisms) to an MS-based metabolomics application. The coverage of the reference reaction and compound databases is therefore essential to enable such an application, as metabolomics aims at mapping not only essential metabolism but also secondary metabolism.

I tested gapseq and found that a large number of reactions are reported in the output (~30,000 rxns), far more than the reactions retained in the draft model (~2,500 rxns). Is there any description of how the software compiles and filters the reactions? In particular, why are a significant number of no_blast results present in the table? Are those no_blast reactions included because they belong to a pathway predicted to be present in the model based on the essential reactions identified?

Furthermore, I found some reactions that have a good blast hit but are not included in the draft model. For example, see reactions_with_good_blast_not_in_model.txt

I did not find the reaction in the ModelSEED database, but I did find it in the MetaCyc database. I am not sure whether your reference network includes MetaCyc as well?

I have a lot more questions but I think this will be a good start to understand more about the software. Thanks very much for any assistance! And I was amazed by your software! Thanks!

Best, Minghao Gong

Waschina commented 2 years ago

Hi Minghao Gong,

Thank you for your questions and for using gapseq!

As pathway references, gapseq mainly uses the pathways described in MetaCyc. The pathway IDs are stated in the column pathway of the ...-Reactions.tbl file. The MetaCyc pathway definitions contain the MetaCyc reaction IDs that are given in the column rxn. gapseq links the MetaCyc reaction IDs to the reaction IDs from the gapseq reaction database. Unfortunately, this mapping is not perfect, leaving a few MetaCyc IDs unlinked to the gapseq reaction DB. This explains the results you provided in the file reactions_with_good_blast_not_in_model.txt. If there are hits, the column dbhit provides the reaction IDs that refer to the gapseq reaction DB and/or the ModelSEED reaction DB.
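To list the affected rows in your own output, something like the following Python sketch may help. It collects reactions that have a good blast hit but an empty dbhit cell. It assumes the table is tab-separated with optional `#` comment lines and that the hit label is stored in a column named `status`; that column name and the file layout are assumptions, not confirmed in this thread, and the sample rows and IDs are made up.

```python
import csv
import io

# Hypothetical excerpt of a ...-Reactions.tbl file; rows and IDs are invented.
SAMPLE = (
    "# comment line\n"
    "rxn\tpathway\tstatus\tdbhit\n"
    "RXN-A\tPWY-1\tgood_blast\trxn00001\n"
    "RXN-B\tPWY-1\tgood_blast\t\n"
    "RXN-C\tPWY-2\tno_blast\t\n"
)


def unlinked_good_blast(handle):
    """Return rxn IDs with a good blast hit but no linked database reaction."""
    # Skip comment lines so the first remaining line is the header row.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row["rxn"]
        for row in reader
        # "status" is an assumed column name; adjust it to your table.
        if row["status"] == "good_blast" and not (row["dbhit"] or "").strip()
    ]


print(unlinked_good_blast(io.StringIO(SAMPLE)))
```

For a real file, pass an open file handle instead of the `io.StringIO` sample.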

The gapseq internal reaction and compound database is derived from ModelSEED. Not all reaction entries from ModelSEED are included in the gapseq reaction database. For instance, we excluded a number of duplicated or erroneous reactions. Thus, a number of hit IDs stated in dbhit are not part of the gapseq biochemistry database, and these reactions do not occur in gapseq models (neither draft nor gapfilled models).
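The filtering described above can be illustrated with a small Python sketch that splits a dbhit cell into IDs that are part of the gapseq reaction database and IDs that are not. It assumes that a dbhit cell holds whitespace-separated reaction IDs and that you have the set of gapseq reaction IDs at hand; both are assumptions, and the function name and IDs are hypothetical.

```python
def split_dbhits(dbhit, gapseq_ids):
    """Split a dbhit cell into (kept, dropped) reaction IDs.

    kept    -> IDs present in the gapseq reaction database
    dropped -> IDs absent from it, which cannot occur in a gapseq model
    """
    hits = dbhit.split()  # assumed: IDs separated by whitespace
    kept = [h for h in hits if h in gapseq_ids]
    dropped = [h for h in hits if h not in gapseq_ids]
    return kept, dropped


# Made-up IDs for illustration only.
gapseq_ids = {"rxn00001", "rxn00002"}
print(split_dbhits("rxn00001 rxn99999", gapseq_ids))
```

A row whose IDs all end up in `dropped` would behave like an unlinked reaction: a hit is reported, but nothing can be added to the model.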

why are a significant number of no_blast results present in the table? Are those no_blast reactions included because they belong to a pathway predicted to be present in the model based on the essential reactions identified?

Yes. Reactions with the label "no_blast" or "no seq data" can still be part of the draft network if they participate in a pathway that was predicted to be present based on the completeness threshold and/or key-enzyme criteria.
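As a rough illustration of that logic, here is a minimal Python sketch. How the two criteria are combined and the cutoff values used here are assumptions chosen for illustration, not gapseq's confirmed implementation; the function names are hypothetical.

```python
def pathway_predicted(found, total, key_found, key_total,
                      cutoff=0.8, cutoff_with_keys=0.66):
    """Decide whether a pathway counts as present.

    found/total         -> reactions with sequence support / all reactions
    key_found/key_total -> same counts restricted to key enzymes
    Cutoff values are illustrative placeholders.
    """
    completeness = found / total
    if completeness >= cutoff:
        return True
    # Assumed rule: a lower cutoff applies when every key enzyme was found.
    has_all_keys = key_total > 0 and key_found == key_total
    return has_all_keys and completeness >= cutoff_with_keys


def reactions_for_draft(pathway_rxns, predicted):
    """pathway_rxns is a list of (rxn_id, status) pairs for one pathway."""
    if predicted:
        # Whole pathway is taken, including no_blast reactions.
        return [rxn for rxn, _ in pathway_rxns]
    return [rxn for rxn, status in pathway_rxns if status == "good_blast"]


rxns = [("R1", "good_blast"), ("R2", "no_blast")]
print(reactions_for_draft(rxns, pathway_predicted(7, 10, 2, 2)))
```

With these placeholder cutoffs, a pathway at 70% completeness with all key enzymes found is treated as present, so its no_blast reaction R2 is carried into the draft alongside R1.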

Best Silvio

gmhhope commented 1 year ago

Thanks for closing it! I get so many notifications that I was probably not able to follow this thread when you replied months ago.

Now I have seen your closing notice and I can come back to it!

Sorry for catching up so late!

Best, Minghao

Zhelunnn commented 8 months ago

Hi Silvio,

Thanks for the detailed explanation above. I am curious about the choice to base the gapseq database on ModelSEED instead of MetaCyc. I assumed that using MetaCyc might have allowed for a more extensive inclusion of reactions in the model.

Best, Zhelun

Waschina commented 8 months ago

The ModelSEED biochemistry database is very comprehensive and integrates several reaction and metabolite data sources, including MetaCyc (https://academic.oup.com/nar/article/49/D1/D575/5912569).