Closed almutwerner closed 2 years ago
Hi and sorry for the late response!
Subsystem information. In gapseq, the subsystems are the MetaCyc pathways that are tested in gapseq find
. Here's a small R-code-snippet to extract the subsystem information from the gapseq model RDS file:
library(sybil)
library(data.table)
mod <- readRDS("model_files/anha.RDS")
tmpind <- which(mod@subSys, arr.ind = T)
subsysDT <- data.table(react.id = mod@react_id[tmpind[,1]],
react.name = mod@react_name[tmpind[,1]],
subsystem = colnames(mod@subSys)[tmpind[,2]])
The subsystem IDs correspond to the MetaCyc pathway IDs.
Reaction equations. Yes, in principle you will find all reactions that can be part of a gapseq model in the table seed_reactions_corrected.tsv
. Yet, I need to add that this file is meant as internal program file to gapseq and some columns in this table, which gapseq doesn't need anymore, are not updated by us anymore. The table was originally derived from ModelSEED's biochemistry DB and we are going to restructure this file in gapseq rather soon. Long story short, I would recommend a different approach to retrieve reaction equations:
Continuing the script above, you can add a new column with the reaction equation to the subsysDT
data table. To do so, please load this function in your R-environment: https://gist.github.com/Waschina/6fd9d7ac7509d8060477cc25d4918617
Next, it is a one-liner to add the equations:
subsysDT$equation <- print_reaction(mod, subsysDT$react.id)
I hope this helps :)
Silvio
Thank you for your answer! It worked well with my data. I do have some followup questions though:
-How are the subsystem IDs and the pathway IDs from -all-Pathways.tbl related? I noticed at least a few of them are the same, but I haven't checked for all of them. My goal is to assign a superpathway (where present) to all of the pathways in the -all-Pathways.tbl. So is it, that when the IDs are the same, it is already a superpathway, and if not, I can match it using the reaction ids?
-Just so I definitely got it right, the reactions with --> and <-- just work in that direction (so A --> B is the same reaction as B <-- A), the ones with <==> work in both?
I'm happy to hear it worked!
The subsystem-IDs should be a subset of all the pathway IDs in the [...]-all-Pathways.tbl
. When you say superpathway, do you mean the Superpathways as defined by MetaCyc or the individual pathways parent classes? In any case, I think @jotech might have a recommendation to achieve the aggregation of pathways to larger groups.
About the arrows: Exactly.
I think that will be the parent classes. But any ways to aggregate pathways into groups would be appreciated really.
The react.id is the same as the ids in column dbhit of -all-Reactions.tbl, right? But some reactions from the .tbl don't have dbhits, or the ids don't match with the one from the RDS, or they have multiple. Is there a way to determine the reaction equation for each reaction of -all-Reactions.tbl?
Yes, react.id
directly corresponds to the reaction ids listed in the dbhit
column. But please note, that not all reactions in the dbhit
column will also occur in the final model. This is because gapseq is not using the entire model seed reaction database for model construction, but a curated subset.
dbhit
: This is because gapseq matched reactions primarily by the EC code. As enzymes often can catalyse several reactions, gapseq tries to link all of the known catalyzed reactions to each EC.dbhit
: These are cases, where we do not have a corresponding reaction within the gapseq subset of modelseed's reaction database. Or where we did not yet associate the corresponding gapseq reaction with the correct EC. We are constantly working on gapseq to minimize the number of these cases. So if you find specific 'no hit' cases, that should be resolved, please let us know and we'll try to fix it.dbhits
do not match reaction ids in models: Would you have a specific example that I could look into?Reaction equation for each reaction in *-all-Reactions.tbl
: Each row in this table refers to a metaCyc reaction (with a few exceptions of gapseq-custom pathways) and the their reaction equations can probably retrieved from metaCyc.
Hi @almutwerner my answer comes a bit late, sorry! I just want to add to what Silvio wrote. You asked about the pathways and how they can be grouped.
My recommendation would be to use the metacyc pathway ontology. The corresponding subgroups for each pathway can be found in the column named hierarchy
found in the table gapseq/dat/meta_pwy.tbl
. The subgroups can then be used to combine pathways and have an integrated look.
Thanks, I'll give it a try!
cool! let us know if there is still something unclear :)
Hello gapseq team!
-I read in the paper that gapseq was also assigning the subsystems/superpathways for each pathway and I was wondering where in output or internal database files to find that information.
I also want to look into the order of reactions within each pathway and additionally reconstruct a network based on their equations. While I was looking for this information, I came across this https://raw.githubusercontent.com/jotech/gapseq/master/dat/seed_reactions_corrected.tsv . -Are those all of the reactions, that are also in the output-Reactions.tbl, or are some missing? -What is the difference between the Columns "code" and "equation"? E.g. for rxn00008, one of them contains the H+, one doesn't. Also the arrows are sometimes different. -Which columns can I use to identify the possible directions of this reaction? I assume either "reversibility" or "direction", but I don't quite understand how to interpret them and what the difference is.