wikipathways / rWikiPathways

R package for WikiPathways API
https://r.wikipathways.org/
MIT License

How to extract just metabolic subset of genes? #34

Open avelar-ageing opened 6 months ago

avelar-ageing commented 6 months ago

I am interested in downloading metabolic enzymes from pathways. For example, in the omega3 senescence pathway (https://www.wikipathways.org/pathways/WP5424.html) there are various genes that are not directly linked to metabolism, including p21. I think it should be possible to identify metabolism genes by taking all genes involved in MIM conversion interactions. Is there a way to extract just these genes, as opposed to all genes in the pathway, using the R package?

Thanks

egonw commented 6 months ago

@DeniseSl22, didn't we write a SPARQL query for this at some point in time? Or was that just on my long wish-/todo list?

egonw commented 6 months ago

The pathway WP5424 is not in the RDF yet, but the following SPARQL should give you some idea how to do this:

SELECT ?wpid ?catalyst ?source ?target WHERE {
  ?pathway a wp:Pathway ;
      dc:identifier / dcterms:identifier ?wpid .
  ?catalysis a wp:Catalysis ;
    dcterms:isPartOf ?pathway ;
    wp:source / rdfs:label ?catalyst ;
    wp:participants ?reaction .
  ?reaction a wp:Interaction .
  OPTIONAL { ?reaction wp:source ?source }
  OPTIONAL { ?reaction wp:target ?target }
} ORDER BY ASC(?catalysis)

DeniseSl22 commented 6 months ago

@avelar-ageing, thanks for your question! I've slightly modified @egonw's query; see below.

I believe that the reactions without a clear source and/or target are not relevant in this case (and require some curation on our side). There are also a number of interactions between two metabolites that have not been drawn with the MIM-Catalysis interaction type, but with a regular arrow. I've put that constraint on a separate line in the SPARQL query (see below; it is commented out there), so you can toggle it to see the difference in response (# is used for comments in SPARQL). With the line active, so that only interactions of type MIM-Catalysis are included, you receive 5296 results; with it commented out, you get 6189 results (so ~900 more). I've also added a way to unify the metabolite annotations to one database type (Wikidata; others are possible, e.g. HMDB, ChEBI, PubChem), in case you want to merge the data at a later stage. Unifying the enzyme annotations can be done in a similar manner (to HGNC, Ensembl, UniProt, etc.).

Also note that this covers all pathways (WikiPathways and Reactome) and all species. Hope the above helps; if not, ask another question here.

SELECT DISTINCT ?wpid ?catalyst ?source ?sourceDb ?target ?targetDb WHERE {
  ?pathway a wp:Pathway ;
      dc:identifier / dcterms:identifier ?wpid .
  # ?catalysis a wp:Catalysis .
  ?catalysis dcterms:isPartOf ?pathway ;
    wp:source / rdfs:label ?catalyst ;
    wp:participants ?reaction .
  ?reaction a wp:Interaction .
  ?reaction wp:source ?source .
  ?source a wp:Metabolite .
  OPTIONAL { ?source wp:bdbWikidata ?sourceDb . }

  ?reaction wp:target ?target .
  ?target a wp:Metabolite .
  OPTIONAL { ?target wp:bdbWikidata ?targetDb . }
} ORDER BY ASC(?source)
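
For anyone who wants to script this outside the R package, here is a minimal Python sketch of how a query like the one above could be restricted to a single pathway and reduced to a list of catalyst labels. Several things here are assumptions and not taken from the thread: the endpoint URL, the explicit PREFIX declarations (the query editor normally predefines them), the FILTER on ?wpid, and the helper names build_query, run_query and catalysts_by_pathway, which are hypothetical. It keeps the wp:Catalysis constraint active and requires metabolites on both ends of the reaction.

```python
import json
import re
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumption: public WikiPathways SPARQL endpoint (not stated in the thread).
ENDPOINT = "https://sparql.wikipathways.org/sparql"

# Assumption: prefixes spelled out so the query does not rely on endpoint defaults.
PREFIXES = """\
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""


def build_query(wpid):
    """Return the thread's query pattern, limited to one pathway ID."""
    # Only accept IDs shaped like "WP1234" so nothing else is spliced into the query.
    if not re.fullmatch(r"WP\d+", wpid):
        raise ValueError(f"not a WikiPathways ID: {wpid!r}")
    return PREFIXES + f"""
SELECT DISTINCT ?wpid ?catalyst WHERE {{
  ?pathway a wp:Pathway ;
      dc:identifier / dcterms:identifier ?wpid .
  FILTER(STR(?wpid) = "{wpid}")
  ?catalysis a wp:Catalysis ;
    dcterms:isPartOf ?pathway ;
    wp:source / rdfs:label ?catalyst ;
    wp:participants ?reaction .
  ?reaction a wp:Interaction ;
    wp:source ?source ;
    wp:target ?target .
  ?source a wp:Metabolite .
  ?target a wp:Metabolite .
}} ORDER BY ASC(?catalyst)
"""


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP GET and return the parsed SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def catalysts_by_pathway(results):
    """Group unique catalyst labels by pathway ID from SPARQL JSON results."""
    grouped = defaultdict(set)
    for row in results["results"]["bindings"]:
        grouped[row["wpid"]["value"]].add(row["catalyst"]["value"])
    return {wpid: sorted(labels) for wpid, labels in grouped.items()}


# Example (needs network access; not executed here):
# genes = catalysts_by_pathway(run_query(build_query("WP5424")))
```

The result may be empty for a pathway that is not in the RDF yet, as noted above for WP5424. Dropping the `?catalysis a wp:Catalysis` constraint from the query string would mirror the broader variant with the line commented out.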