wikipathways / cytoscape-wikipathways-app

WikiPathways app for Cytoscape to open and access pathways from WikiPathways
Apache License 2.0

Add standardized metabolite IDs in one column to resolve data mapping issues #123

Closed DeniseSl22 closed 2 years ago

DeniseSl22 commented 2 years ago

Since the built-in BridgeDb mapping does not support metabolite mappings (supporting them would be a separate feature request, unrelated to this app), it would be nice to have an additional column for metabolite IDs (just as we now have one for Ensembl): [image]

I would propose using either Wikidata or ChEBI as this "universal mapping resource", and then we could work from there.

Note 1: ChEBI does not include many lipid IDs, which might complicate some other work we are doing.

Note 2: Wikidata (like any other metabolite database) is not complete, so it will not work for every ID; however, we can update Wikidata ourselves, which makes life a bit easier from time to time. Also, most people do not annotate their data with Wikidata IDs (yet), and repositories for metabolomics data require people to submit their data with, for example, ChEBI or HMDB IDs (but that's something people could use the BridgeDb plugin for).

Note 3: Mapping might produce two mappings for one ID (though I regularly check our RDF for these to remove redundancies).

Note 4: Another relevant mapping might be the InChIKey, which should be available for most of the IDs we have (but adding it to data often requires people who know how to program, which I believe is not what we're aiming for in the Cytoscape GUI).

I hope I explained my request clearly enough; if not, please let me know @mkutmon @egonw @AdamStuart @AlexanderPico

AlexanderPico commented 2 years ago

Notes for devs: you might use the same approach used to unify multiple ID sources to Ensembl: https://github.com/wikipathways/cytoscape-wikipathways-app/blob/main/src/main/java/org/wikipathways/cytoscapeapp/internal/cmd/EnsemblIdTask.java

  1. collect the IDs of all datanodes of type Metabolite, grouped by source
  2. make one xrefsbatch query per source, mapping to ChEBI
  3. set the values row by row from each query result

Thus, if a given pathway contains metabolites specified by 3 different sources, the code would run 3 different batch queries and set the values in a new table column in 3 rounds.
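The three steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the app's actual code: `MetaboliteNode`, `BatchMapper`, and `fillChebiColumn` are hypothetical names, the batch lookup is a stub standing in for the real xrefsbatch query, and the IDs are made-up placeholders rather than real mappings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ChebiColumnSketch {

    /** Hypothetical stand-in for one metabolite row of the node table. */
    static final class MetaboliteNode {
        final String rowKey;  // unique key of the table row
        final String source;  // annotation source of the ID
        final String id;      // identifier as annotated in the pathway

        MetaboliteNode(String rowKey, String source, String id) {
            this.rowKey = rowKey;
            this.source = source;
            this.id = id;
        }
    }

    /** Hypothetical batch lookup: maps many IDs of one source to ChEBI in one call. */
    interface BatchMapper {
        Map<String, String> mapToChebi(String source, Set<String> ids);
    }

    /** Returns rowKey -> ChEBI ID for every row that could be mapped. */
    static Map<String, String> fillChebiColumn(List<MetaboliteNode> nodes, BatchMapper mapper) {
        // Step 1: collect the metabolite IDs per source.
        Map<String, Set<String>> idsBySource = new LinkedHashMap<>();
        for (MetaboliteNode node : nodes) {
            idsBySource.computeIfAbsent(node.source, s -> new LinkedHashSet<>()).add(node.id);
        }

        Map<String, String> chebiColumn = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : idsBySource.entrySet()) {
            // Step 2: one batch query per source.
            Map<String, String> mapped = mapper.mapToChebi(entry.getKey(), entry.getValue());

            // Step 3: set the values row by row from this query's result.
            for (MetaboliteNode node : nodes) {
                if (!node.source.equals(entry.getKey())) {
                    continue;
                }
                String chebi = mapped.get(node.id);
                if (chebi != null) {
                    chebiColumn.put(node.rowKey, chebi); // unmapped rows stay empty
                }
            }
        }
        return chebiColumn;
    }

    public static void main(String[] args) {
        // Placeholder sources and IDs, for illustration only.
        List<MetaboliteNode> nodes = Arrays.asList(
            new MetaboliteNode("row1", "SourceA", "A-001"),
            new MetaboliteNode("row2", "SourceB", "B-001"),
            new MetaboliteNode("row3", "SourceA", "A-002"),
            new MetaboliteNode("row4", "SourceC", "C-001"));

        // Stub mapper with made-up results; a real one would call a mapping service.
        BatchMapper stub = (source, ids) -> {
            Map<String, String> result = new HashMap<>();
            for (String id : ids) {
                result.put(id, "CHEBI:placeholder-" + id);
            }
            return result;
        };

        fillChebiColumn(nodes, stub)
            .forEach((row, chebi) -> System.out.println(row + " -> " + chebi));
    }
}
```

With three sources in the input, the stub is called three times, once per source. Rows whose ID has no mapping are simply left out of the result, which would correspond to an empty cell in the new column.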

DeniseSl22 commented 2 years ago

Resolved in version 3.3.10, which adds a ChEBI ID mapping column.