BridgeDb identifier mapping files

ariutta commented 2 years ago

In today's biweekly call, we discussed creating a JSON file of identifier mappings for each pathway. For example, the WP1 folder of wikipathways-assets could contain a file named WP1-identifiers.json. Each entry would map the original identifier used in the pathway to a selected set of targets: Ensembl, Entrez Gene, Wikidata and the HGNC preferred name for gene products, and ChEBI and Wikidata for metabolites.
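
A rough sketch of what writing such a file could look like. The layout is only an assumption for discussion: the `datasource:identifier` key format, the field names, and the `write_identifier_file` helper are all hypothetical, and the values are placeholders rather than real identifiers.

```python
import json
from pathlib import Path

# Hypothetical layout: one entry per original xref used in the pathway,
# keyed as "datasource:identifier". All values below are placeholders.
EXAMPLE_MAPPINGS = {
    "Entrez Gene:ORIGINAL_GENE_ID": {
        "type": "GeneProduct",
        "ensembl": ["ENSEMBL_ID_PLACEHOLDER"],
        "entrez": ["ENTREZ_ID_PLACEHOLDER"],
        "wikidata": ["WIKIDATA_ID_PLACEHOLDER"],
        "hgnc_name": "HGNC_NAME_PLACEHOLDER",
    },
    "HMDB:ORIGINAL_METABOLITE_ID": {
        "type": "Metabolite",
        "chebi": ["CHEBI_ID_PLACEHOLDER"],
        "wikidata": ["WIKIDATA_ID_PLACEHOLDER"],
    },
}


def write_identifier_file(pathway_id, mappings, assets_root):
    """Write <assets_root>/<pathway_id>/<pathway_id>-identifiers.json."""
    folder = Path(assets_root) / pathway_id
    folder.mkdir(parents=True, exist_ok=True)
    out_path = folder / f"{pathway_id}-identifiers.json"
    # sort_keys keeps the output stable, so git diffs only show real changes
    out_path.write_text(json.dumps(mappings, indent=2, sort_keys=True))
    return out_path
```

Calling `write_identifier_file("WP1", EXAMPLE_MAPPINGS, "wikipathways-assets")` would produce `wikipathways-assets/WP1/WP1-identifiers.json`.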

@egonw also discussed possibly creating subset BridgeDb database files for all identifiers used in the WikiPathways approved collection.

This work could build on the work by @hbasaric.

AlexanderPico commented 2 years ago

@hbasaric Helena, I believe you already have this working in your GH Action for homology mapping. The initial ID unification step that you do (prior to homology mapping) would be useful for lots of use cases. Maybe you could split your current workflow and have a separate action that performs only the ID unification and saves the result to the wp-database repo as a JSON or TSV file.

Then your homology mapping action would simply read in this file and continue with subsequent steps.

Anders' SVG generation action would read the same file and not have to pull down derby files and repeat the same unification step.

Other users often want unified IDs as well, so we could provide the file as a download for other use cases.

Overall, this would save a lot of redundant BridgeDb lookups per pathway.

AlexanderPico commented 2 years ago

Also, the unification step should first read the previous unification file and use it as a cache, so that only new nodes have to be queried. This would save ~10 min of GH Action time for every minor edit to an existing GPML.
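
A minimal sketch of the caching idea, assuming the unification file is JSON keyed by the original xref. The function name `unify_with_cache` and the `lookup` callable are hypothetical; `lookup` stands in for whatever actually performs the BridgeDb query and is not a real BridgeDb API.

```python
import json
from pathlib import Path


def unify_with_cache(xrefs, cache_path, lookup):
    """Return {xref: mapping} for every xref, querying only uncached ones.

    `lookup` is a placeholder for the real (slow) BridgeDb query.
    """
    cache_file = Path(cache_path)
    # Reuse the previous unification file if one exists
    cached = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    unified = {}
    for xref in xrefs:
        if xref in cached:
            unified[xref] = cached[xref]   # unchanged node: no query needed
        else:
            unified[xref] = lookup(xref)   # new node: query once

    # Overwrite the file so nodes removed from the pathway drop out
    cache_file.write_text(json.dumps(unified, indent=2, sort_keys=True))
    return unified
```

With this shape, a minor edit that adds one node to an existing GPML would trigger a single lookup rather than re-querying every node in the pathway.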

wikipathways / wikipathways-development

BridgeDb identifier mapping files #65