wikipathways / pathway-figure-ocr

Extracting gene sets from published pathway figures
Apache License 2.0

Provide PFOCR data for external ranking and sorting #13

Closed AlexanderPico closed 4 years ago

AlexanderPico commented 4 years ago

The gene content from pathway figures will be amenable to the prioritization approaches described in Milestone Group 3 during the first segment, either through the export strategies previously mentioned or through structured formats produced directly by our group, e.g., GraphML, RDF, or OWL.

AlexanderPico commented 4 years ago

Pasting original draft of MG3 goals here for reference:

Milestone Group 3: Summarization, filtering, and ranking tools. We will focus on two activities. First, we will enable export of BTE results via the Reasoner API standard and in GraphML format. Despite having functional implementations of query path planning and execution, the current version of BTE does not include any significant capabilities for ranking and sorting. Exporting query path results in standard formats for prioritization outside of BTE is the first step in this area. Second, we will finalize an initial Translator version of DrugMechDB, a curated database of drug mechanisms [14]. DrugMechDB currently contains ~100 drug mechanisms expressed as paths through a knowledge graph. Here, we will normalize entity and predicate types against the BioLink Model. This database will be further developed in the later stages of BTE as a gold standard for quantitatively assessing methods for ranking and sorting. (More details are provided in the "Project Milestones" document.)

Thus, the idea for MG4 was to follow suit and produce an export format for PFOCR content in particular that folks could use to perform reasoning and prioritization outside of BTE. Still a good idea?

AlexanderPico commented 4 years ago

Per discussion with Andrew, we've updated this aim. The originally proposed graph formats suffer from limited adoption and steep learning curves, making them poor choices for maximizing data access for BTE users who want to do external analysis/prioritization. Let's start by providing JSON, then collect feedback and requirements before choosing additional export formats.

AlexanderPico commented 4 years ago

PFOCR data is available in JSON format. So, this task is complete.
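For anyone wondering what external ranking on the JSON could look like, here is a minimal sketch. It assumes each record carries a figure identifier and a list of gene symbols. The field names (`figure_id`, `title`, `genes`), the sample records, and the `rank_figures` helper are all hypothetical and made up for illustration; they are not the actual PFOCR export schema. The scoring is plain Jaccard similarity against a query gene set, chosen only because it is simple, not because it is the recommended method.

```python
import json

# Hypothetical PFOCR-style records. The field names ("figure_id", "title",
# "genes") and the contents are assumptions for illustration only; they are
# NOT the actual PFOCR JSON schema.
SAMPLE_JSON = """
[
  {"figure_id": "fig1", "title": "Wnt signaling",
   "genes": ["WNT1", "CTNNB1", "APC", "GSK3B"]},
  {"figure_id": "fig2", "title": "p53 pathway",
   "genes": ["TP53", "MDM2", "CDKN1A"]},
  {"figure_id": "fig3", "title": "Wnt/p53 crosstalk",
   "genes": ["CTNNB1", "TP53", "GSK3B"]}
]
"""


def rank_figures(records, query_genes):
    """Rank figure records by Jaccard similarity to a query gene set.

    Returns a list of (score, figure_id) tuples, best match first.
    """
    query = set(query_genes)
    scored = []
    for rec in records:
        genes = set(rec["genes"])
        union = genes | query
        # Jaccard index: |intersection| / |union|; 0.0 if both sets are empty.
        score = len(genes & query) / len(union) if union else 0.0
        scored.append((score, rec["figure_id"]))
    # Highest score first; ties broken by figure_id so the order is stable.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


records = json.loads(SAMPLE_JSON)
ranking = rank_figures(records, ["CTNNB1", "GSK3B", "TP53"])
for score, figure_id in ranking:
    print(f"{figure_id}\t{score:.2f}")
```

With a real export, `SAMPLE_JSON` would be replaced by the downloaded file's contents and the key names adjusted to whatever the export actually uses.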

andrewsu commented 4 years ago

For anyone using the BioThings PFOCR API (created in https://github.com/wikipathways/pathway-figure-ocr/issues/12), this notebook has a nice demonstration of using PFOCR data in a biomedical query: https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/TIDBIT%2004%20Finding%20Marketed%20Drugs%20that%20Might%20Treat%20an%20Unknown%20Syndrome%20by%20Perturbing%20the%20Disease%20Mechanism%20Pathway.ipynb (see the info at the bottom, in Step 3).