x-atlas-consortia / ubkg-neo4j

A container implementation to serve the Unified Biomedical Knowledge Graph in Neo4j
MIT License

HuBMAP: UBKG support for new "soft" assay types #32

Closed: AlanSimmons closed this issue 10 months ago.

AlanSimmons commented 1 year ago

General

The new "soft" assay types differ in their overall ingestion logic.

Tests are specified by a rules engine, which is a Python package.

Goals:

  1. Store as much of the rule logic as possible in UBKG.
  2. Obtain rule logic from UBKG.

Rules engine

The rules engine implements its logic as a set of chained tests. Rules are of two types:

Tests run in order. Test results can be returned in various formats, including JSON. Rule logic is expressed in a defined syntax. Some rule results may require value sets; the example we discussed was the set of Vitessce hints.
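To make "chained tests" concrete, here is a minimal Python sketch under one reading of the description: tests run in order, each test can see results contributed by earlier tests, and the combined results are returned as JSON. `Rule`, `run_chain`, the assay name, and the hint value are hypothetical placeholders, not the rules engine's actual API or real Vitessce hints.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule: a test over the current context plus the result it contributes."""
    name: str
    test: Callable[[Dict], bool]
    result: Dict


def run_chain(rules: List[Rule], metadata: Dict) -> str:
    """Run tests in order; later tests see results added by earlier ones.

    Returns the accumulated results as a JSON string.
    """
    context = dict(metadata)  # copy so the caller's metadata is not mutated
    results: Dict = {}
    for rule in rules:
        if rule.test(context):
            results.update(rule.result)
            context.update(rule.result)  # chain: expose this result to later tests
    return json.dumps(results, sort_keys=True)


# Placeholder rules; the assay name and hint value are illustrative only.
RULES = [
    Rule("is_sequencing", lambda c: c.get("assay") == "scRNA-seq",
         {"category": "sequencing"}),
    Rule("sequencing_hints", lambda c: c.get("category") == "sequencing",
         {"vitessce_hints": ["example_hint"]}),
]

print(run_chain(RULES, {"assay": "scRNA-seq"}))
# prints {"category": "sequencing", "vitessce_hints": ["example_hint"]}
```

The second rule fires only because the first one added `category` to the context, which is the "chained" behavior this sketch assumes. A first-match-wins engine would be an equally plausible reading.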

UBKG - ETL

Rule configuration should live in a resource external to the rules engine. The expressed desire is to represent the rule logic as a graph, decomposed to the resolution of individual elements. For example, if a rule can be expressed as X = A AND (B OR C), we would want nodes for X, A, B, and C, along with edges between X and A, A and B, and so on. Initially, however, we may have to store the logic at lower resolution, e.g., as a single node with "X = A AND (B OR C)".

The graph design must wait for more information. We need examples of what we would be representing, i.e., the output of the rules. The examples should span the full range of possible returns, which means that we need to know more about the set of new datasets.

UBKG ETL would parse returns from the rules engine into edge (assertion) and node metadata files.

Potential issue: we discussed storing some results logic information as properties of nodes. UBKG ETL assumes a certain structure for node properties: a node can only have value, lowerbound, upperbound, and unit properties. If we define new node properties related to rule logic, we might need to represent them as "property nodes". For example, instead of a node property "color = blue", we would define a blue node that isa color node, and then link the original node to it with a "has_color" edge.
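The element-level decomposition and the "property node" workaround can both be sketched as (subject, predicate, object) triples. This is a minimal sketch, not the planned UBKG ETL: the functions `decompose_rule` and `property_to_node` and the predicates `has_logic` and `has_operand` are hypothetical. The sketch also assumes one node per AND/OR operator (the discussion lists only X, A, B, and C) so the two operators stay distinguishable, and it ignores the actual column layout of UBKG edge and node files.

```python
import re
from itertools import count
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)
OPERATORS = ("AND", "OR")


def decompose_rule(rule: str) -> List[Edge]:
    """Decompose a rule such as "X = A AND (B OR C)" into edge triples.

    AND binds tighter than OR; each operator occurrence becomes its own node.
    """
    head, body = (part.strip() for part in rule.split("=", 1))
    tokens = re.findall(r"\(|\)|[A-Za-z_][A-Za-z0-9_]*", body)
    pos = 0
    ids = count(1)
    edges: List[Edge] = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of rule")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_binary(op, parse_operand) -> str:
        # Collect operands joined by `op`; emit an operator node only if needed.
        operands = [parse_operand()]
        while peek() == op:
            take()
            operands.append(parse_operand())
        if len(operands) == 1:
            return operands[0]
        node = f"{op}_{next(ids)}"
        for operand in operands:
            edges.append((node, "has_operand", operand))
        return node

    def parse_or() -> str:
        return parse_binary("OR", parse_and)

    def parse_and() -> str:
        return parse_binary("AND", parse_atom)

    def parse_atom() -> str:
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in OPERATORS or tok == ")":
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    root = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    edges.append((head, "has_logic", root))
    return edges


def property_to_node(node: str, prop: str, value: str) -> List[Edge]:
    """Turn a node property such as color = blue into a property node."""
    return [(value, "isa", prop), (node, f"has_{prop}", value)]


for edge in decompose_rule("X = A AND (B OR C)"):
    print(edge)
```

Running it prints five triples: `OR_1` links to B and C, `AND_2` links to A and `OR_1`, and X links to `AND_2` through `has_logic`. Storing the rule at lower resolution, as a single node holding the whole string, would skip the parser entirely.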

UBKG-API

The UBKG-API will need endpoints that return results logic. Currently, we expect the rules engine to be the primary consumer of these endpoints; the UI would query the rules engine directly.

AlanSimmons commented 10 months ago

This is a duplicate of #40.