Old method: a denormalized dataset data type (assay_type) is associated with a workflow; e.g., assay_1 uses workflow_1.
New method: the data type derives from a base type plus conditions determined from the metadata; e.g., assay_1 is based on assay_0 + {metadata 1 value = X and metadata 2 value = Y}.
Tests are specified by a rules engine, which is a Python package.
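The new method can be sketched in Python as a base type plus a set of required metadata values. This is only an illustration of the idea, not the rules engine's actual code: the class DerivedType, the function resolve_type, and the keys metadata_1 and metadata_2 are hypothetical names chosen to mirror the assay_1 / assay_0 example.

```python
from dataclasses import dataclass, field


@dataclass
class DerivedType:
    """A data type defined as a base type plus required metadata values."""
    name: str
    base: str
    conditions: dict = field(default_factory=dict)

    def matches(self, base_type: str, metadata: dict) -> bool:
        # The dataset must share the base type and satisfy every condition.
        if base_type != self.base:
            return False
        return all(metadata.get(k) == v for k, v in self.conditions.items())


def resolve_type(base_type, metadata, derived_types):
    """Return the first derived type that matches, else the base type."""
    for derived in derived_types:
        if derived.matches(base_type, metadata):
            return derived.name
    return base_type


# Hypothetical definition: assay_1 = assay_0 + {metadata_1 = X, metadata_2 = Y}.
DERIVED = [
    DerivedType("assay_1", "assay_0", {"metadata_1": "X", "metadata_2": "Y"}),
]

resolved = resolve_type("assay_0", {"metadata_1": "X", "metadata_2": "Y"}, DERIVED)
```

A dataset that has the base type but does not meet every condition simply keeps the base type in this sketch; the real engine may handle that case differently.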
Goals:
Store rule logic in UBKG to the degree possible
Obtain rule logic from UBKG
Rules engine
The rules engine implements logic via a set of chained tests. Rules are of two types:
"Get" - returns the result of a query
"Set" - modifies state for the rule
Very high-level example of a rule:
Test 1: Is this dataset from the new (CEDAR) schema, or the old (pre-CEDAR) schema?
test 1 logic
test 1 result that sets state—e.g., old_style=true
Test 2: Old style assay type
test 2 logic - uses state (if old_style=true...)
test 2 result: old assay type
Test 3: New style template
test 3 logic - uses combination of state (if old_style=false) and metadata (e.g., template name = x)
test 3 result: new assay type
Tests run in order.
Test results can be in various formats, including JSON.
Rule logic is expressed according to a syntax.
Some of the returns from rules may require value sets of some sort. The example that we discussed was the set of Vitessce hints.
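The three-test example above might look roughly like the following in Python. This is a sketch under stated assumptions, not the engine's real syntax: the function names, the metadata fields "assay_type" and "template_name", the template mapping, and the choice to stop at the first test that returns a result are all hypothetical.

```python
import json


def check_schema(metadata, state):
    # Test 1 ("Set"): record whether the dataset uses the old schema.
    # Assumption: only old-style metadata carries an "assay_type" field.
    state["old_style"] = "assay_type" in metadata
    return None  # no final result; later tests read the state


def check_old_assay(metadata, state):
    # Test 2 ("Get"): for old-style datasets, return the stored assay type.
    if state.get("old_style"):
        return {"assay_type": metadata["assay_type"]}
    return None


def check_new_template(metadata, state):
    # Test 3 ("Get"): for new-style datasets, map the template name.
    # Assumption: this mapping is invented for illustration.
    templates = {"x": "new_assay_x"}
    if not state.get("old_style"):
        name = templates.get(metadata.get("template_name"))
        if name is not None:
            return {"assay_type": name}
    return None


def run_chain(metadata, tests):
    """Run tests in order, sharing state; return the first result found."""
    state = {}
    for test in tests:
        result = test(metadata, state)
        if result is not None:
            return result
    return None


CHAIN = [check_schema, check_old_assay, check_new_template]

old_result = run_chain({"assay_type": "old_a"}, CHAIN)
new_result = run_chain({"template_name": "x"}, CHAIN)
new_result_json = json.dumps(new_result)  # results can be serialized to JSON
```

Keeping the state in a plain dictionary that is passed to every test makes the order dependence explicit: check_schema has to run before the two tests that read old_style.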
UBKG - ETL
Rule configuration should be in a resource external to Rules Engine.
The expressed desire is to represent the rule logic as a graph, decomposed to the resolution of individual elements. For example, if a rule can be expressed as X = A AND (B OR C), then we would want nodes for X, A, B, and C, along with edges between X and A, A and B, etc. However, initially, we may have to store at lower resolution, e.g., a single node with "X = A AND (B OR C)".
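One possible decomposition of X = A AND (B OR C) into nodes and edges is sketched below. The graph design is still open, so everything here is an assumption: the nested-tuple input format, the explicit operator nodes (AND_1, OR_2), and the edge labels "defined_by" and "has_operand" are hypothetical and are not existing UBKG relationships.

```python
def decompose(name, expr):
    """Flatten a nested boolean expression into node and edge lists.

    expr is either a string (a leaf such as "A") or a tuple
    (operator, operand, operand, ...), e.g. ("AND", "A", ("OR", "B", "C")).
    """
    nodes, edges = [name], []
    counter = 0

    def visit(parent, relation, node):
        nonlocal counter
        if isinstance(node, str):
            # Leaf element: reuse the node if it was already seen.
            if node not in nodes:
                nodes.append(node)
            edges.append((parent, relation, node))
            return
        operator, *operands = node
        counter += 1
        op_node = f"{operator}_{counter}"  # hypothetical operator-node naming
        nodes.append(op_node)
        edges.append((parent, relation, op_node))
        for operand in operands:
            visit(op_node, "has_operand", operand)

    visit(name, "defined_by", expr)
    return nodes, edges


nodes, edges = decompose("X", ("AND", "A", ("OR", "B", "C")))
```

The operator nodes are one way to keep the grouping of (B OR C) recoverable from the graph. The lower-resolution alternative would be a single node for X that carries the whole expression as a string.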
The graph design must wait for more information. We need examples of what we would be representing, i.e., the results that rules return. The examples should span the possible range of returns, which means that we need to know more about the set of new datasets.
The UBKG ETL would parse returns from the rules engine into edge (assertion) and node metadata files.
Potential issue: We discussed storing some results logic information as properties of nodes. UBKG ETL assumes a certain structure for node properties, i.e., a node can only have value, lowerbound, upperbound, and unit properties. If we define new properties for nodes related to rules logic, we might need to represent these as "property nodes": e.g., instead of a node property "color = blue", we define a blue node that has an "isa" edge to a color node, and then link the original node to the blue node with a "has_color" edge.
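The "property node" workaround could be sketched as a small transformation that keeps the four supported properties on the node and turns everything else into edges. The function name, the (subject, predicate, object) tuple layout, and the "has_<property>" predicate pattern are assumptions for illustration; they do not reflect the actual UBKG ETL file formats.

```python
# The only node properties the UBKG ETL is described as supporting.
ALLOWED = {"value", "lowerbound", "upperbound", "unit"}


def to_property_nodes(node_id, properties):
    """Split properties into those kept on the node and extra edges.

    Each unsupported property yields two (subject, predicate, object) rows:
    an edge from the node to a value node, and an "isa" edge from the
    value node to the node that represents the property itself.
    """
    kept, edges = {}, []
    for key, value in properties.items():
        if key in ALLOWED:
            kept[key] = value
        else:
            value_node = str(value)
            edges.append((node_id, f"has_{key}", value_node))
            edges.append((value_node, "isa", key))
    return kept, edges


kept, edges = to_property_nodes("node_1", {"color": "blue", "unit": "mm"})
```

In this example "unit" stays a plain node property, while "color = blue" becomes a has_color edge to a blue node plus an isa edge from blue to color.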
UBKG-API
The UBKG-API will need endpoints that return results logic. At this time, we think that the primary consumer of these endpoints would be the rules engine. The UI would query the rules engine directly.
General
There is a difference in overall ingestion logic.