ontox-hu / aspis4j

an api for the ontox4j graph database

property-record nodes & relationship factors #28

Open tomlue opened 2 years ago

tomlue commented 2 years ago

Tasks

    • [ ] add property-records to graph (or explain why not)
    • [x] migrate to factor graph (or explain why not)

A. Add property-record nodes that assign a property+value to other nodes and reference sources? It would:

  1. allow many collaborators on the graph.
  2. allow fine-grained access control on uploaded properties.
  3. allow storage of conflicting values.
  4. allow storage of multiple values for the same property from different sources.
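
To make proposal A concrete, here is a minimal in-memory sketch, not the aspis4j implementation. The names `PropertyRecord`, `PropertyStore`, `owner` and `visible_to` are hypothetical, and the property name, values and sources are placeholders. It only illustrates that records are appended rather than overwritten, so conflicting values from different sources can coexist.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyRecord:
    """One claim: 'node X has property P with value V, according to source S'."""
    target: str    # identifier of the node the claim is about
    prop: str      # property name
    value: object  # claimed value
    source: str    # who or what asserted the value
    owner: str     # uploader; a hypothetical hook for access control


class PropertyStore:
    def __init__(self):
        self._by_target = defaultdict(list)

    def add(self, record):
        # Records are appended, never overwritten, so conflicts are preserved.
        self._by_target[record.target].append(record)

    def values(self, target, prop):
        """Return every (value, source) pair recorded for one property."""
        return [(r.value, r.source) for r in self._by_target[target] if r.prop == prop]

    def visible_to(self, target, user):
        """Return only the records uploaded by `user` (stand-in for real access control)."""
        return [r for r in self._by_target[target] if r.owner == user]


store = PropertyStore()
# Placeholder values: two sources disagree about the same property.
store.add(PropertyRecord("EGFR", "example_property", 1.0, "source-A", "alice"))
store.add(PropertyRecord("EGFR", "example_property", 2.0, "source-B", "bob"))
conflicting = store.values("EGFR", "example_property")
```

In a graph database each record would presumably become its own node linked to the target and to a source node; the flat store above only shows the data shape.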

B. Should we use 'factor relationships'? Our relationships don't handle multiple inputs/outputs, can't be the target of other relationships and can't be associated with property records. Our current approach won't work for anything but trivial relationships. For example, the following statements cannot be captured well:

a. Gefitinib promotes the reaction "EGFR protein binds to EGFR protein"
b. [TGFA protein binds to and affects the activity of EGFR protein] which decreases susceptibility to nickel sulfate
c. X binds to EGFR with affinity Z

In a and b, we want to say that something transforms the relationship between two other things. In c, we want to add a quantitative value to a relationship, but there may be disagreements about what the affinity is.

Factor relationships solve this, for example:

factor-node:
   label: binds-to
   identifier: 100
   ligand: TGFA
   target: EGFR

variable-node:
   label: protein
   name: TGFA

variable-node:
   label: protein
   name: EGFR

relationship:
   input: TGFA
   output: factor_node_100

relationship:
   input: factor_node_100
   output: EGFR

...

This approach creates a bipartite factor graph where:

  1. Every node is either a variable or a factor.
  2. Variables can relate to Factors but not other variables
  3. Factors can relate to variables but not other factors
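
The three rules can be checked mechanically. The sketch below is a hypothetical in-memory model (the `FactorGraph` class and its method names are assumptions, not part of aspis4j or Neo4j) that rejects any edge joining two nodes of the same kind, using the TGFA/EGFR example from above.

```python
class FactorGraph:
    """Tiny in-memory bipartite graph: every node is a 'variable' or a 'factor'."""

    def __init__(self):
        self.kind = {}   # node id -> "variable" or "factor"
        self.edges = []  # (source id, target id) pairs

    def add_node(self, node_id, kind):
        # Rule 1: every node is either a variable or a factor.
        if kind not in ("variable", "factor"):
            raise ValueError(f"unknown node kind: {kind}")
        self.kind[node_id] = kind

    def relate(self, source, target):
        # Rules 2 and 3: an edge must join one variable and one factor.
        if self.kind[source] == self.kind[target]:
            raise ValueError(f"{source} and {target} are both {self.kind[source]} nodes")
        self.edges.append((source, target))


g = FactorGraph()
g.add_node("TGFA", "variable")
g.add_node("EGFR", "variable")
g.add_node("factor_node_100", "factor")  # the binds-to factor from the example
g.relate("TGFA", "factor_node_100")
g.relate("factor_node_100", "EGFR")

try:
    g.relate("TGFA", "EGFR")  # variable -> variable violates rule 2
    rejected = False
except ValueError:
    rejected = True
```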

Factor graphs should be able to capture very complex relationships, and support advanced modeling methods.

Maddocent commented 2 years ago

The way I see this, and thanks for these helpful examples @tomlue, there is indeed no easy way to capture this in the direct node-edge-node relationships we have been trying so far. I see no reason not to adopt this. But some more feedback from others, @Huan-Yang @amdehaan, would be great!

Huan-Yang commented 2 years ago

Thanks for the doc and the example, @tomlue. Just curious about two things: (i) how to incorporate Z in c; (ii) for "B. Should we use 'factor relationships'?", does the proposed approach also work for a situation with five species (a, b, c, d, e): a reaction a + b -> c + d, where e promotes the reaction?

MarieCo commented 2 years ago

I like the idea of a factor graph, thanks for the examples @tomlue. A few thoughts/questions:

I realise that not everything is directly related to this issue and some are more general questions, so please let me know if I should move some to a different (new?) issue.

JJSirius commented 2 years ago

I hesitated to recommend a factor graph because of the increase in nodes and relationships, but now I think the queries will be cleaner and performance will improve. https://neo4j.com/developer/modeling-designs/

tomlue commented 2 years ago

Hey neat, a hyperedge in those docs seems like the same thing as a factor.

tomlue commented 2 years ago

@MarieCo responded to your queries (1 through 4) below. Created issues for proposed solutions.

How to prevent bad property assignments? (query 1 & 2)

We can restrict new label and property creation with Neo4j fine-grained access control and constraints.

How do we want to deal with synonyms? (query 3)

A possible solution, where * indicates any node, is:

Variable Node `Synonym`
Function Node `Synonym_assignment`
Relation Edge `has_synonym : Synonym_assignment -> Synonym` 
Relation Edge `has_target : Synonym_assignment -> *`

This adds one layer to

Variable Node `Synonym`
Relation Edge `has_synonym : * -> Synonym` 

The first form, with the `Synonym_assignment` node, generalizes more and would allow additional relations like confidence or source.
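
As a rough illustration of why an intermediate `Synonym_assignment` node is useful, the sketch below models it as a small record that can carry extra information. The `extra` dictionary, the `synonyms_of` helper, and the source and confidence values are all hypothetical; ErbB1 and HER1 are used only as familiar alternative names for EGFR.

```python
from dataclasses import dataclass, field


@dataclass
class SynonymAssignment:
    """Function node linking a synonym to a target node."""
    synonym: str  # has_synonym : Synonym_assignment -> Synonym
    target: str   # has_target  : Synonym_assignment -> *
    extra: dict = field(default_factory=dict)  # hypothetical: confidence, source, ...


assignments = [
    SynonymAssignment("ErbB1", "EGFR", {"source": "source-A", "confidence": 0.9}),
    SynonymAssignment("HER1", "EGFR", {"source": "source-B"}),
]


def synonyms_of(target, min_confidence=0.0):
    """Synonyms of `target`; assignments without a confidence value are kept."""
    return [
        a.synonym
        for a in assignments
        if a.target == target and a.extra.get("confidence", 1.0) >= min_confidence
    ]


names = synonyms_of("EGFR")
```

With a direct `has_synonym : * -> Synonym` edge there would be no node to attach the confidence or source to, which is the trade-off the extra layer buys.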

There's a lot to think about here, and I'm not ready to assign issues on it. A paper on the RDF OWL property owl:sameAs, "When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web", shows how hard this can be. Identity links are really hard to get right, and synonyms are a related concept.

How do factors handle multiple relationships, i.e. a+b -> c+d? (query 4)

F(A,B,C,D) is a four-argument factor node (it may need a different name, see point 2 under the extra thoughts below) where

  1. a -A-> F
  2. b -B-> F
  3. c -C-> F
  4. d -D-> F

F above doesn't really have an input and output. The edges are better defined as undirected. The relationship labels tell you what role each related node plays in the F instance.

You can also think of F as a table associating instances of A, B, C and D:

| A  | B   | C  | D  |
|----|-----|----|----|
| a1 | b32 | c9 | d5 |
| a2 | b12 | c3 | d8 |
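
The table view translates directly into code. In this sketch each row is one instance of F and each key is the role a node plays in it; the helper names `edges` and `instances_with` and the `F_0`, `F_1` instance identifiers are assumptions made for illustration.

```python
ROLES = ("A", "B", "C", "D")

# Each row is one instance of F; the key names the role the node plays.
F = [
    {"A": "a1", "B": "b32", "C": "c9", "D": "d5"},
    {"A": "a2", "B": "b12", "C": "c3", "D": "d8"},
]


def edges(factor_rows, name="F"):
    """Expand the table into undirected, role-labelled edges (node, role, factor instance)."""
    return [
        (row[role], role, f"{name}_{i}")
        for i, row in enumerate(factor_rows)
        for role in ROLES
    ]


def instances_with(factor_rows, role, node):
    """Return the rows in which `node` plays `role`."""
    return [row for row in factor_rows if row[role] == node]


all_edges = edges(F)
```

Each row expands into four role-labelled edges, one per argument of F, which matches the four relationships listed above.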

Some extra thoughts:

  1. I am not certain that a factor graph is the best solution, but it does enforce lightweight relationships.
  2. The above isn't really a factor graph. "Function graph" might be a better name, with factors called function-nodes or fnodes.
  3. @JJSirius's reference to the Neo4j hyperedge indicates that transforming edges into nodes is a common thing.
  4. https://medium.com/neo4j/graph-data-modeling-all-about-relationships-5060e46820ce @Maddocent referenced relationship reification, which helps to motivate lightweight edges as well.
  5. https://www.w3.org/TR/rdf-schema/#ch_properties The RDF Schema documentation is dense, but worth reading carefully. RDF statements link a subject, predicate and object. The subject and object are resources (the object can also be a literal). The predicate is a property. I didn't realize previously that RDF properties are meant to be quite lightweight.
  6. https://docs.ropensci.org/rdflib/articles/rdf_intro.html This helped me see RDF as a way of collapsing tabular data to a special long form. It also pretty much sold me on the idea of RDF maybe actually being useful (maybe).