paleobot / pbot-dev

Codebase and initial design documents for pbot client

Redesign of OTU -> Description -> Specimen relationships #59

Closed aazaff closed 1 year ago

aazaff commented 2 years ago

In pursuing the discussion related to #58, we have decided to redesign how OTUs, Descriptions, and Specimens relate to each other.

Basic Design

  1. New node type called OTU.
  2. OTU nodes attach to SPECIMEN nodes by EXAMPLE_OF relationship
    1. one-to-many
    2. NOT NULL (i.e., OTU must have at least one EXAMPLE_OF relationship)
  3. OTU node attaches to SPECIMEN node by HOLOMORPHOTYPE_OF relationship (not yet implemented)
    1. one-to-one
    2. NOT NULL (i.e., OTU must have a HOLOMORPHOTYPE_OF relationship)
  4. OTU nodes can attach to OCCURRENCE nodes (not yet implemented, see #50 and #51) via an OCCURS_IN relationship
  5. SPECIMEN nodes are attached to DESCRIPTION nodes by DESCRIBED_BY relationship
    1. many-to-one (i.e., can have multiple specimens pointing to the same description)
    2. Can be NULL (i.e., a specimen does not need to be attached to a description)
  6. An OTU must have AT LEAST ONE EXAMPLE_OF relationship to a SPECIMEN that has AT LEAST ONE DESCRIBED_BY relationship to a DESCRIPTION node (i.e., an OTU node must be indirectly linked to at least one description node).
  7. What were formerly called OTU descriptions no longer exist in the database. An equivalent concept is now dynamically generated (i.e., a view) by taking a UNION of all DESCRIPTION nodes linked to the OTU (via specimens).
  8. There is a one-to-one relationship between DESCRIPTION node and SCHEMA.
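
The cardinality rules above can be sketched as a small in-memory model. This is only an illustration in Python, not the Neo4j schema: the class names, field names, and the `validate_otu` helper are hypothetical, and it assumes (from the "many-to-one" wording in point 5) that a specimen points to at most one description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Description:
    name: str
    schema: str  # point 8: each description belongs to exactly one schema


@dataclass
class Specimen:
    name: str
    # point 5: DESCRIBED_BY may be absent (assumed at most one per specimen)
    described_by: Optional[Description] = None


@dataclass
class OTU:
    name: str
    examples: List[Specimen] = field(default_factory=list)  # EXAMPLE_OF
    holotype: Optional[Specimen] = None  # HOLOMORPHOTYPE_OF


def validate_otu(otu: OTU) -> List[str]:
    """Return the rule violations for an OTU; an empty list means it is valid."""
    problems = []
    if not otu.examples:
        problems.append("point 2: no EXAMPLE_OF specimen")
    if otu.holotype is None:
        problems.append("point 3: no HOLOMORPHOTYPE_OF specimen")
    if not any(s.described_by is not None for s in otu.examples):
        problems.append("point 6: no example specimen has a description")
    return problems
```

Checking point 6 separately from point 2 keeps the two rules independent, so relaxing one does not require touching the other.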

aazaff commented 2 years ago

Regarding point 6 above:

An OTU must have AT LEAST ONE EXAMPLE_OF relationship to a SPECIMEN that has AT LEAST ONE DESCRIBED_BY relationship to a DESCRIPTION node (i.e., an OTU node must be indirectly linked to at least one description node).

How strongly do we feel about this rule? What if someone wants to say that certain specimens belong to an OTU but doesn't describe those specimens or define that OTU in any way?

NoisyFlowers commented 2 years ago

It's a bit janky, but here's the shared scribble we all created as we developed this. The bottom left portion and the diamond in the center represent the overall approach described above.

[Image: Screenshot 2022-05-18 142628]

NoisyFlowers commented 2 years ago

I'm not completely sure how we'll want to query this new layout yet. But here's a stab at a grand query that builds a virtual description complex around an OTU, bringing in all the character instances of the specimen descriptions that hang off it.

match
    (otu:OTU {name:"Cornus hyperborea"})<-[:EXAMPLE_OF]-(specimen:Specimen)-[:DESCRIBED_BY]->(d:Description)-[:DEFINED_BY]->(ci:CharacterInstance),
    (ci)-[:INSTANCE_OF]->(c:Character),
    (ci)-[:HAS_STATE]->(s:State) where s.name <> "quantity"
with 
    distinct c, collect(distinct s) as states, otu
unwind states as s 
    with
        otu, c, s,
        apoc.create.vNode(["vCharacterInstance"]) as ci2
    with otu, ci2, c, s,
        apoc.create.vRelationship(otu, "vDEFINED_BY", {}, ci2) as definedBy,
        apoc.create.vRelationship(ci2, "vAPPLICATION_OF", {name:c.name}, c) as applicationOf,
        apoc.create.vRelationship(ci2, "vHAS_STATE", {name: s.name}, s) as hasState
return
    otu, definedBy, ci2, applicationOf, hasState, c, s

union

match
    (otu:OTU {name:"Cornus hyperborea"})<-[:EXAMPLE_OF]-(specimen:Specimen)-[:DESCRIBED_BY]->(d:Description)-[:DEFINED_BY]->(ci:CharacterInstance),
    (ci)-[:INSTANCE_OF]->(c:Character),
    (ci)-[hs:HAS_STATE]->(s:State) where s.name = "quantity"
with 
    distinct c, collect(s) as states, hs, otu
unwind states as s 
    with
        otu, c, s, hs,
        apoc.create.vNode(["vCharacterInstance"]) as ci2
    with otu, ci2, c, s, hs,
        apoc.create.vRelationship(otu, "vDEFINED_BY", {}, ci2) as definedBy,
        apoc.create.vRelationship(ci2, "vAPPLICATION_OF", {name:c.name}, c) as applicationOf,
        apoc.create.vRelationship(ci2, "vHAS_STATE", {name: s.name, value: hs.value}, s) as hasState
return
    otu, definedBy, ci2, applicationOf, hasState, c, s  

The first half of the union creates a virtual graph of the non-quantity character instances. The second half adds the quantity instances, carrying the value stored on each HAS_STATE relationship. The resulting graph looks like this:

[Image: Screenshot 2022-05-25 142848]

Most likely, a version of this query will be incorporated into a GraphQL query resolver for our API.
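
To make the merge logic of the query above easier to follow outside Neo4j, here is a rough Python sketch of the same idea. It is not the resolver code: the `CharacterInstance` tuple and `merged_description` function are hypothetical names, and the character and state names in any data you feed it are up to you. It assumes non-quantity states are de-duplicated per character, while each quantity instance is kept with its own value.

```python
from typing import Iterable, List, NamedTuple, Optional


class CharacterInstance(NamedTuple):
    character: str
    state: str
    value: Optional[float] = None  # only meaningful when state == "quantity"


def merged_description(
    descriptions: Iterable[Iterable[CharacterInstance]],
) -> List[CharacterInstance]:
    """Combine the character instances of several specimen descriptions."""
    seen = set()
    merged = []      # distinct (character, state) pairs, first-seen order
    quantities = []  # every quantity instance, value preserved
    for description in descriptions:
        for ci in description:
            if ci.state == "quantity":
                quantities.append(ci)
            else:
                key = (ci.character, ci.state)
                if key not in seen:
                    seen.add(key)
                    merged.append(CharacterInstance(ci.character, ci.state))
    # Mirrors the two halves of the union: non-quantity first, then quantity.
    return merged + quantities
```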

NoisyFlowers commented 2 years ago

These are the latest incarnations of description queries on the new OTU structure. Explanations in the embedded comments.

For holotype descriptions:

//This captures the states of holotypes, maintaining HAS_STATE order        
match
    (otu:OTU {name:"Cornus hyperborea"})<-[:HOLOTYPE_OF]-(specimen:Specimen)-[:DESCRIBED_BY]->(d:Description)-[:DEFINED_BY]->(ci:CharacterInstance),
    (ci)-[:INSTANCE_OF]->(c:Character),
    (ci)-[hs:HAS_STATE]->(s:State) 
with
    c, s{.*, value: hs.value, order: hs.order} as states, otu //tuck the order and value relationship properties in temp object with state for use later
unwind states as s 
    with
        otu, c, s, 
        apoc.create.vNode(["vCharacterInstance"]) as ci2
        match (state:State {pbotID:s.pbotID})  //get the actual state object we want to work with
    with otu, ci2, c, s, state,
        apoc.create.vRelationship(otu, "vDEFINED_BY", {}, ci2) as definedBy,
        apoc.create.vRelationship(ci2, "vAPPLICATION_OF", {name:c.name}, c) as applicationOf,
        apoc.create.vRelationship(ci2, "vHAS_STATE", {name: s.name, value: s.value, order: s.order}, state) as hasState
return
    otu, definedBy, ci2, applicationOf, hasState, c, state  

For combined descriptions of all example specimens:

//This captures the states of all example specimens (including holotype), maintaining HAS_STATE order
match
    (otu:OTU {name:"Cornus hyperborea"})<-[:EXAMPLE_OF]-(specimen:Specimen)-[:DESCRIBED_BY]->(d:Description)-[:DEFINED_BY]->(ci:CharacterInstance),
    (ci)-[:INSTANCE_OF]->(c:Character),
    (ci)-[hs:HAS_STATE]->(s:State) 
with
    c, s{.*, value: hs.value, order:avg(hs.order)} as s, otu //tuck the order and value relationship properties in temp object with state for use later. For order, we want to save the average value for this state. The client can then decide what they want to do with it. The projection has to be aliased (as s) so the next clause can refer to it.
with 
    distinct c, collect(distinct s) as states,  otu //because there may be multiple descriptions here, we need to add distinct to both character and state to avoid duplicate character instances in the virtual graph
unwind states as s 
    with
        otu, c, s, 
        apoc.create.vNode(["vCharacterInstance"]) as ci2
        match (state:State {pbotID:s.pbotID})  //get the actual state object we want to work with
    with otu, ci2, c, s, state,
        apoc.create.vRelationship(otu, "vDEFINED_BY", {}, ci2) as definedBy,
        apoc.create.vRelationship(ci2, "vAPPLICATION_OF", {name:c.name}, c) as applicationOf,
        apoc.create.vRelationship(ci2, "vHAS_STATE", {name: s.name, value: s.value, order: s.order}, state) as hasState
return
    otu, definedBy, ci2, applicationOf, hasState, c, state  

NoisyFlowers commented 2 years ago

Paring the EXAMPLE_OF query above down further. Annotations in comments.

//This captures the states of all example specimens (including holotype), maintaining HAS_STATE order
match
    (otu:OTU {name:"Cornus hyperborea"})<-[:EXAMPLE_OF]-(specimen:Specimen)-[:DESCRIBED_BY]->(d:Description)-[:DEFINED_BY]->(ci:CharacterInstance),
    (ci)-[:INSTANCE_OF]->(c:Character),
    (ci)-[hs:HAS_STATE]->(s:State) 
with
    distinct c, s{.*, value: hs.value, order:avg(hs.order)} as states, otu //Tuck the order and value relationship properties in temp object with state for use later. For order, we want to save the average value for this state. By aggregating on order, we also limit s to distinct states, so we don't need to specify DISTINCT. 
unwind states as s 
    with
        otu, c, s,
        apoc.create.vNode(["vCharacterInstance"]) as ci2
        match (state:State {pbotID:s.pbotID})  //s is a temp object. We need to get the actual state node we want to work with
    with otu, ci2, c, s, state,
        apoc.create.vRelationship(otu, "vDEFINED_BY", {}, ci2) as definedBy,
        apoc.create.vRelationship(ci2, "vAPPLICATION_OF", {name:c.name}, c) as applicationOf,
        apoc.create.vRelationship(ci2, "vHAS_STATE", {name: s.name, value: s.value, order: s.order}, state) as hasState
return
    otu, definedBy, ci2, applicationOf, hasState, c, state  
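
The order-averaging step in the query above can be illustrated with a small Python sketch. This is only my reading of the grouping, not a description of how Neo4j evaluates it: it assumes rows are grouped by character, state, and value, and that `order` is averaged within each group. The `average_order` function and the row layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# One row per HAS_STATE relationship: (character, state, value, order)
Row = Tuple[str, str, Optional[float], float]


def average_order(rows: Iterable[Row]) -> List[Dict[str, object]]:
    """Collapse duplicate (character, state, value) rows, averaging order."""
    groups = defaultdict(list)
    for character, state, value, order in rows:
        groups[(character, state, value)].append(order)
    # dicts keep insertion order, so groups come out in first-seen order
    return [
        {"character": c, "state": s, "value": v, "order": mean(orders)}
        for (c, s, v), orders in groups.items()
    ]
```

Each resulting entry corresponds to one virtual character instance, and the client can sort on the averaged `order` or ignore it.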

NoisyFlowers commented 2 years ago

This should be mostly working with the following commits:

neo4j-graphql-js paleobot/neo4j-graphql-js@09b3a22e7a7e59ad21ab575f85db1b124dfdef4f through paleobot/neo4j-graphql-js@4b2be0fd0148925fab2217dc2e73319f7e26ee28

pbot-dev 5be73d0ac0b0d17ece6935304a5cf58ed6736145 through b7e0f4598ad9b62a53ec8d86aebeccff4980450d

pbot-api paleobot/pbot-api@e8ea57cb7f6d8ff86c0efbacd41408789f0be4f4 through paleobot/pbot-api@59a1e8860cae00804a6506b2a327a80bd1a99e03

pbot-client paleobot/pbot-client@42acdff60ce6c106a3a1c93ba65484dbd1cbe4d4 through paleobot/pbot-client@0834b21cc40445d6101536e5253118b6493bb537

This is all running on dev.