linkml / linkml-model

Link Modeling Language (LinkML) model
https://linkml.github.io/linkml-model/docs/

Add `value_presence` slot to `slot_expression` #126

Closed pkalita-lbl closed 1 year ago

pkalita-lbl commented 1 year ago

Similar to slots like equals_string and equals_number, the value_presence slot would be useful for describing class rules, i.e. expressing rules like "if slot a has any value, then slot b must also have a value".

ddooley commented 1 year ago

I don't mean to introduce complexity, but I'm noting here the possibility of a categorical "slot status" that covers this. It could have a "value present" value, as well as the INSDC missing-value terms: https://www.insdc.org/submitting-standards/missing-value-reporting/

At the moment, for NCBI Biosample submission, a field can have, in addition to whatever data type it is, an extra categorical list of INSDC metadata values in its range. Such metadata would be shunted into the kind of slot/field you propose before going into a database. Rules could then be based on these values too, e.g. if field x is missing, don't make field y required.
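
To make the idea concrete, here is a minimal Python sketch of shunting such a field into a (value, missing status) pair and basing a rule on it. The names `split_value` and `y_is_required` are hypothetical, and the term list is only an illustrative subset assumed for this example; the INSDC page linked above is the authoritative vocabulary.

```python
from typing import Optional, Tuple

# Illustrative subset only (an assumption for this sketch);
# see the INSDC page for the actual vocabulary.
MISSING_TERMS = {"not collected", "not provided", "restricted access"}


def split_value(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a raw field into (value, missing_status).

    Exactly one element of the returned pair is non-None.
    """
    text = raw.strip()
    if text.lower() in MISSING_TERMS:
        return None, text.lower()
    return text, None


def y_is_required(x_raw: str) -> bool:
    """Hypothetical rule: field y is required only when field x has a real value."""
    _, missing_status = split_value(x_raw)
    return missing_status is None
```

With this split, `y_is_required("42")` is true while `y_is_required("not provided")` is false, which is the "if field x is missing, don't make field y required" behaviour described above.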

cmungall commented 1 year ago

This issue is about a metamodel predicate that allows us to do the equivalent of is null / is not null checks as part of conditional rule evaluation.
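
As a rough illustration of the intended semantics (not LinkML's actual implementation), the check can be sketched in plain Python. The function names `matches_presence` and `rule_holds` are hypothetical, and treating a missing key or a `None` value as "absent" is an assumption of this sketch.

```python
from typing import Any, Mapping

PRESENT = "PRESENT"
ABSENT = "ABSENT"


def matches_presence(record: Mapping[str, Any], slot: str, expected: str) -> bool:
    """Return True if the slot's presence in `record` matches `expected`.

    Assumption for this sketch: a slot is absent when its key is missing
    or its value is None.
    """
    if expected not in (PRESENT, ABSENT):
        raise ValueError(f"unknown presence value: {expected}")
    has_value = record.get(slot) is not None
    return has_value if expected == PRESENT else not has_value


def rule_holds(
    record: Mapping[str, Any],
    precondition: Mapping[str, str],
    postcondition: Mapping[str, str],
) -> bool:
    """Evaluate 'IF precondition THEN postcondition' over presence checks.

    Each condition maps slot names to PRESENT or ABSENT.
    """
    triggered = all(matches_presence(record, s, v) for s, v in precondition.items())
    if not triggered:
        return True  # the rule does not apply, so it is trivially satisfied
    return all(matches_presence(record, s, v) for s, v in postcondition.items())
```

The rule from the issue description, "if slot a has any value, then slot b must also have a value", would then be `rule_holds(record, {"a": PRESENT}, {"b": PRESENT})`.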

I think the enums we might want for encoding missing values in sample data are a different use case, but we can look at this. I think the two-value tuple with conditional logic you describe could be represented as:

classes:
  Sample:
    attributes:
      age:
        range: integer
      age_collection_missing_status:
        range: MissingValueEnum
    rules:
      ## pseudocode:
      ##
      ## IF age_collection_missing_status IS NULL:
      ##    THEN age is required
      - preconditions:
          slot_conditions:
            age_collection_missing_status:
              value_presence: ABSENT
        postconditions:
          slot_conditions:
            age:
              required: true

this is a little unintuitive/meta at first as we are mixing the concept of null values at different levels

it might be more intuitive to force an explicit collection status:

classes:
  Sample:
    attributes:
      age:
        range: integer
      age_collection_status:
        range: CollectionStatusEnum
        required: true
    rules:
      ## IF age_collection_status is COLLECTED:
      ##    THEN age is required
      - preconditions:
          slot_conditions:
            age_collection_status:
              equals_string: "COLLECTED"
        postconditions:
          slot_conditions:
            age:
              required: true

yet another way:

classes:
  Sample:
    attributes:
      age:
        any_of:
          - range: integer
          - range: MissingDataReasonEnum

which has a straightforward mapping to something like Python (Union), but introduces a bit of a mismatch when mapping to a stringly typed relational database representation.
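
A small sketch of that trade-off, assuming a hypothetical `MissingDataReason` enum with made-up permissible values and hypothetical helpers `to_column` / `from_column` for a single text column:

```python
from enum import Enum
from typing import Union


class MissingDataReason(Enum):
    # Hypothetical permissible values for this sketch.
    NOT_COLLECTED = "not collected"
    NOT_PROVIDED = "not provided"


# In Python the any_of maps naturally onto a Union.
Age = Union[int, MissingDataReason]


def to_column(age: Age) -> str:
    """Flatten the union into a single text column."""
    return age.value if isinstance(age, MissingDataReason) else str(age)


def from_column(text: str) -> Age:
    """Recover the union member by trying the enum first, then int."""
    try:
        return MissingDataReason(text)
    except ValueError:
        return int(text)
```

In memory the two cases are distinguished by type, but once flattened to text they can only be told apart by parsing the string, which is the mismatch mentioned above.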