psychoinformatics-de / datalad-concepts


Properties should get provenance #181

Status: open. jsheunis opened this issue 2 weeks ago.

jsheunis commented 2 weeks ago

This came up in the context of https://hub.datalad.org/datalink/tools/issues/3 / https://github.com/psychoinformatics-de/datalad-concepts/issues/180.

A simple use case: we (or specific users) would like to capture the process by which a study participant receives a diagnosis. Based on current thinking, a diagnosis would be a Property of a study participant (Person -> Agent). However, Property is currently a subclass of Characteristic, which is not an Entity, and Entity is the class to which provenance information can be attached.
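To make the mismatch concrete, here is a minimal Python sketch of the class relationships described above. Only the subclass relationships (Property from Characteristic, Person from Agent) and the rule that provenance attaches to Entity come from the discussion. The field names, the example IDs, and the `can_carry_provenance` helper are hypothetical, how Agent relates to Entity is deliberately not modeled, and none of this is the actual datalad-concepts schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified stand-ins for the schema classes named in the issue.
# Only the subclass relationships come from the discussion; fields are assumed.

@dataclass
class Characteristic:
    """Generic 'schema glue' for expressing a property of a thing."""
    name: str
    value: str

@dataclass
class Property(Characteristic):
    """A Property is a Characteristic, and therefore NOT an Entity."""

@dataclass
class Entity:
    """The class to which provenance can be attached (hypothetical slot)."""
    id: str
    was_generated_by: Optional[str] = None  # e.g. the ID of an Activity

@dataclass
class Agent:
    id: str

@dataclass
class Person(Agent):
    properties: List[Property] = field(default_factory=list)

def can_carry_provenance(obj: object) -> bool:
    # In this sketch, provenance is only attachable to Entity instances.
    return isinstance(obj, Entity)

participant = Person(id="ex:participant-01")
diagnosis = Property(name="diagnosis", value="ex:some-condition")
participant.properties.append(diagnosis)

print(can_carry_provenance(diagnosis))  # False: nowhere to record how it came about
```

The diagnosis can be attached to the participant, but there is no slot through which the assessment that produced it could be recorded.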

mih commented 2 weeks ago

I agree with the assessment.

So far, the only purpose of Characteristic has been "schema glue". But, as the example shows, this is insufficient.

I briefly considered whether the provenance addition could be limited to the value of a Characteristic, but I cannot see this working reasonably well.

As I see it right now, this would have complex implications.

That being said, all of the above would require a set of derived classes that equip Property and QuantitativeProperty with PROV aspects, and then adjust Entity, Agent, and Activity. However, this could also be done without any changes to the Thing schema, purely in PROV (likely by creating ThingWithProvenance and CharacteristicWithProvenance base classes).

The reason I hesitate to make a Characteristic an Entity is that its purpose is to express a property of a thing in a generic fashion, when no dedicated property for it is predefined... and ideally nothing else.

This leads me to one more solution (next post)...
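The PROV-only variant mentioned above (derived ThingWithProvenance and CharacteristicWithProvenance base classes, leaving the Thing schema untouched) might look roughly like the following Python sketch. It uses a dataclass mixin to carry the provenance slots. The slot names `was_generated_by` and `was_attributed_to`, the `ProvenanceSlots` mixin, the `PropertyWithProvenance` class, and the example IDs are all assumptions for illustration, not part of any existing schema.

```python
from dataclasses import dataclass
from typing import Optional

# --- "Thing" schema: left unchanged (hypothetical minimal fields) ---
@dataclass
class Thing:
    id: str

@dataclass
class Characteristic:
    name: str
    value: str

@dataclass
class Property(Characteristic):
    """A Property is a Characteristic."""

# --- "PROV" schema: provenance exists only in derived classes ---
@dataclass
class ProvenanceSlots:
    # Hypothetical provenance slots; both default to None so they can
    # follow the required fields of whichever base class they are mixed into.
    was_generated_by: Optional[str] = None   # ID of an Activity
    was_attributed_to: Optional[str] = None  # ID of an Agent

@dataclass
class ThingWithProvenance(ProvenanceSlots, Thing):
    pass

@dataclass
class CharacteristicWithProvenance(ProvenanceSlots, Characteristic):
    pass

@dataclass
class PropertyWithProvenance(CharacteristicWithProvenance, Property):
    # Still a Property (and a Characteristic), but with provenance slots.
    pass

diagnosis = PropertyWithProvenance(
    name="diagnosis",
    value="ex:some-condition",
    was_generated_by="ex:diagnostic-assessment-01",
)
print(isinstance(diagnosis, Property))                 # True
print(hasattr(Thing(id="ex:x"), "was_generated_by"))   # False
```

The design point is that Thing and Characteristic gain no provenance slots themselves; only the derived classes do, which keeps the change confined to the PROV side.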

mih commented 2 weeks ago

If we start thinking of a Characteristic as an Entity, then we can consider using the standard qualified-relation pattern for this, which we already make heavy use of.

What does this mean?

Expressing a generic property then works like this:

I think this would be quite elegant. However, it comes with a challenge too: any property/relation value now needs an identifier to be referenceable within an Influence. For something like a diagnosis this still seems straightforward. But it also applies to any QuantitativeValue.

For example, a Thing would (need to) have a relation with a specific Temperature (value and time taken) that requires a dedicated ID.
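A rough Python sketch of the qualified-relation variant and the identifier requirement it brings: here a Temperature is an Entity with its own `id`, and an Influence on the Thing refers to it by that `id`. The slot names `relation` and `qualified_relation` are borrowed from the discussion, but their types here are assumptions; the Influence fields `entity` and `had_role`, the `resolve` helper, and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entity:
    # Anything referenced from an Influence needs an identifier.
    id: str

@dataclass
class Temperature(Entity):
    # A quantitative value plus the time it was taken (hypothetical fields).
    value: float
    unit: str
    time_taken: str

@dataclass
class Influence:
    # Qualified relation: references the related object by ID and adds
    # context about the relation (hypothetical slot names).
    entity: str
    had_role: str

@dataclass
class Thing:
    id: str
    relation: List[Entity] = field(default_factory=list)
    qualified_relation: List[Influence] = field(default_factory=list)

def resolve(thing: Thing, influence: Influence) -> Entity:
    """Look up the object an Influence points to among the thing's relations."""
    by_id = {e.id: e for e in thing.relation}
    if influence.entity not in by_id:
        raise KeyError(f"no related object with id {influence.entity!r}")
    return by_id[influence.entity]

reading = Temperature(
    id="ex:temperature-reading-01",  # a dedicated ID just for this one measurement
    value=37.2,
    unit="degree Celsius",
    time_taken="2024-01-01T08:00:00",
)
sample = Thing(id="ex:sample-01")
sample.relation.append(reading)
sample.qualified_relation.append(
    Influence(entity="ex:temperature-reading-01", had_role="ex:body-temperature")
)

print(resolve(sample, sample.qualified_relation[0]).value)  # 37.2
```

Without the `id` on the Temperature there would be nothing for the Influence to point at, which is exactly the overhead described above for every QuantitativeValue.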

jsheunis commented 2 weeks ago

Thanks for the walkthrough of your thoughts. I agree with treating a Characteristic as an Entity, and I realize the accompanying challenge (or annoyance) of having to assign identifiers to any arbitrary property. One thing I don't grasp yet is why the slots relation and qualified_relation would need to move to Thing. Is it specifically to change the range of the latter to Influence (instead of EntityInfluence on Entity), so that the QuantitativeValue can be a Thing and doesn't need to be an Entity?