openMetadataInitiative / openMINDS_core

The openMINDS core metadata model includes schemas that can be used to describe the general origin, location and content of research products.
MIT License
20 stars 19 forks source link

Activity schema - changes needed in relation to openMINDS extensions #322

Closed UlrikeS91 closed 2 years ago

UlrikeS91 commented 2 years ago

Quick summary of the is-state before I get to the issue:

The Activity schema has the property "parameterSet" for stating a set of parameters related to the performed activity. In the core, the ProtocolExecution schema extends this schema. At the moment, ephys extension schemas extend the ProtocolExecution schema (e.g. PatchClampActivity schema) and computation extension schemas extend the Activity schema (specifically the Computation schema which is again extended for specific computations such as DataAnalysis).

Issues with this:

1) The ephys extension schemas should not extend ProtocolExecution because it introduces properties that are not used and should not be used there, such as BehaviouralProtocol. Solution: @apdavison, @maaikevS, @lzehl, @Peyman-N and I discussed this in an offline meeting and considered introducing an additional activity schema, e.g. called "experimentalActivity", that has the ProtocolExecution properties that apply to any experimental research activity, e.g. PreparationDesign. 2) Since "ParameterSet" is a property of the Activity schema, all extensions receive this property as well. However, the specialized extension schemas also define specific parameters for the activities that are important and true for the specific experimental step/approach (or similar). These specific parameters are basically like the "ParameterSet" but far less custom-defined. Personally, I find this confusing. Theoretically, all important parameters are already covered by the specific ones, making the "ParameterSet" redundant. It would still make sense to keep the property because it is hard to anticipate which other parameters researchers in the field may find relevant as well. Solution: Change the name "ParameterSet" to e.g. "AdditionalParameterSet". 3) This is an issue cause by the proposed solution from point 2. @apdavison uses the "ParameterSet" as the only parameters for the computation extensions. If I understood correctly, this is because the parameters can vary so greatly within a specific computing activity that it was impossible to define specific parameters for each of them. So, the term "AdditionalParameterSet" is semantically wrong in relation to the computation extension since they are not additional ones but the only ones. Meaning "ParameterSet" fits best in those cases. Solution: Below.

I would propose the following:

openMINDS_core:

Activity schema:

ExperimentalActivity schema (new):

ProtocolExecution schema:

openMINDS_computation: --> all activities extend the Computation schema which in turn extends the Activity schema from the core - this remains

Computation schema:

openMINDS_ephys: --> all activities extend the ProtocolExecution schema - this needs to be changed to the new ExperimentalActivity schema (then all the redundant properties will disappear, but you keep the ones that you wanted to keep)

All schemas:

Advantages/Disadvantages of this setup:

+ make it possible to NOT use "AdditionalParameterSet" which may already be useful for some extensions or for future extensions + semantically correct - every schema that does not have specific parameters directly defined can state "ParameterSet" (ProtocolExecution and Computation schemas) while the rest can have "AdditionalParameterSet"

- "(Additional)ParameterSet" needs to be added to each extension schema - "AdditionalParameterSet" could have been added to "ExperimentalActivity" if all extensions should get this anyway but then "ProtocolExecution" would say "AdditionalParameterSet" which is semantically not correct

@HumanBrainProject/openminds-developers any input/feedback is appreciated. This is just a first proposal. I'm happy to make changes to improve it!

apdavison commented 2 years ago

Thanks for this careful analysis Ulrike. I like the proposed solution, although since "protocol" becomes a field of "ExperimentalActivity", it seems strange that only "ProtocolExecution" has protocol in its name.

Since the current ProtocolExecution is supposed to be used only when we don't have a more specific extension, how about modifying your proposal as follows?

or simply

UlrikeS91 commented 2 years ago

Thanks for this careful analysis Ulrike. I like the proposed solution, although since "protocol" becomes a field of "ExperimentalActivity", it seems strange that only "ProtocolExecution" has protocol in its name.

Since the current ProtocolExecution is supposed to be used only when we don't have a more specific extension, how about modifying your proposal as follows?

  • "ExperimentalActivity" --> "ProtocolExecution"
  • "ProtocolExecution" --> "GenericProtocolExecution"

or simply

  • "ProtocolExecution" --> "GenericExperimentalActivity"

Thanks for the feedback!

If we think about how all these activity schemas are connected and which are on the same level, we get:

  1. Level: Activity
  2. Level: Computation - ExperimentalActivity
  3. Level: e.g. DataAnalysis - e.g. PatchClampActivity - but also: ProtocolExecution

@lzehl, is ProtocolExecution meant to be used for both experimental and computational executions? If yes, then @apdavison suggested name "GenericExperimentalActivity" would not be suitable.

Your other suggestions could work but I find "generic" vs. non-generic confusing. To me, both are pretty generic but you are probably right that "ProtocolExecution" is more generic than "ExperimentalActivity".

Since "Computation" and "ExperimentalActvity" are on the same level, would it make sense to name the schema "Experiment"? Then "ProtocolExecution" could be renamed to "GenericProtocolExecution".

apdavison commented 2 years ago

sorry I wasn't clear about "generic". My understanding is that "ExperimentalActivity", like "Activity", will not be used directly, but will be the basis of extensions. In that sense, (Generic)ProtocolExecution will be the generic extension, while PatchClampActivity and friends will be the specific extensions.

lzehl commented 2 years ago

Okay screening through this we can more quickly find a solution for the issue than we thought :wink:

My feedback / confirmation:

1) "parameter set" should remain in Activity with change name / instruction (see below)

2) EPHYS target schemas (BrainSlicingActivity, Craniotomy, CulturingActivity, ElectrodePlacementActivity, PatchClampActivity, StimulationExperiment) should not extend ProtocolExecution

*Naming convention for Activities (because I would suggest to avoid "Activity" as part of the names for the target schemas):

Issue with "parameter set" in Activity:

As stated above ALL target extensions currently define modality/technique specific properties. This is also true for the target schemas in COMPUTATION, because those are defined in the concept schema Computation. Meaning any other parameter given through the "parameter set" is additional to the schema defined properties. Although in an ideal case an modality/technique specific Activity extension would cover all relevant properties, we most likely will never reach that point to define modality/technique specific Activities for all use-cases. Therefore an additional parameter set is beneficial, in particular for edge cases.

To avoid mis-usage of "parameter set" for any extension and specifically the in-depth extensions by users we should refine the property name and the instructions.

Name options:

Instruction suggestions:

apdavison commented 2 years ago

@lyuba the problem with keeping ParameterSet in Activity, with a name like "additional parameter set", is that the Computation-based schemas will then have two ParameterSet fields, one called simply "parameter set" and the other "additional parameter set". I prefer @UlrikeS91's suggestion of removing the ParameterSet field from Activity, and then re-adding with the appropriate name where needed.

lzehl commented 2 years ago

@apdavison I don't see this "is that the Computation-based schemas will then have two ParameterSet fields, one called simply "parameter set" and the other "additional parameter set". " in your computation schemas.

All computation activities inherit from Computation which inherits from Activity. Computation specific properties are only provided in Computation and the actual target activities do not specify any other properties.

Each computation activity currently has therefore the following properties:

Renaming the "parameterSet" to "additionalParameterSet" or "user-definedParameterSet" is also here fitting.

apdavison commented 2 years ago

Renaming the "parameterSet" to "additionalParameterSet" or "user-definedParameterSet" is also here fitting.

No it is not fitting. This is exactly the problem I have been trying to express.

To me, Ulrike's proposed solution above solves the problem in a more elegant way. Do you have objections to it?

lzehl commented 2 years ago

@apdavison it is additional to all the properties which are parameters that you already stated in the schema, such as "environment", "launchConfiguration", "resourceUsage" etc.

I think the problem is how we define ParameterSet. In the way it was always meant it was an option for a user to specify any additional property that is not covered by the schema (the schema it links to can and should also only be used for this purpose).

If it helps rename the property to "additionalProperties" or "userDefinedProperties" and the schema to UserPropertyList (instead of ParameterSet).

I object to remove it because we actually want to have this option present on EACH Activity, because we most likely will never cover the in-depth needs of all use / edge cases. If we remove it we end up adding it to all extending schemas under inconsistent property names that all can only be used in the same context.

apdavison commented 2 years ago

I think the problem is how we define ParameterSet. In the way it was always meant it was an option for a user to specify any additional property that is not covered by the schema (the schema it links to can and should also only be used for this purpose).

@lyuba I agree, I think the issue is one of semantics/definitions: in computation we're using "parameters" with a different meaning. "environment", etc. are not parameters in that sense.

If it helps rename the property to "additionalProperties" or "userDefinedProperties" and the schema to UserPropertyList (instead of ParameterSet).

I think this is probably the best solution. StringParameter and NumericalParameter should also be renamed accordingly.

On further reflection, the current ParameterSet schema isn't even a very good fit for the computation schemas since it doesn't allow a hierarchical structure. Ideally the "parameters"/"parameterSet" property should allow either a File or some object we don't have yet ("Configuration"?) which would be like a small file (with content type limited to JSON, YAML, similar simple formats) but with the contents stored as text in the KG.

lzehl commented 2 years ago

@apdavison that is a good point. So I would suggest that we aiming to do the following:

Suggested changes for CustomPropertySet schema:

{
  "_type": "https://openminds.ebrains.eu/core/CustomPropertySet",
  "required": [
    "context",
    "relevantFor",
    "definedIn"
  ],
  "properties": {
    "context": {
      "type": "string",
      "_instruction": "Enter the common context for this custom property set."
    },
    "relevantFor": {
      "_instruction": "Add the technique for which this custom property set is relevant.",
      "_linkedTypes": [
        "https://openminds.ebrains.eu/controlledTerms/Technique"
      ]
    },
    "definedIn": {
      "_instruction": "Add a file or configuration that specifies the custom property set of the given context.",
      "_linkedTypes": [
        "https://openminds.ebrains.eu/core/File",
        "https://openminds.ebrains.eu/core/Configuration",
        "https://openminds.ebrains.eu/core/PropertyValueList"
      ]
    }
  }
}

Suggested changes for Configuration schema:

{
  "_type": "https://openminds.ebrains.eu/core/Configuration",
  "required": [
    "relevantFor",
    "definedIn"
  ],
  "properties": {
    "lookupLabel": {
      "type": "string",
      "_instruction": "Enter a lookup label for this configuration that may help you to more easily find it again."
    },
    "configuration": {
      "type": "string",
      "_instruction": "Enter the configuration in a simple text based format (e.g., JSON, or YAML)."
    },
    "definitionFormat": {
      "_instruction": "Add the content type of the defined configuration.",
      "_linkedTypes": [
        "https://openminds.ebrains.eu/core/ContentType"
      ]
    }
  }
}

Suggested changes for PropertyValueList schema:

{
  "_type": "https://openminds.ebrains.eu/core/PropertyValueList",
  "required": [
    "relevantFor",
    "definedIn"
  ],
  "properties": {
    "lookupLabel": {
      "type": "string",
      "_instruction": "Enter a lookup label for this key-value list that may help you to more easily find it again."
    },
    "propertyValuePair": {
      "type": "array",
      "minItems": 1,
      "uniqueItems": true,
      "_instruction": "Add all numerical and string property-value pairs that belong to this list.",
      "_embeddedTypes": [
        "https://openminds.ebrains.eu/core/NumericalProperty",
        "https://openminds.ebrains.eu/core/StringProperty"
      ]
    }
  }
}

looking forward to your feedback

UlrikeS91 commented 2 years ago

@apdavison that is a good point. So I would suggest that we aiming to do the following:

  • rename the Activity property "parameter set" to "custom property set"
  • rename the schema ParameterSet to CustomPropertySet
  • rename the schema NumericalParameter to NumericalProperty
  • rename the schema StringParameter to StringProperty
  • extending / revising the CustomPropertySet schema (see also below)

Suggested changes for CustomPropertySet schema:

{
  "_type": "https://openminds.ebrains.eu/core/CustomPropertySet",
  "required": [
    "context",
    "relevantFor",
    "definedIn"
  ],
  "properties": {
    "context": {
      "type": "string",
      "_instruction": "Enter the common context for this custom property set."
    },
    "relevantFor": {
      "_instruction": "Add the technique for which this custom property set is relevant.",
      "_linkedTypes": [
        "https://openminds.ebrains.eu/controlledTerms/Technique"
      ]
    },
    "definedIn": {
      "_instruction": "Add a file or configuration that specifies the custom property set of the given context.",
      "_linkedTypes": [
        "https://openminds.ebrains.eu/core/File",
        "https://openminds.ebrains.eu/core/Configuration",
        "https://openminds.ebrains.eu/core/PropertyValueList"
      ]
    }
  }
}

Suggested changes for Configuration schema:

{
  "_type": "https://openminds.ebrains.eu/core/Configuration",
  "required": [
    "relevantFor",
    "definedIn"
  ],
  "properties": {
    "lookupLabel": {
      "type": "string",
      "_instruction": "Enter a lookup label for this configuration that may help you to more easily find it again."
    },
    "configuration": {
      "type": "string",
      "_instruction": "Enter the configuration in a simple text based format (e.g., JSON, or YAML)."
    },
    "definitionFormat": {
      "_instruction": "Add the content type of the defined configuration.",
      "_linkedTypes": [
        "https://openminds.ebrains.eu/core/ContentType"
      ]
    }
  }
}

Suggested changes for PropertyValueList schema:

{
  "_type": "https://openminds.ebrains.eu/core/PropertyValueList",
  "required": [
    "relevantFor",
    "definedIn"
  ],
  "properties": {
    "lookupLabel": {
      "type": "string",
      "_instruction": "Enter a lookup label for this key-value list that may help you to more easily find it again."
    },
    "propertyValuePair": {
      "type": "array",
      "minItems": 1,
      "uniqueItems": true,
      "_instruction": "Add all numerical and string property-value pairs that belong to this list.",
      "_embeddedTypes": [
        "https://openminds.ebrains.eu/core/NumericalProperty",
        "https://openminds.ebrains.eu/core/StringProperty"
      ]
    }
  }
}

looking forward to your feedback

Except for some copy-paste errors in your proposed schemas 😆, I like the proposed setup and specific suggestions. "Custom property set" is a good compromise and my personal problem with the "parameter set" would be solve by this 😉

lzehl commented 2 years ago

@apdavison should the Configuration also be able to state the Software Version it is meant for? (optional)

apdavison commented 2 years ago

@lzehl I think that's the role of the ContentType.

(also note that ideally the Configuration would be embedded, but I don't think you can have both _linkedTypes and _embeddedTypes in the same property)

lzehl commented 2 years ago

@apdavison top :+1: then we'll have a good solution I think. Let's keep Configuration now in as "linkedType".

I forwarded the issue to combine linked and embedded types under one property to Oli, since it is not the first time this functionality comes up (@olinux). Schema wise I think that is possible, interpreting it for editors etc might be difficult. We can tackle this at a later point in time again.

lzehl commented 2 years ago

@HumanBrainProject/openminds-developers any volunteers who want to implement this?

Peyman-N commented 2 years ago

@lzehl I would update #325 to reflect these changes.

lzehl commented 2 years ago

@Peyman-N you referenced the wrong PR. You mean #323 no?

Peyman-N commented 2 years ago

@lzehl I have implemented all problems raised in your comment. (By the way thanks a lot for gathering all the needed changes in the same place, it was really a big help) The only thing is independence of ProtocolExecution from ExperimentalActivity raised in this comment . I think having this connection through inheritance is the logical way of doing this but I completely agree that the decoupling of them would give us more flexibility. What do you think @lzehl , @apdavison and @UlrikeS91?

lzehl commented 2 years ago

@Peyman-N happy to help :)

To the remaining issue about the independence of ProtocolExecution from ExperimentalActivity:

The protocol execution is a general activity that does not have to be related to an activity for experimental data. It could also be used for simulated data or computation descriptions when users do not want to go to in-depth metadata. I would keep it separate so that we can change it independently (e.g. making it more general) from the experimental activity (and its extending schemas).

UlrikeS91 commented 2 years ago

@Peyman-N happy to help :)

To the remaining issue about the independence of ProtocolExecution from ExperimentalActivity:

The protocol execution is a general activity that does not have to be related to an activity for experimental data. It could also be used for simulated data or computation descriptions when users do not want to go to in-depth metadata. I would keep it separate so that we can change it independently (e.g. making it more general) from the experimental activity (and its extending schemas).

Agreed :)