openMetadataInitiative / openMINDS_core

The openMINDS core metadata model includes schemas that can be used to describe the general origin, location and content of research products.
MIT License
20 stars 19 forks source link

request for open value ranges #330

Open lzehl opened 2 years ago

lzehl commented 2 years ago

@HumanBrainProject/openminds-developers we frequently get the request to allow for open ranges specifically for age information. E.g. indication that all participants were at least 18 years old or older.

I see three options how we can solve this within the schemas:

1) make both minimum value and maximum value in the QuantitativeValueRange schema optional (less preferred option from my side) 2) introduce a new schema QuantitativeValueRangeOpen that has the properties "border value" (e.g., 18), "relational operator" (e.g., "≥"; should be controlledTerm), and "unit" (e.g., "year"; is controlledTerm) (seems to me the fastest clean solution for this issue) 3) extend the openMINDS syntax for schemas to define ONE-OF required properties and change the QuantitativeValueRange schema accordingly (this would give us the most flexibility also for other cases with the same issue, but technically it's not so easy to solve as far as I can see, because the one-of option in json-schema seems not to be build for required properties)

What do you think? (comments from outside of the openMINDS development team are of course also welcome)

UlrikeS91 commented 2 years ago

3.) would be best, but I understand the issue. If we need it asap, then I would go for 2.) and not for 1.)

jcolomb commented 2 years ago

what about having a value set for this purpose in the QuantitativeValueRange schema, so you can set minimum value 18, maximum value of "undefined" or "right-censored-data" or some other term you would find appropriate?

It might make sense for other kind of range where the maximum is only the maximum we are looking at (example max time to find a platform in a water maze experiment)

Peyman-N commented 2 years ago

I really prefer option 3 too. But I fail to see why if we do it with ONE-OFF would cause any problem. For example this should work sufficiently enough:

"maxValue": {
    "anyOf":[
        {
            "type": "number"
        },
        {
            "const" : "Infinity",
        },
        "_instruction": "Add the maximum value measured for this range, if this is an open range specify 'Infinity'."
    }

Dose it cause a problem with generator ?

lzehl commented 2 years ago

@Peyman-N & @jcolomb thanks for the suggestion but the problem here is that you have to defined it for both ranges which could then lead to the following: (infinity to infinity), (undefined to undefined), (undefined to infinity).

With the "one-off" option, I meant a schema definition where you have to state either a minValue, or a maxValue or both which I think is not possible at the moment.

lzehl commented 2 years ago

And: string OR number will cause issues with editors (at least at the moment with the KG Editor)

Peyman-N commented 2 years ago

We can move the "oneOf" to a higher level. Meaning having only three options. But for "string OR number", it depends on the KG language, in python a float value can be 'inf'. positive_infinity = float('inf')

lzehl commented 2 years ago

@Peyman-N I don't now what you mean. You cannot define a oneOf (etc) for properties. Only for values. And we would need it for properties, specifically for stating the requirement of properties.

Peyman-N commented 2 years ago

I meant something like this:

{
    "_type": "https://openminds.ebrains.eu/core/QuantitativeValueRange",
    "required": [
        "maxValue",
        "minValue"
    ],
    "properties": {
        "oneof": [
            {
                "maxValue": {
                    "type": "number",
                    "_instruction": "Add the maximum value measured for this range, if this is an open range specify 'inf'."
                },
                "minValue": {
                    "type": "number",
                    "_instruction": "Add the minimum value measured for this range,if this is an open range specify '-inf'."
                },
                "unit": {
                    "_instruction": "Add the unit of measurement of this quantitative value range.",
                    "_linkedTypes": [
                        "https://openminds.ebrains.eu/controlledTerms/UnitOfMeasurement"
                    ]
                }
            },
            {
                "maxValue": {
                    "const": "inf",
                    "_instruction": "Add the maximum value measured for this range, if this is an open range specify 'inf'."
                },
                "minValue": {
                    "type": "number",
                    "_instruction": "Add the minimum value measured for this range,if this is an open range specify '-inf'."
                },
                "unit": {
                    "_instruction": "Add the unit of measurement of this quantitative value range.",
                    "_linkedTypes": [
                        "https://openminds.ebrains.eu/controlledTerms/UnitOfMeasurement"
                    ]
                }
            },
            {
                "maxValue": {
                    "type": "number",
                    "_instruction": "Add the maximum value measured for this range, if this is an open range specify 'inf'."
                },
                "minValue": {
                    "const": "-inf",
                    "_instruction": "Add the minimum value measured for this range,if this is an open range specify '-inf'."
                },
                "unit": {
                    "_instruction": "Add the unit of measurement of this quantitative value range.",
                    "_linkedTypes": [
                        "https://openminds.ebrains.eu/controlledTerms/UnitOfMeasurement"
                    ]
                }
            }
        ]
    }
}
lzehl commented 2 years ago

@Peyman-N the "required" does not fit with this schema and "anyOf" would be an actual property name not a technical option. Each "anyOf" option has to be value (which could be a number, string etc or an object). But the way you stated it in the example is incorrect I think.

lzehl commented 2 years ago

Please prove me wrong though :sweat_smile: it would be great if it would be an easy implementation but from what I looked at it doesn't work (at the moment).

Peyman-N commented 2 years ago

hahahaha, I think it works as intended, I checked the following schema with a JSON Schema Validator and it works as intended.

{
    "oneof": [
        {
            "maxValue": {
                "type": "number"
            },
            "minValue": {
                "type": "number"
            },
            "unit": {
                "type": "number"
            }
        },
        {
            "maxValue": {
                "const": "inf"
            },
            "minValue": {
                "type": "number"
            },
            "unit": {
                "type": "number"
            }
        },
        {
            "maxValue": {
                "type": "number"
            },
            "minValue": {
                "const": "-inf"
            },
            "unit": {
                "type": "number"
            }
        }
    ],
    "required": [
        "maxValue",
        "minValue"
    ]
}

for the following instance pass the test:

{
"maxValue": 25,
"minValue":69,
"unit": 78
}

and following instance too:

{
"maxValue": "inf",
"minValue":69
}

but following instance don't meet the requirements

{
"maxValue": 49,
"unit": 78
}

Beside this, I cheeked the documentation it states the following:

This keyword's value MUST be a non-empty array. Each item of the array MUST be a valid JSON Schema.

So logically this should work.

jcolomb commented 2 years ago

It might be worth looking at different use cases and why the requirement. It relates to the fact that "infinite" and "undefined" are different values, (as are minimum value by design and minimal values by chance).

It seems people want to indicate that the study was designed with participants 18 and above, which is something different than indicating the range of ages (more or less by chance) the study had (in that latter case, they would be able to indicate the upper range too.)

If we take data from a water maze example, the range of the data "time to find the platform" is 0-300s (usually), with NA indicating the platform was never found. People would then indicate that range (0-300) even for dataset where 300 (and 0) is actually never reached.

I think there are different problems, which might require different solution ?

  1. give information about the experimental design: for instance the data is left-censored. Here I think a unique type should include the different use cases, maybe adding a property about the type of limits (including/excluding the limit, open on one side,...) and having the value as an array of 2 elements. One may want to include all types of interval in that type (https://en.wikipedia.org/wiki/Interval_(mathematics))
  2. Allow users to give incomplete information about the range.
lzehl commented 2 years ago

@Peyman-N sorry yes you are correct but in the end this is exactly what I said: the oneOf (etc) option only works on whole objects or simple values. You are defining three different objects/schemas within one object in your case (and if we do that we can also go with option 2) not defining at least specify one of the required properties. You're suggestion will be also highly difficult to interpret for editors such as the KG Editor (I just spoke with Oli).

What I meant with option 3 was something like this (which is currently not supported):

{
  "_type": "https://openminds.ebrains.eu/core/QuantitativeValueRange",
  "required": {
    "oneOf": [
      ["maxValue", "minValue"],
      ["maxValue"],
      ["minValue"]
  ],
  "properties": {
    "maxValue": {
      "type": "number",
      "_instruction": "Add the maximum value measured for this range."
    },
    "minValue": {
      "type": "number",
      "_instruction": "Add the minimum value measured for this range."
    },
    "unit": {
      "_instruction": "Add the unit of measurement of this quantitative value range.",
      "_linkedTypes": [
        "https://openminds.ebrains.eu/controlledTerms/UnitOfMeasurement"
      ]
    }
  }
}

@jcolomb I completely agree that "undefined" and "infinity" are different use cases. why an option range is need we cannot say beforehand, this is use case dependent. from the openMINDS perspective I would like to leave the why for the user to define in "additional remarks". I don't necessarily would like to cover all cases structurally.

Taking @jcolomb point into account option 2 might actually be the best solution, because we could add a property "remark" (optional) for cases where the reason for the open range needs further explanations.

Peyman-N commented 2 years ago

I see, so let's go with option 2.

jcolomb commented 2 years ago

For the name, I would use unbounded instead of open, as "open" is used to say the limit is included in the interval : https://en.wikipedia.org/wiki/Interval_(mathematics)#Classification_of_intervals

lzehl commented 2 years ago

@jcolomb yes that is a good point. QuantitativeValueRangeUnbounded is very long... but that is unavoidable in that case.