linkml / linkml-model

Link Modeling Language (LinkML) model
https://linkml.github.io/linkml-model/docs/
34 stars 16 forks source link

Make `array` slot default false #189

Closed sneakers-the-rat closed 6 months ago

sneakers-the-rat commented 7 months ago

Problem:

the way to specify an Any shaped array is to do this:

classes:
  MyClass:
    attributes:
      my_array:
        array:

which rocks. The problem is that is equivalent to this from the POV of the python dataclasses:

classes:
  MyClass:
    attributes:
      my_array:

so to make an any shaped array one has to, in practice, do something that is False-y but not None like:

classes:
  MyClass:
    attributes:
      my_array:
        array: {}

Similarly to the semantic difference between None and False in max_number_dimensions, this PR makes the default value for the array slot False, making it distinguishable to None and preserving semantics.

The alternative of keeping the default None and making any shaped arrays be False is desirable from a metamodel consistency POV, but counterintuitive - "what do you mean array: False means the most general array definition possible." Arguably, this is still consistent with metamodel intuition, because the default - as in "when one doesn't specify an array at all" - still means "No array," and specifying array: None is an affirmative signal that we want an array with None restrictions on its shape.

edit: actually now that i'm thinking about this, this also probably wouldn't work, so let's just say this issue is the "we need to figure out how to differentiate between array: None and array not being present on a class definition" PR

cc @cmungall @rly

cmungall commented 7 months ago

Yes, it doesn't quite work for the reasons you state. We'd have to make the range of array something like a union of an array object and a boolean.

This isn't the first time we've run into things like this. From a pythonic POV it's frustrating not to be able to distinguish Nones, unset objects, or empty objects. One way to think about it is making the metamodel work for the broadest range of target frameworks - e.g. we want to allow people to store schemas in a relational database.

sneakers-the-rat commented 7 months ago

What strategies have worked in the past for this? i'll follow your lead, maybe i should have raised this as an issue rather than a PR...

cmungall commented 7 months ago

In this particular case I don’t think it’s so bad for the schema author to provide a slot assignment that forces an array nature such as minimum cardinality: 1. This is perhaps odd, since zeroD arrays are prohibited it would be semantically vapid. But I think this is more explicitl that say a comment or other piece of metadata

On Tue, Mar 5, 2024 at 12:46 PM Jonny Saunders @.***> wrote:

What strategies have worked in the past for this? i'll follow your lead, maybe i should have raised this as an issue rather than a PR...

— Reply to this email directly, view it on GitHub https://github.com/linkml/linkml-model/pull/189#issuecomment-1979607834, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOLEG6BL5WB5HA6XR2TYWYVJFAVCNFSM6AAAAABEGOKVSSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZZGYYDOOBTGQ . You are receiving this because you were mentioned.Message ID: @.***>

sneakers-the-rat commented 7 months ago

Actually wait this works perfectly: https://github.com/sneakers-the-rat/linkml-runtime/commit/26d2081afa9e7de41fc196f2549e7b5dd4afcf29

what we need isn't a change in the metamodel per se, but greater access to the literal representation of the underlying schema. we already have this place in the metamodel where we have a difference in meaning between None and being unset, this just lets us tell whether that's the case.

I wrote that into the Element class there, but what we'll want to do is add one more class in between Element and YAMLRoot that is also a dataclass, but not generated by the python generator, and all it does is that __new__ method.

So this:

classes:
  TypedUnset:
    attributes:
      array:
        range: integer
        array:

  NonArrayUnset:
    attributes:
      nonarray:
        range: integer

works like this

sv = SchemaView('schema.yaml')
nonarray = nonarray = sv.get_class('NonArrayUnset').attributes['nonarray']
isarray = sv.get_class('TypedUnset').attributes['array']
>>> nonarray.array is None
True
>>> isarray.array is None
True
>>> sv.is_unset(nonarray, 'array')
True
>>> sv.is_unset(isarray, 'array')
False

ezpz

sneakers-the-rat commented 7 months ago

the special casing i think comes from the fact that array isn't a TypeExpression, like we could also have gone with

MyClass:
  attributes:
    my_array:
      range:
        array:
          range: integer

which is a little more awkward but doesn't involve parsing None from unset. it's not too late to do that, and it might be the right call, but i don't have strong feelings there. In that case i guess we'd have to special case that array: {range} can't itself be an array, but that's a more hidden special case

rly commented 6 months ago

In my opinion,

classes:
  MyClass:
    attributes:
      my_array:
        array: {}

and

classes:
  MyClass:
    attributes:
      my_array:
        array:
          minimum_number_dimensions: 1

are a bit ugly but not so bad.

I think it would not be so common that someone would want to define an any shaped array with no information about it. The one use case I have seen in NWB is a "scratch" dataset where a user can store whatever array they want, and it is up to the user to keep track of what it represents.

But if we can differentiate between None and unset in code using @sneakers-the-rat 's example above, even better!

sneakers-the-rat commented 6 months ago

I think of the metamodel and the resulting schemas as an interface - they are the public contract that the standard makes to the rest of the generators, transformers, etc. So to me the syntax should be designed with authorship in mind, and to make clear expectations of usage and behavior, and then we should do what we can behind the scenes to make that work, so I think the ability to differentiate between "set as None" and "unset" to be able to give a predictable, coherent interface is a challenge worth taking :).

The above fix is actually "truer" to the underlying YAML representation - things being interpreted as None when they are not explicitly set is a feature of the python dataclasses (that i think is correct), but the YAML representation would be able to differentiate between unset and set as None. So this is basically just restoring a missing part of the python dataclass representation in a backwards compatible way.

sneakers-the-rat commented 6 months ago

it turns out this change is a great deal more difficult bc the metaclasses are already assumed to be 1:1 with the JSON/YAML for the purposes of dumping/loading by the dumpers, and the jsonasobj2 package is pretty mysterious with how to set instance-private attributes, so seems like we might need to pay some tech debt on this to implement first.

edit: i got a version that works within linkml_runtime but on testing with the rest of linkml it looks like there are hundreds of places where eg. different generators iterate over vars(metamodel_class), ie. treating the metamodel classes as dictionaries rather than objects that can have private/etc. attributes. Sort of at a loss on this because a lot of it has to do with the use of jsonasobj2 which i still dont' entirely understand the purpose of. it seems like the separation of concerns between 'object' and 'serialization' has sort of been lost there which makes it pretty brittle to modify, so i'm just going to chalk this one up as a loss pending further information bc i don't have time to get to the bottom of it

rly commented 6 months ago

From @cmungall : some representations, e.g., JSON-LD, cannot distinguish between unset and set to None/null, so it is not ideal to assign different meanings to those.

@linkml/ndarray-wg decided that array: {} or array: any non-None value represents an any-shaped, any-typed array. We can decide later whether we want to distinguish between unset and set to None/null for the purpose of testing for monotonicity.

cmungall commented 6 months ago

See also https://github.com/orgs/linkml/discussions/1975