Closed sneakers-the-rat closed 6 months ago
Yes, it doesn't quite work for the reasons you state. We'd have to make the range of array
something like a union of an array object and a boolean.
This isn't the first time we've run into things like this. From a pythonic POV it's frustrating not to be able to distinguish Nones, unset objects, or empty objects. One way to think about it is making the metamodel work for the broadest range of target frameworks - e.g. we want to allow people to store schemas in a relational database.
What strategies have worked in the past for this? i'll follow your lead, maybe i should have raised this as an issue rather than a PR...
In this particular case I don’t think it’s so bad for the schema author to provide a slot assignment that forces an array nature such as minimum cardinality: 1. This is perhaps odd, since zeroD arrays are prohibited it would be semantically vapid. But I think this is more explicitl that say a comment or other piece of metadata
On Tue, Mar 5, 2024 at 12:46 PM Jonny Saunders @.***> wrote:
What strategies have worked in the past for this? i'll follow your lead, maybe i should have raised this as an issue rather than a PR...
— Reply to this email directly, view it on GitHub https://github.com/linkml/linkml-model/pull/189#issuecomment-1979607834, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOLEG6BL5WB5HA6XR2TYWYVJFAVCNFSM6AAAAABEGOKVSSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZZGYYDOOBTGQ . You are receiving this because you were mentioned.Message ID: @.***>
Actually wait this works perfectly: https://github.com/sneakers-the-rat/linkml-runtime/commit/26d2081afa9e7de41fc196f2549e7b5dd4afcf29
what we need isn't a change in the metamodel per se, but greater access to the literal representation of the underlying schema. we already have this place in the metamodel where we have a difference in meaning between None
and being unset, this just lets us tell whether that's the case.
I wrote that into the Element
class there, but what we'll want to do is add one more class in between Element
and YAMLRoot
that is also a dataclass, but not generated by the python generator, and all it does is that __new__
method.
So this:
classes:
TypedUnset:
attributes:
array:
range: integer
array:
NonArrayUnset:
attributes:
nonarray:
range: integer
works like this
sv = SchemaView('schema.yaml')
nonarray = nonarray = sv.get_class('NonArrayUnset').attributes['nonarray']
isarray = sv.get_class('TypedUnset').attributes['array']
>>> nonarray.array is None
True
>>> isarray.array is None
True
>>> sv.is_unset(nonarray, 'array')
True
>>> sv.is_unset(isarray, 'array')
False
ezpz
the special casing i think comes from the fact that array
isn't a TypeExpression
, like we could also have gone with
MyClass:
attributes:
my_array:
range:
array:
range: integer
which is a little more awkward but doesn't involve parsing None from unset. it's not too late to do that, and it might be the right call, but i don't have strong feelings there. In that case i guess we'd have to special case that array: {range} can't itself be an array, but that's a more hidden special case
In my opinion,
classes:
MyClass:
attributes:
my_array:
array: {}
and
classes:
MyClass:
attributes:
my_array:
array:
minimum_number_dimensions: 1
are a bit ugly but not so bad.
I think it would not be so common that someone would want to define an any shaped array with no information about it. The one use case I have seen in NWB is a "scratch" dataset where a user can store whatever array they want, and it is up to the user to keep track of what it represents.
But if we can differentiate between None and unset in code using @sneakers-the-rat 's example above, even better!
I think of the metamodel and the resulting schemas as an interface - they are the public contract that the standard makes to the rest of the generators, transformers, etc. So to me the syntax should be designed with authorship in mind, and to make clear expectations of usage and behavior, and then we should do what we can behind the scenes to make that work, so I think the ability to differentiate between "set as None" and "unset" to be able to give a predictable, coherent interface is a challenge worth taking :).
The above fix is actually "truer" to the underlying YAML representation - things being interpreted as None
when they are not explicitly set is a feature of the python dataclasses (that i think is correct), but the YAML representation would be able to differentiate between unset and set as None. So this is basically just restoring a missing part of the python dataclass representation in a backwards compatible way.
it turns out this change is a great deal more difficult bc the metaclasses are already assumed to be 1:1 with the JSON/YAML for the purposes of dumping/loading by the dumpers, and the jsonasobj2 package is pretty mysterious with how to set instance-private attributes, so seems like we might need to pay some tech debt on this to implement first.
edit: i got a version that works within linkml_runtime but on testing with the rest of linkml it looks like there are hundreds of places where eg. different generators iterate over vars(metamodel_class)
, ie. treating the metamodel classes as dictionaries rather than objects that can have private/etc. attributes. Sort of at a loss on this because a lot of it has to do with the use of jsonasobj2 which i still dont' entirely understand the purpose of. it seems like the separation of concerns between 'object' and 'serialization' has sort of been lost there which makes it pretty brittle to modify, so i'm just going to chalk this one up as a loss pending further information bc i don't have time to get to the bottom of it
From @cmungall : some representations, e.g., JSON-LD, cannot distinguish between unset and set to None/null, so it is not ideal to assign different meanings to those.
@linkml/ndarray-wg decided that array: {}
or array: any non-None value
represents an any-shaped, any-typed array. We can decide later whether we want to distinguish between unset and set to None/null for the purpose of testing for monotonicity.
Problem:
the way to specify an
Any
shaped array is to do this:which rocks. The problem is that is equivalent to this from the POV of the python dataclasses:
so to make an any shaped array one has to, in practice, do something that is
False
-y but notNone
like:Similarly to the semantic difference between
None
andFalse
inmax_number_dimensions
, this PR makes the default value for thearray
slotFalse
, making it distinguishable toNone
and preserving semantics.The alternative of keeping the default
None
and making any shaped arrays beFalse
is desirable from a metamodel consistency POV, but counterintuitive - "what do you meanarray: False
means the most general array definition possible." Arguably, this is still consistent with metamodel intuition, because the default - as in "when one doesn't specify an array at all" - still means "No array," and specifyingarray: None
is an affirmative signal that we want anarray
withNone
restrictions on its shape.edit: actually now that i'm thinking about this, this also probably wouldn't work, so let's just say this issue is the "we need to figure out how to differentiate between
array: None
andarray
not being present on a class definition" PRcc @cmungall @rly