linkml / linkml-runtime

Runtime support for linkml generated models
https://linkml.io/linkml/
Creative Commons Zero v1.0 Universal
24 stars 23 forks source link

Modify `induced_class()` / `induced_slot()` so it materializes non-scalar metaslots as well #335

Closed sujaypatil96 closed 1 month ago

sujaypatil96 commented 1 month ago

Fixes https://github.com/linkml/linkml/issues/2224

There are 3 ways in which we could have written https://github.com/linkml/linkml-runtime/blob/798591b29661efa8fef05703c79e3614d40f911d/linkml_runtime/utils/schemaview.py#L1376

  1. if v2:
  2. if v2 is not None and ((isinstance(v2, (dict, list)) and v2) or (isinstance(v2, JsonObj) and as_dict(v2))):
  3. if not is_empty(v2):

Definition of is_empty() https://github.com/linkml/linkml-runtime/blob/798591b29661efa8fef05703c79e3614d40f911d/linkml_runtime/utils/formatutils.py#L112-L121

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 63.49%. Comparing base (37297a8) to head (ba677c6). Report is 14 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #335 +/- ## ========================================== + Coverage 63.44% 63.49% +0.04% ========================================== Files 63 63 Lines 8942 8943 +1 Branches 2569 2569 ========================================== + Hits 5673 5678 +5 + Misses 2648 2646 -2 + Partials 621 619 -2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

sneakers-the-rat commented 1 month ago

well it looks like it made the upstream tests fail in an interesting way. will put my comments on the actual PR in a review but first...

the error is here, keyerror on domain_of: https://github.com/linkml/linkml/blob/c26d8995b164223d0dc074721fc30cf96e4965e0/tests/test_generators/test_docgen.py#L320

first, the test is wrong - the source schema does not have domain_of set for Person.slot_usage.species_name (nor would it really make sense to set domain_of in slot_usage): https://github.com/linkml/linkml/blob/c26d8995b164223d0dc074721fc30cf96e4965e0/tests/test_generators/input/kitchen_sink.yaml#L142-L143

what's sort of funny is that what must have been happening there is that the list for Person.slot_usage['species_name'].domain_of was bound to Person.species_name.domain_of since that's what the old behavior was, so then later when that list is populated then the slot_usage version of that was also populated since it's the same object.

slot gotten earlier in the method is shallow copied so that should break the object identity relation, but ancestor_class.slot.slot_usage is not, so it is vulnerable to mutation, hence why even with the old old behavior of multiple deepcopies in induced_slot that this behavior persisted in the test.

that would be a real big problem if nonscalar slots metaslots were ever mutated outside of the schemaview (unfortunately they are), since then any values set on the slot would have the wildly unpredictable behavior of propagating back up to the slot usage of the last ancestor to have it set, which would then propagate back down to any other slots that inherit from it.

this is additionally a problem because induced_slot is often called twice or three times for each slot despite caching, depending on the generator (for each time the signature differs), so we should expect this to lead to nondeterministic behavior that depends on the number of times a slot is generated and how its inheritance tree branches * how slot_usage is used in it.

so anyway, obvi this PR should only be merged along with a patch to the upstream tests, but i think we are probably in for a bit of a ride as we figure out what other unanticipated behavior this method has in store for us lol

sneakers-the-rat commented 1 month ago

Before we merge we should re-run upstream tests and patch that at the same time so we dont forget :)

(Should we just make upstream tests be on by default like the rest of the tests? Seems like in practice this way doesnt rly work bc have to keep manually re-running, even if it was a good idea in principle to save compute time.)

sujaypatil96 commented 1 month ago

I looked at the upstream linkml test that's causing the error in test_docgen.py, and I think it's okay to remove this assertion: https://github.com/linkml/linkml/blob/main/tests/test_generators/test_docgen.py#L320-L321

The contents of person_dict["slot_usage"] looks like below, there's no domain_of appearing in the materialization anymore:

{
    "name": {"name": "name", "pattern": "^\\S+ \\S+$"},
    "species name": {"equals_string": "human", "name": "species name"},
    "stomach count": {"equals_number": 1, "name": "stomach count"},
}
sneakers-the-rat commented 1 month ago

Actually yno what would be a good idea is writing a specific test for that not happening. I can follow up with that.

We should probably consider anything that is specifically touched within induced slot worthy of close testing, since almost more than any other method im aware of, if it is doing unexpected things the whole everything could break

sujaypatil96 commented 1 month ago

Actually yno what would be a good idea is writing a specific test for that not happening. I can follow up with that.

Thank you!! will you make a PR for this? i'll be happy to review it 😁 once that patch is ready to go, we can merge this PR in first, and then the patch PR on linkml.

sneakers-the-rat commented 1 month ago

Lets put that test over here, no? Since its a test of schemaview behavior?

cmungall commented 1 month ago

@sierra-moxon noticed that the example here: https://github.com/linkml/linkml/issues/2224#issuecomment-2249214252 has an override, can we change the test example so the range at the slot level uses Any as a range?

sujaypatil96 commented 1 month ago

Made the change requested here in this commit (ba677c6)