mapping-commons / sssom-py

Python toolkit for SSSOM mapping format
https://mapping-commons.github.io/sssom-py/index.html#
MIT License
49 stars 12 forks source link

`EntityReference` typedef: docs & validation inconsistent #512

Closed joeflack4 closed 6 months ago

joeflack4 commented 6 months ago

Overview

I'm running sssom validate and was getting an error that my creator_id was invalid. The docs for creator_id show that it is of type EntityReference, typeof: uriorcurie.

However, when I have a URI, creator_id: https://orcid.org/0000-0002-2906-7319, validation fails with:

  File "/Users/joeflack4/virtualenvs/icd11/lib/python3.10/site-packages/sssom/util.py", line 1319, in get_all_prefixes
    raise ValidationError(
jsonschema.exceptions.ValidationError: Slot 'creator_id' has an incorrect value: ['https://orcid.org/0000-0002-2906-7319']

If I change it to a CURIE, validation passes.

Suggested solution

a. Update the docs for EntityReference: typeof: curie. b. Update get_all_prefixes() so that it ignores metadata slots where the typedef is not simply 'curie'. Or, if it's UriOrCurie, check to make sure at least one is valid.

Additional info

Example files

FYI: docstring issue

The docstring for get_all_prefixes() shows :raises ValidationError: If slot is wrong. twice.

matentzn commented 6 months ago

This is a confusing problem of linkml for me.. The type is really EntityReference, not uri_or_curie in the sense of "use one or the other". The correct way to express it would be: If its SSSOM TSV, or SSSOM JSON, it MUST be curie. If it is SSSOM RDF, it must be uri.

I guess this will have to be documented much much better.

joeflack4 commented 6 months ago

Ah, that is not what I expected. That kind of conditional type def is tricky to express, but it also comes off like a code smell. Perhaps we could make EntityReference an abstract class and then subclass it, using different sub-types for JSON / TSV or RDF serialization, but IDK how that'd end up looking in the docs either.

Maybe can just update the description on it in the documentation to say exactly what you said:

If its SSSOM TSV, or SSSOM JSON, it MUST be curie. If it is SSSOM RDF, it must be uri.

matentzn commented 6 months ago

I made a PR about this: https://github.com/mapping-commons/sssom/pull/358

We cant do anything else right now, so I will close this issue here, but I expect the PR to raise some eyebrows and subsequent dialogues.