mih opened this issue 10 months ago.
FWIW, the example doesn't validate against the provided schema:
```console
❯ linkml-validate -s MonoDataladDatasetVersion-schema.yaml --target-class MonoDataladDatasetVersion MonoDataladDatasetVersion-example.yaml
[ERROR] [MonoDataladDatasetVersion-example.yaml/0] {'gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58': {'meta_type': 'dlco:AnnexedFile', 'gitsha': 'b94ef1797f7bfc1ac979be122e1b538bbb0d1d58', 'distribution': {'qualified_access': {'access_id': 'MD5E-s3425--32a617360d10e3dcbfdd0885e8d64ab8.txt', 'relation': 'annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf'}}}} is not of type 'array' in /has_part
```
It requires a fix to the schema (or, I guess, a change to the example):
```diff
diff --git a/MonoDataladDatasetVersion-schema.yaml b/MonoDataladDatasetVersion-schema.yaml
index 8b85b1e..7357ded 100644
--- a/MonoDataladDatasetVersion-schema.yaml
+++ b/MonoDataladDatasetVersion-schema.yaml
@@ -127,7 +127,7 @@ classes:
         range: FileInGit
         multivalued: true
         inlined: true
-        inlined_as_list: true
+        inlined_as_list: false
       qualified_part:
         range: QualifiedGitTrackedPart
         multivalued: true
```
And FWIW, commenting out that `relation:` line doesn't resolve the situation for me; it just leads to another crash:
```
  File "/home/yoh/proj/misc/linkml/trash/gh-1812/venvs/dev/lib/python3.11/site-packages/linkml_runtime/loaders/yaml_loader.py", line 41, in load_any
    return self._construct_target_class(data_as_dict, target_class)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/misc/linkml/trash/gh-1812/venvs/dev/lib/python3.11/site-packages/linkml_runtime/loaders/loader_root.py", line 132, in _construct_target_class
    return target_class(**data_as_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 8, in __init__
  File "test", line 254, in __post_init__
  File "test", line 192, in __post_init__
  File "test", line 141, in __post_init__
  File "test", line 141, in <listcomp>
  File "<string>", line 6, in __init__
  File "test", line 149, in __post_init__
  File "/home/yoh/proj/misc/linkml/trash/gh-1812/venvs/dev/lib/python3.11/site-packages/linkml_runtime/utils/yamlutils.py", line 48, in __post_init__
    raise ValueError('\n'.join(messages))
ValueError: Unknown argument: gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58 = AnnexedFile(has_part=[], meta_type='dlco
```
Does it reproduce for you, @mih, solely from the information above?
Sorry for the noise; I'm learning as I go. Apparently there is some aspect I still do not quite grasp, since the fix I suggested above (changing `inlined_as_list` from `true` to `false`) was wrong: I had to comment out that line entirely and instead add `inlined_as_list: true` at the `Resource` level, thus overloading the inlining settings. The better way to make it reproducible was simply to change the original example to use a list, not a dict, for `has_part`, so that it becomes:
```yaml
has_annex_remote:
  annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf:
    uuid: 7e0bf3e7-7d46-4093-813e-b4009826c3bf
has_part:
  - meta_id: gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
    meta_type: dlco:AnnexedFile
    gitsha: b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
    distribution:
      qualified_access:
        access_id: MD5E-s3425--32a617360d10e3dcbfdd0885e8d64ab8.txt
        relation: annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf
qualified_part:
  - at_location: README.txt
    # comment out the following line to get a working conversion
    relation: gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
```
Then the issue "fully" reproduces.
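The manual edit described above (moving each mapping key of `has_part` into a `meta_id` field of a list item) can be sketched in plain Python. This is a hypothetical helper, not part of linkml; it assumes the identifier slot is `meta_id`, as declared in the schema, and only rewrites the top level of the collection.

```python
from typing import Any, Optional


def dict_to_list(
    inlined: dict[str, Optional[dict[str, Any]]], key_slot: str = "meta_id"
) -> list[dict[str, Any]]:
    """Convert an inlined-as-dict collection into an inlined-as-list one.

    Each mapping key becomes the value of ``key_slot`` in the corresponding
    object; all other attributes (including nested ones) are copied as-is.
    """
    result = []
    for key, attrs in inlined.items():
        obj: dict[str, Any] = {key_slot: key}
        obj.update(attrs or {})  # a key with no attributes yields just the id
        result.append(obj)
    return result


# has_part as written in the original example (mapping keyed by identifier)
has_part_as_dict = {
    "gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58": {
        "meta_type": "dlco:AnnexedFile",
        "gitsha": "b94ef1797f7bfc1ac979be122e1b538bbb0d1d58",
        "distribution": {
            "qualified_access": {
                "access_id": "MD5E-s3425--32a617360d10e3dcbfdd0885e8d64ab8.txt",
                "relation": "annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf",
            }
        },
    }
}

has_part_as_list = dict_to_list(has_part_as_dict)
print(has_part_as_list[0]["meta_id"])  # gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
```

If an attribute block already carries its own `meta_id`, `update` lets that value win over the mapping key; whether the two should be required to agree is left open in this sketch.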
Disclaimer: I am very new to linkml and have worked my way through the docs and examples over the past weeks.
I am trying to compose a schema for describing datasets that are tracked with Git/git-annex. For compatibility with broader infrastructures, the schema is based on the DCAT v3 model and has corresponding classes. The DCAT concepts do not require globally unique identifiers (in the linkml `identifier: true` sense). However, in the Git world, everything tracked does have such an identifier. Consequently, the schema needs to declare such a property and also use it in data instances.

Specifically, two variants of the "qualified relation" pattern are needed: one (`QualifiedAccess`) to add an `access_id` for retrieving a resource, and another (`QualifiedPart`) for declaring the location of a dataset part within a dataset.

I apologize for the complexity of the linkml schema below, but it is the smallest extract of the actual schema that I could come up with that still shows my problem in both ways. For readability, I removed all descriptions and mappings.
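To make the two qualified-relation variants concrete, here is a hand-written Python sketch of their shape. These dataclasses are hypothetical illustrations that mirror the slot names in the schema; they are not what linkml's generators produce. Both variants point at another object through a `relation` that holds that object's identifier, and add one extra attribute.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QualifiedAccess:
    """How to retrieve a resource: which service, and under which key."""
    relation: str  # identifier of the data service, e.g. an annex: CURIE
    access_id: Optional[str] = None  # service-specific key, e.g. a git-annex key


@dataclass
class QualifiedPart:
    """Where a part lives inside a dataset."""
    relation: str  # identifier of the part, e.g. a gitsha: CURIE
    at_location: Optional[str] = None  # POSIX path relative to the dataset root


@dataclass
class DatasetSketch:
    """Hypothetical container holding only the slot needed for this illustration."""
    qualified_part: list[QualifiedPart] = field(default_factory=list)

    def location_of(self, part_id: str) -> Optional[str]:
        """Return the recorded location of the part with the given identifier."""
        for qp in self.qualified_part:
            if qp.relation == part_id:
                return qp.at_location
        return None


access = QualifiedAccess(
    relation="annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf",
    access_id="MD5E-s3425--32a617360d10e3dcbfdd0885e8d64ab8.txt",
)
ds = DatasetSketch(
    qualified_part=[
        QualifiedPart(
            relation="gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58",
            at_location="README.txt",
        )
    ]
)
print(ds.location_of("gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58"))  # README.txt
```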
When I convert the example below to RDF, I get:
When the last line in the example is commented out (`relation: gitsha:...`), the conversion works, but the result is then missing the critical link.

The "same" pattern (link by ID) works for the `qualified_access` specification. However, to make it work, I had to declare the respective `identifier: true` slot in the base class. This is problematic, because the slot is thereby `required: true`, although matching data instances generally do not have an appropriate identifier.

For the `qualified_part` approach, I tried injecting the identifier via a mixin. Evidently, this is not working.

My question is now: am I doing it wrong, conceptually or technically? Or is this a linkml limitation?
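At the data level, link-by-ID only requires that each `relation` value matches an identifier declared elsewhere in the same record. The following stand-alone sketch (plain Python, not linkml's own reference resolution; the function name is hypothetical) checks both qualified relations on the list-form version of the example. It only tests data consistency and says nothing about whether the schema declares the references correctly.

```python
from typing import Any


def unresolved_relations(record: dict[str, Any]) -> list[str]:
    """List relation values in the record that match no known identifier.

    - qualified_part[*].relation must match some has_part[*].meta_id
    - has_part[*].distribution.qualified_access.relation must match a key
      of has_annex_remote
    """
    part_ids = {p["meta_id"] for p in record.get("has_part", []) if "meta_id" in p}
    remote_ids = set(record.get("has_annex_remote", {}))
    missing = []
    for qp in record.get("qualified_part", []):
        if "relation" in qp and qp["relation"] not in part_ids:
            missing.append(qp["relation"])
    for part in record.get("has_part", []):
        access = part.get("distribution", {}).get("qualified_access", {})
        if "relation" in access and access["relation"] not in remote_ids:
            missing.append(access["relation"])
    return missing


FILE_ID = "gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58"
REMOTE_ID = "annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf"

record = {
    "has_annex_remote": {REMOTE_ID: {"uuid": "7e0bf3e7-7d46-4093-813e-b4009826c3bf"}},
    "has_part": [
        {
            "meta_id": FILE_ID,
            "meta_type": "dlco:AnnexedFile",
            "distribution": {
                "qualified_access": {
                    "access_id": "MD5E-s3425--32a617360d10e3dcbfdd0885e8d64ab8.txt",
                    "relation": REMOTE_ID,
                }
            },
        }
    ],
    "qualified_part": [{"at_location": "README.txt", "relation": FILE_ID}],
}

print(unresolved_relations(record))  # []
```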
I suspect that something (else) is fishy with my schema, because running it through `gen-linkml` and then converting with the generated schema gives a suspiciously related error:

Thanks in advance for your time!
Schema `MonoDataladDatasetVersion-schema.yaml`
```yaml
id: https://example.com/reproducer
name: reproducer
prefixes:
  annex: https://concepts.datalad.org/namespace/annex-uuid/
  DCAT: http://www.w3.org/ns/dcat#
  dct: http://purl.org/dc/terms/
  dlco: https://concepts.datalad.org/ontology/
  gitsha: https://concepts.datalad.org/namespace/gitsha/
  linkml: https://w3id.org/linkml/
  prov: http://www.w3.org/ns/prov#
  xsd: http://www.w3.org/2001/XMLSchema#
default_prefix: dlco
imports:
  - linkml:types

types:
  PosixRelPath:
    uri: dlco:PosixRelPath
    base: str
  SHA1:
    uri: dlco:sha1
    base: str
  UUID:
    uri: http://purl.obolibrary.org/obo/NCIT_C54100
    base: str

slots:
  access_id:
    range: string
  at_location:
    slot_uri: prov:atLocation
    range: Location
  distribution:
    range: Distribution
  endpoint_url:
    range: uri
  gitsha:
    range: SHA1
  has_annex_remote:
    range: AnnexRemote
  has_part:
    slot_uri: dct:hasPart
  meta_id:
    identifier: true
    range: uriorcurie
  meta_type:
    designates_type: true
    range: uriorcurie
  qualified_access:
    range: QualifiedAccess
  qualified_part:
    range: QualifiedPart
  relation:
    slot_uri: dct:relation
  uuid:
    range: UUID

classes:
  Location:
    class_uri: prov:Location
  MetaObject:
    class_uri: linkml:Any
  GitTracked:
    mixin: true
    slots:
      - gitsha
      - meta_id
    slot_usage:
      gitsha:
        required: true
  Resource:
    class_uri: DCAT:Resource
    slots:
      - has_part
      - meta_type
      - qualified_part
    slot_usage:
      has_part:
        range: Resource
        multivalued: true
      qualified_part:
        multivalued: true
        inlined: true
        inlined_as_list: true
  Dataset:
    is_a: Resource
    slots:
      - distribution
  Distribution:
    slots:
      - qualified_access
  AnnexDistribution:
    is_a: Distribution
    slot_usage:
      qualified_access:
        range: QualifiedAnnexAccess
  MonoDataladDatasetVersion:
    is_a: Dataset
    slots:
      - has_annex_remote
    slot_usage:
      has_annex_remote:
        multivalued: true
        inlined: true
      has_part:
        range: FileInGit
        multivalued: true
        inlined: true
        inlined_as_list: true
      qualified_part:
        range: QualifiedGitTrackedPart
        multivalued: true
        inlined: true
        inlined_as_list: true
  QualifiedAccess:
    slots:
      - access_id
      - relation
    slot_usage:
      relation:
        range: DataService
  QualifiedAnnexAccess:
    is_a: QualifiedAccess
    slot_usage:
      relation:
        range: AnnexRemote
  QualifiedPart:
    slots:
      - relation
      - at_location
    slot_usage:
      at_location:
        range: PosixRelPath
      relation:
        range: Resource
  QualifiedGitTrackedPart:
    is_a: QualifiedPart
    slot_usage:
      relation:
        range: FileInGit
  File:
    is_a: Resource
    slots:
      - distribution
  FileInGit:
    is_a: File
    mixins:
      - GitTracked
  AnnexedFile:
    is_a: FileInGit
    slot_usage:
      distribution:
        range: AnnexDistribution
  DataService:
    slots:
      - endpoint_url
      # although we do not expect any data service to have a unique identifier
      # we must add this slot here, rather than in derived classes, due to
      # a potential linkml limitation/bug
      # https://github.com/psychoinformatics-de/datalad-concepts/issues/30
      - meta_id
  AnnexRemote:
    is_a: DataService
    slots:
      - uuid
    slot_usage:
      meta_id:
        required: true
```

Example `MonoDataladDatasetVersion-example.yaml`
```yaml
has_annex_remote:
  annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf:
    uuid: 7e0bf3e7-7d46-4093-813e-b4009826c3bf
has_part:
  gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58:
    meta_type: dlco:AnnexedFile
    gitsha: b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
    distribution:
      qualified_access:
        access_id: MD5E-s3425--32a617360d10e3dcbfdd0885e8d64ab8.txt
        relation: annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf
qualified_part:
  - at_location: README.txt
    # comment out the following line to get a working conversion
    relation: gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
```
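The identifiers in this example (the `gitsha:` and `annex:` values, and the `dlco:AnnexedFile` type designator) are CURIEs. When the data is converted to RDF, they are expected to be expanded into full IRIs using the schema's `prefixes`. A simplified stand-alone sketch of that expansion, using only the standard library and the prefix map copied from the schema (this is an illustration, not linkml's own implementation):

```python
# Prefix map copied from the `prefixes` section of the schema
PREFIXES = {
    "annex": "https://concepts.datalad.org/namespace/annex-uuid/",
    "DCAT": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dlco": "https://concepts.datalad.org/ontology/",
    "gitsha": "https://concepts.datalad.org/namespace/gitsha/",
    "linkml": "https://w3id.org/linkml/",
    "prov": "http://www.w3.org/ns/prov#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(value: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' to a full IRI; return any other value unchanged."""
    prefix, sep, local = value.partition(":")
    if not sep or prefix not in prefixes:
        # no colon, or an unknown prefix (e.g. the 'https' of a full IRI)
        return value
    return prefixes[prefix] + local


for curie in (
    "gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58",
    "annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf",
    "dlco:AnnexedFile",
):
    print(curie, "->", expand_curie(curie))
```

Splitting on the first colon only matters for the prefix; the local part is appended verbatim, so hyphenated UUIDs and hex SHAs pass through untouched.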