Open Silvanoc opened 1 month ago
which is wrong since a CURIE might not be a valid URI.
Yes, this is also clearly stated in the CURIE syntax Note document:
CURIEs and SafeCURIEs map to IRIs, but neither a CURIE nor a Safe_CURIE is an IRI or URI.
For the _ character there is a special case, which states:
The CURIE prefix '_' is reserved for use by languages that support RDF. For this reason, the prefix '_' SHOULD be avoided by authors.
Therefore, even if there is a _ character in the CURIE prefix, it is still considered a valid CURIE, while at the same time the underscore makes it an invalid URI: a URI scheme cannot contain the character _.
A use case for a valid CURIE starting with _ is an RDF Blank Node Identifier, e.g., _:b0
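The point about the underscore can be checked with a minimal sketch. It assumes the URI scheme grammar from RFC 3986 (a letter followed by letters, digits, "+", "-" or ".") and only looks at the text before the first colon; the function name is hypothetical and this is not how LinkML validates anything.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
URI_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def prefix_is_valid_uri_scheme(value):
    """Return True if the text before the first ':' is a legal URI scheme.

    Hypothetical helper for illustration; it does not validate the rest of the value.
    """
    prefix, sep, _rest = value.partition(":")
    return bool(sep) and URI_SCHEME.fullmatch(prefix) is not None


for candidate in ("http://example.org/Boat", "ex:Boat", "test_pref:Boat", "_:b0"):
    print(candidate, prefix_is_valid_uri_scheme(candidate))
```

Under these assumptions the first two values pass the scheme check, while test_pref:Boat and _:b0 do not, because the underscore is not allowed in a scheme.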
Thanks @Silvanoc and @mahdanoura. You guys are much better at interpreting and testing against W3C specifications than I am. I would still like to add something to this issue:
The National Microbiome Data Collaborative has a use case in which the values of a uriorcurie-type slot are expected to be converted into CURIEs or URIs as part of the linkml-convert conversion from JSON or YAML into RDF. This doesn't currently work. As you might imagine, the values are just asserted as xsd:anyURI-typed string literals.
I have written some Python code that detects the xsd:anyURI-typed strings in a Turtle serialization and does the conversion to CURIEs. @cmungall is aware of this and I think he has some intention of addressing it in LinkML. In the meantime, I hope we can keep the association between the Uriorcurie type and xsd:anyURI, because without it, I don't see how I can make the last-step conversion to Turtle CURIEs.
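As a rough illustration of that kind of post-processing (not the actual NMDC code, which is not shown in this thread), the sketch below rewrites xsd:anyURI-typed literals in Turtle text. The function name, the regular expression, and the rule "declared prefix means CURIE, `//` after the colon means absolute IRI" are all assumptions made for the example; it ignores escaped quotes and the character restrictions on Turtle local names.

```python
import re

# Matches Turtle literals such as "ex:Boat"^^xsd:anyURI (escape sequences are not handled).
ANYURI_LITERAL = re.compile(r'"([^"\\]*)"\^\^xsd:anyURI')


def literals_to_references(turtle, prefixes):
    """Rewrite xsd:anyURI literals as prefixed names or <IRI> references.

    `prefixes` is the set of prefix names declared in the document (assumed known).
    """

    def convert(match):
        value = match.group(1)
        prefix, sep, local = value.partition(":")
        if sep and prefix in prefixes and not local.startswith("//"):
            return value                # declared prefix: emit as a prefixed name
        if sep and local.startswith("//"):
            return "<" + value + ">"    # looks like an absolute IRI
        return match.group(0)           # anything else: keep the typed literal

    return ANYURI_LITERAL.sub(convert, turtle)


example = (
    'ex:b1 ex:related "ex:Boat"^^xsd:anyURI ;\n'
    '    ex:seeAlso "http://example.org/x"^^xsd:anyURI ;\n'
    '    ex:other "unknown:thing"^^xsd:anyURI .'
)
print(literals_to_references(example, {"ex"}))
```

A text-level rewrite like this depends on the xsd:anyURI datatype being present to find the candidates, which is why dropping the association would remove the hook it relies on.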
We don't have any prefixes containing underscores in our main schema file. I don't think we have any in the other import either but I haven't checked yet.
The National Microbiome Data Collaborative has a use case in which the values of a uriorcurie-type slot are expected to be converted into CURIEs or URIs as part of the linkml-convert conversion from JSON or YAML into RDF. This doesn't currently work. As you might imagine, the values are just asserted as xsd:anyURI-typed string literals.
The Uriorcurie type should accept both; therefore any change fulfilling this requirement should not break this use case.
WRT the assertion as xsd:anyURI, I wonder how/where that assertion is taking place. AFAIK no XML-Schema validation is taking place, and most of the validation is done by checking against a generated JSON-Schema. In that sense the usage of xsd:anyURI to specify the URI of the type Uriorcurie is only a hint for the JSON-Schema generator to know how to validate it. My expectation would be that any change in the JSON-Schema generation that keeps the value valid either as a URI or as a CURIE should not break anything.
In the end IMO as long as we are confident that the tests are covering that use-case, we should be able to work on improving the validation.
We don't have any prefixes containing underscores in our main schema file. I don't think we have any in the other import either but I haven't checked yet.
You might not have any, but:
The current situation is that you are declaring an XML-Schema type that does not comply with the specification (an XML-Schema validation of test_pref:Boat where a Uriorcurie is expected would fail). But since the validation is using a JSON-Schema type much more relaxed than the declared XML-Schema, nobody seems to notice it.
OMG, I did write that test. I agree that it was a bad choice and am in support of any mechanism that would invalidate it!
Hopefully you don't mean to invalidate the test... Because that test is the only one ensuring that we cover at least one CURIE that is valid but cannot be mistaken for a URI merely because it happens to be syntactically valid as one.
I have written some Python code that detects the xsd:anyURI-typed strings in a Turtle serialization and does the conversion to CURIEs. @cmungall is aware of this and I think he has some intention of addressing it in LinkML. In the meantime, I hope we can keep the association between the Uriorcurie type and xsd:anyURI, because without it, I don't see how I can make the last-step conversion to Turtle CURIEs.
@turbomam how do you handle references of LinkML type Curie in your code? Because they are going to get the datatype xsd:string, right? How do you want to tell them apart from simple strings, since strings also get xsd:string?
Apart from that, since both LinkML types Uri and UriOrCurie are getting the URI xsd:anyURI, you need to "probe" both, although it should be needed only for UriOrCurie.
https://github.com/linkml/linkml-model/pull/202 (base for discussion as of now) is trying to resolve those ambiguities; you might want to have a look at it. The fact that UriOrCurie has some room for ambiguity is known and intrinsically accepted in this type, since we are not using SafeCURIEs. But at least my proposal constrains those ambiguities to that type.
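To make the ambiguity concrete, here is a small sketch that lists which readings a value admits on syntax alone. It is an assumption-laden illustration, not the logic of the linked PR: the "URI" reading only checks the RFC 3986 scheme grammar, the CURIE prefix is approximated as an ASCII-only XML NCName, a colon is required, and the function names are hypothetical.

```python
import re

# RFC 3986 scheme grammar: a letter, then letters, digits, "+", "-" or ".".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")
# Simplified CURIE prefix: an XML NCName restricted to ASCII for this sketch.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def _curie_like(value):
    """True if the value has a colon and an NCName-like prefix (simplified)."""
    prefix, sep, _reference = value.partition(":")
    return bool(sep) and NCNAME.fullmatch(prefix) is not None


def possible_readings(value):
    """Return the set of syntactic readings ('uri', 'curie') a value admits."""
    # A SafeCURIE is wrapped in square brackets, so it can only be a CURIE.
    if value.startswith("[") and value.endswith("]"):
        return {"curie"} if _curie_like(value[1:-1]) else set()
    readings = set()
    prefix, sep, _rest = value.partition(":")
    if sep and SCHEME.fullmatch(prefix) is not None:
        readings.add("uri")
    if _curie_like(value):
        readings.add("curie")
    return readings


for candidate in ("ex:Boat", "[ex:Boat]", "test_pref:Boat", "svn+ssh://host/repo"):
    print(candidate, sorted(possible_readings(candidate)))
```

With these simplifications a bare ex:Boat admits both readings, the bracketed form admits only the CURIE reading, and test_pref:Boat admits only the CURIE reading because of the underscore.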
from dev call: let's focus on fixing the RDFWriter to convert this correctly.
Describe the bug
The Uriorcurie type is declared with URI xsd:anyURI, which is wrong since a CURIE might not be a valid URI.
To reproduce
Steps to reproduce the behavior:
xsd:anyURI on an attribute value. This is the one that I have used:
'_' in the prefix. This is what I have used: <slot_type src="pre_fix:reference"/>
cvc-datatype-valid.1.2.1: 'pre_fix:reference' is not a valid value for 'anyURI'. or similar.
Expected behavior
A type that gets an XML-Schema URI as its URI should comply with the corresponding XML-Schema.
There is no type in the "W3C XML Schema Definition Language (XSD) 1.1" for CURIEs. The CURIE specification provides an XML-Schema for CURIEs, but this does not help for the URI of the LinkML Uriorcurie or Curie types.
In this specific case, having xsd:anyURI as the URI for the Uriorcurie LinkML type should mean that any value that is a valid URI or CURIE passes the XML-Schema validation, which is not true for the valid CURIE test_pref:Boat.
Additional context
LinkML Model version: 1.8.x
I have discovered the issue by chance when trying to improve the JSON-Schema generated for URI, CURIE and URIorCURIE in MR #2212.
In commit 82b753b1ae9cf23b06707b04cb8838dea654b161 I have assigned JSON-Schema type string with format uri for all three types, assuming that CURIEs would be valid URIs because of the use of xsd:anyURI for the URI of the LinkML type Uriorcurie.
Luckily a test exists covering the corner case of a CURIE prefix containing an underscore: test_pref:Boat, and this test is failing with the changes proposed in PR #2212.