w3c / json-ld-api

JSON-LD 1.1 Processing Algorithms and API Specification
https://w3c.github.io/json-ld-api/

Invalid IRIs in compaction test #tp004 #517

Open timothee-haudebourg opened 3 years ago

timothee-haudebourg commented 3 years ago

Hi,

In step 14.2.3 of the Create Term Definition algorithm, after expanding the value associated with the @id entry of a term definition, the algorithm continues as follows:

If the resulting IRI mapping is neither a keyword, nor an IRI, nor a blank node identifier, an invalid IRI mapping error has been detected and processing is aborted
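The check quoted above can be sketched as follows. This is a minimal Python sketch, not taken from any implementation: `InvalidIriMapping`, `lenient_is_iri` and `check_iri_mapping` are hypothetical names. The default `is_iri` predicate is an assumption that accepts anything with a scheme-like prefix and does not validate the rest of the string against RFC 3987.

```python
from typing import Callable

# JSON-LD 1.1 keywords.
KEYWORDS = frozenset({
    "@base", "@container", "@context", "@direction", "@graph", "@id",
    "@import", "@included", "@index", "@json", "@language", "@list",
    "@nest", "@none", "@prefix", "@propagate", "@protected", "@reverse",
    "@set", "@type", "@value", "@version", "@vocab",
})


class InvalidIriMapping(Exception):
    """Hypothetical error type standing in for 'invalid IRI mapping'."""


def lenient_is_iri(value: str) -> bool:
    # Assumption: anything with a scheme-like prefix counts as an IRI;
    # the remainder of the string is NOT checked against RFC 3987.
    scheme, sep, _ = value.partition(":")
    return bool(sep) and scheme[:1].isalpha()


def check_iri_mapping(mapping: str,
                      is_iri: Callable[[str], bool] = lenient_is_iri) -> str:
    """Sketch of the check in step 14.2.3 of Create Term Definition."""
    if mapping in KEYWORDS or mapping.startswith("_:") or is_iri(mapping):
        return mapping
    raise InvalidIriMapping(mapping)
```

How strict `is_iri` is decides the outcome for tp004: the lenient default accepts `http://example.org/[`, while a predicate that rejects misplaced brackets makes the same call raise `InvalidIriMapping`.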

However, here is the context given for compact#tp004:

{
  "@context": {
    "ex": "http://example.com/",
    "colon": "http://example.org/:",
    "question": "http://example.org/?",
    "hash": "http://example.org/#",
    "lbracket": "http://example.org/[",
    "rbracket": "http://example.org/]",
    "at": "http://example.org/@"
  }
}

In this context, http://example.org/[ and http://example.org/] are not valid IRIs (the characters [ and ] are reserved and may only appear in the host part, to delimit an IP literal). My implementation of IRIs is strict and rejects such IRIs. Since, according to step 14.2.3, the parsed value must be either an IRI (which they are not), a blank node identifier (which they are not), or a keyword (which they are not), I end up returning an invalid IRI mapping error, although the test is supposed to succeed.
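A strict check for this particular problem only needs to look at where brackets occur. The sketch below is an assumption-laden illustration, not a full RFC 3987 parser: `has_valid_brackets` is a hypothetical helper that rejects "[" and "]" anywhere except around the host, and it does not validate the contents of the IP literal. Because RFC 3987 allows ":" and "@" in path segments and allows an empty query or fragment, only `lbracket` and `rbracket` from the tp004 context fail.

```python
import re

# Scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")
# A host wrapped in brackets, optionally followed by ":port".
_BRACKETED_HOST = re.compile(r"\[[^\[\]]+\](?::[0-9]*)?")


def has_valid_brackets(iri: str) -> bool:
    """Return False if '[' or ']' appear anywhere except around the host."""
    m = _SCHEME.match(iri)
    if m is None:
        return False  # not an absolute IRI at all
    rest = iri[m.end():]
    authority = ""
    if rest.startswith("//"):
        # The authority ends at the first "/", "?" or "#" after the "//".
        end = next((i for i in range(2, len(rest)) if rest[i] in "/?#"),
                   len(rest))
        authority, rest = rest[2:end], rest[end:]
    # Path, query and fragment may never contain brackets.
    if "[" in rest or "]" in rest:
        return False
    userinfo, _, host_port = authority.rpartition("@")
    if "[" in userinfo or "]" in userinfo:
        return False
    if "[" in host_port or "]" in host_port:
        # Brackets must wrap the whole host (IP literal), nothing else.
        return _BRACKETED_HOST.fullmatch(host_port) is not None
    return True


TP004_CONTEXT = {
    "ex": "http://example.com/",
    "colon": "http://example.org/:",
    "question": "http://example.org/?",
    "hash": "http://example.org/#",
    "lbracket": "http://example.org/[",
    "rbracket": "http://example.org/]",
    "at": "http://example.org/@",
}

invalid = [t for t, iri in TP004_CONTEXT.items() if not has_valid_brackets(iri)]
print(invalid)  # ['lbracket', 'rbracket']
```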

gkellogg commented 3 years ago

That's not really inconsistent with the text, though. It just says that an error is detected if the result is not an IRI; it doesn't call for validating the IRI.

That said, it would be better if the examples used valid IRIs, but I don't think there are valid cases where "[" or "@" can appear at the end of an IRI. "]" could be used in http://[127.0.0.1], although I wouldn't be surprised if many IRI parsers didn't handle this.
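Whether a "]" is legitimate depends on what sits between the brackets. RFC 3987 reuses the IP-literal rule of RFC 3986, which only allows an IPv6 address or an IPvFuture literal inside the brackets, so a strict parser is likely to accept http://[::1]/ but reject a bracketed IPv4 address. The sketch below (`is_ip_literal` is a hypothetical helper) checks a bracketed host against that rule using Python's standard `ipaddress` module.

```python
import ipaddress
import re

# IPvFuture: "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
_IPV_FUTURE = re.compile(r"[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def is_ip_literal(host: str) -> bool:
    """Check a bracketed host such as "[::1]" against the IP-literal rule."""
    if not (host.startswith("[") and host.endswith("]")):
        return False
    inner = host[1:-1]
    if _IPV_FUTURE.fullmatch(inner):
        return True
    try:
        # Only IPv6 is allowed here; an IPv4 string raises ValueError.
        # (Python 3.9+ also accepts a "%scope" suffix, which is slightly
        # more lenient than RFC 3986.)
        ipaddress.IPv6Address(inner)
    except ValueError:
        return False
    return True


print(is_ip_literal("[::1]"))        # True
print(is_ip_literal("[127.0.0.1]"))  # False
```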

I suggest we just remove the "lbracket", "rbracket", and "at" terms from the tests that use them.

timothee-haudebourg commented 3 years ago

Reading the JSON-LD syntax specification again, I see:

The value of keys that are not keywords MUST be either an IRI, a compact IRI, a term, a blank node identifier, a keyword, null, or an expanded term definition.

where "IRI" links directly to the syntax section of RFC 3987. It seems pretty clear to me that the context of #tp004 does not conform to the specification. This is not necessarily a problem for the context processing algorithm as defined in the JSON-LD API document (why not accept more contexts than necessary?), but it is another point in favor of modifying this test.

gkellogg commented 3 years ago

The value of keys that are not keywords MUST be either an IRI, a compact IRI, a term, a blank node identifier, a keyword, null, or an expanded term definition.

Note that this refers specifically to keys within a context definition, not to a JSON-LD document in general.

In #tp004, the keys are in a node object, not a context definition. Strictly speaking, the algorithm should ignore such keys; this could be addressed as discussed in https://github.com/w3c/json-ld-api/issues/533#issuecomment-915556257.
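One possible reading of "ignore such keys" is sketched below: while expanding a node object, keys whose expanded IRI fails validation are dropped instead of raising an error. This is a hypothetical Python sketch (`drop_invalid_keys` and its parameters are illustrative names, not part of the specification); term lookup is reduced to a plain dictionary, every "@"-prefixed key is treated as a keyword, and the approach in the linked comment may differ.

```python
from typing import Any, Callable, Dict


def drop_invalid_keys(node: Dict[str, Any],
                      term_map: Dict[str, str],
                      is_valid_iri: Callable[[str], bool]) -> Dict[str, Any]:
    """Expand the keys of a node object, silently dropping invalid ones."""
    result: Dict[str, Any] = {}
    for key, value in node.items():
        if key.startswith("@"):
            # Simplification: treat every "@"-prefixed key as a keyword.
            result[key] = value
            continue
        # Simplification: expansion is a plain term lookup that falls back
        # to the key itself (no compact IRIs, no @vocab).
        iri = term_map.get(key, key)
        if ":" in iri and is_valid_iri(iri):
            result[iri] = value
        # Otherwise the key is ignored rather than raising an error.
    return result


def no_brackets(iri: str) -> bool:
    # Stand-in validator for illustration only.
    return "[" not in iri and "]" not in iri


terms = {"hash": "http://example.org/#", "lbracket": "http://example.org/["}
node = {"@id": "http://example.com/a", "hash": "h", "lbracket": "l", "plain": "p"}
print(drop_invalid_keys(node, terms, no_brackets))
# {'@id': 'http://example.com/a', 'http://example.org/#': 'h'}
```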

timothee-haudebourg commented 3 years ago

In #tp004, the keys are in a node object

The issue is about the invalid IRIs that are part of the context of #tp004:

"@context": {
    ...
    "lbracket": "http://example.org/[",
    "rbracket": "http://example.org/]",
    ...
}

There are also invalid IRIs in the node object, but I now understand that there is no problem there.