w3c / json-ld-api

JSON-LD 1.1 Processing Algorithms and API Specification
https://w3c.github.io/json-ld-api/

Under what situations would base_url in the context processing algorithm be invalid? #573

Open dclements opened 11 months ago

dclements commented 11 months ago

In 5.2.1 of the context processing algorithm it says:

Initialize context to the result of resolving context against base URL. If base URL is not a valid IRI, then context MUST be a valid IRI, otherwise a loading document failed error has been detected and processing is aborted.
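
Read literally, that step could be sketched as below. This is only an illustration of the wording, not how any implementation does it: `resolve_context_reference`, `looks_like_absolute_iri`, and `LoadingDocumentFailed` are hypothetical names, and the "validity" check is a loose stand-in (a scheme plus something after it), far weaker than real IRI validation.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


class LoadingDocumentFailed(Exception):
    """Stands in for the spec's "loading document failed" error."""


def looks_like_absolute_iri(value: Optional[str]) -> bool:
    # Hypothetical, deliberately loose check: a scheme followed by something.
    # Real IRI validation (RFC 3987) is much stricter than this.
    if not value:
        return False
    parts = urlsplit(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def resolve_context_reference(context: str, base_url: Optional[str]) -> str:
    # Base looks usable: resolve the context reference against it.
    if looks_like_absolute_iri(base_url):
        return urljoin(base_url, context)
    # Base is missing or not usable: the context must stand on its own.
    if looks_like_absolute_iri(context):
        return context
    # Neither is usable: the step says processing is aborted.
    raise LoadingDocumentFailed(context)
```

Under this reading, a null base and a malformed base are indistinguishable: both fall through to the requirement that context itself be absolute.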

But in the description of section 4.1.2 it says the following:

The required inputs are an active context, a local context, and a base URL used when resolving relative context URLs

Implementations seem to diverge in how they handle this: Rust's json-ld treats it as optional, while the Java Titanium implementation simply ignores the possibility that it might not be a "valid" IRI, as does the OCaml RDF implementation.

Because this is a sub-algorithm for other algorithms (though it isn't treated as such per se), there don't seem to be any tests rooted in the inputs or expected outputs of this algorithm. Looking at where it does get called:

Set active context to the result of the Context Processing algorithm, passing active context, the value of the active property's local context as local context, base URL from the term definition for active property in active context, and true for override protected.

This is for the situation where the term definition is an expanded term definition containing a local context that must be a valid context definition. That does allow your base to be null but it is difficult to tell at a glance whether this would come up in practice during the processing.

So it seems like one of the following must be true:

  1. The description is incorrect and base_url is an optional (or at least nullable) attribute, but it cannot be "invalid."
  2. Some other piece is incorrect/incomplete and we expect invalid IRIs to come from somewhere, but don't say what they might look like. I guess a variation on this would be something like "despite being named base_url we should treat the value of base_url as being potentially not a URL, IRI, or anything else in the genre."
  3. The algorithm is incorrect and we never expect to see an invalid IRI at this stage of processing.

gkellogg commented 11 months ago

Generally, URLs and IRIs should be valid according to the spec that defines them. I don't think we actually do rigorous validity checks, however. IIRC, there are some cases where base_IRI might be set to null (I need to look at the test cases more thoroughly). In this case, it would be better to check whether it is null, as an actual validity check is not really warranted.

While base_url is a required parameter, its value may be null. If it is an IRI, it MUST be a valid IRI, but we don't explicitly check for IRI validity. This leaves open the possibility that one implementation might use an invalid IRI for expansion, while another might actually do the validity check and skip it, using the context URL instead.
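
To make that divergence concrete, here is a rough sketch of the two behaviours side by side. Both function names are hypothetical, and the "validity check" is just an assumed test for a scheme, not real IRI validation.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve_with_null_check(context: str, base_url: Optional[str]) -> str:
    # Only None is treated as "no base"; the string itself is never validated.
    if base_url is None:
        return context
    return urljoin(base_url, context)


def resolve_with_validity_check(context: str, base_url: Optional[str]) -> str:
    # Hypothetical loose check: the base must at least carry a scheme.
    if base_url is not None and urlsplit(base_url).scheme:
        return urljoin(base_url, context)
    # Base rejected: fall back to the context URL as given.
    return context


# A deliberately malformed base (no scheme, contains spaces).
bad_base = "dir with space/doc"
print(resolve_with_null_check("ctx.jsonld", bad_base))
print(resolve_with_validity_check("ctx.jsonld", bad_base))
```

With a null base or a well-formed absolute base, the two agree. With the malformed base, the first resolves against it anyway while the second skips it and returns the context reference unchanged, so the two can produce different results for the same input.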