Open roptat opened 4 years ago
Hi,
in compaction test 0066, some IRIs are to be compacted relative to the document base IRI. The base IRI is https://w3c.github.io/json-ld-api/tests/compact/0066-in.jsonld and an example of an IRI to compact is https://w3c.github.io/absolute. The expected result is "../../../absolute"; however, "/absolute" seems to be valid and more compact. Why not compact IRIs more when possible? One could simply choose the shorter of the absolute path of the IRI to compact and the relative path with ".."s.
IRI compaction for document-relative IRIs defaults to doing Relative IRI reference resolution as described in RFC3986/7. There is a normalization algorithm that reduces such IRIs to their minimal form, but it is not called for in these algorithms.
The Relative Resolution algorithm, which must be used, is described in RFC3986 Section 5.2. There are a number of subtleties, and the test suite has recently had more tests introduced to probe some of the corner cases.
RDF syntaxes (such as JSON-LD) treat IRI/URIs which may have an equivalent normalized representation as different, so introducing normalization as part of the IRI compaction process would violate this.
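For anyone wanting to check the resolution direction locally, here is a minimal Ruby sketch. It assumes the standard library's URI.join is an acceptable stand-in for the RFC3986 Section 5.2 algorithm; the base and target are the ones from test 0066, and the constant and variable names are only illustrative. It shows that both candidate references resolve to the same absolute IRI, so the open question is only which form compaction should emit.

```ruby
require 'uri'

# Document base from compaction test 0066.
BASE_0066 = 'https://w3c.github.io/json-ld-api/tests/compact/0066-in.jsonld'

# Resolve each candidate relative reference against the document base.
via_dot_segments  = URI.join(BASE_0066, '../../../absolute').to_s
via_absolute_path = URI.join(BASE_0066, '/absolute').to_s

puts via_dot_segments
puts via_absolute_path
```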
also, test 0076 expects http://example.com/api/things/1 to be compacted as "1" when the base is itself. Wouldn't the empty string work too and be more compact?
This has been in since 1.0, and it's hard to find any explicit requirement to do this, but it can be inferred from RFC3986 5.2.3 Merge Paths, for which this is the opposite operation. In that case, the portion of the path after the last "/" is discarded, which is what's going on here.
Also, intuitively, if you had a base of "http://example.com/api/things/1", you could either compact that to "" or "1", but if you wanted to compact "http://example.com/api/things/2", it could only compact to "2", which would be inconsistent.
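The "discard everything after the last '/'" step can be sketched in a few lines of Ruby. Note that merge_paths is a hypothetical helper name, not taken from any implementation, and it ignores the RFC3986 5.2.3 special case of a base with an authority and an empty path.

```ruby
# Sketch of RFC3986 5.2.3 Merge Paths for a base with a non-empty path:
# keep the base path up to and including its last "/", then append the reference.
def merge_paths(base_path, reference_path)
  slash = base_path.rindex('/')
  prefix = slash ? base_path[0..slash] : ''
  prefix + reference_path
end

puts merge_paths('/api/things/1', '1')
puts merge_paths('/api/things/1', '2')
```

Because the last segment of the base never survives the merge, "1" and "2" are handled the same way against a base ending in "/things/1".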
I don't really understand the answer. When implementing the expansion algorithm, I indeed saw that the RFC3986 Relative Resolution algorithm was used, and I implemented it. However, IIUC, this algorithm is one-way only: it takes a base and a relative reference, and gives you a new absolute IRI. However, in the IRI compaction algorithm, we want to do the reverse: get a relative IRI reference from an absolute IRI. I think the exact algorithm used to perform that operation is missing from the specification, hence my questions.
I agree that the spec could be more explicit in how to perform this operation, but it is quite late to introduce such an algorithm as we're ending the Candidate Recommendation period. The adherence to the test suite is what determines conformance, and sometimes this requires "reading between the lines". We can defer adding to spec text to a future version, which could come fairly soon after the release of the final recommendation.
@roptat,
Wouldn't the empty string work too and be more compact?
The goal of compaction is not to make the data size as small as possible without regard for its semantics; it is not "compression" like gzip. Rather, compaction enables the data to be more readily parsed and understood by humans or programs that are expecting it to conform to a certain context.
I think I'm suffering from the same imprecision. One particular case that I don't understand is that http://example.com/api/things/1 is compacted into "1", but http://example.com/api/things/1#foo is compacted into "#foo". Following your logic, isn't it inconsistent with http://example.com/api/things/2 being compacted into "2#foo"?
Instead I would expect http://example.com/api/things/1#foo to be compacted into "1#foo". But this is not what is expected by compaction test #t0066.
It's common for fragment identifiers to be appended to IRIs, and to the base IRI, so that when compacting you get URIs of the form "#foo". It would certainly carry the same semantics if it were compacted to "1#foo"; that's just not how 1.0 implementors interpreted step 10 of IRI Compaction.
I don't believe any other RDF specs describe how IRIs should be compacted, and we used our best understanding to come up with a consistent interpretation, but as I said, the text in the spec doesn't make this explicit.
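The claim that "#foo" and "1#foo" carry the same semantics can be checked by resolving both against the base. This is a small sketch that again assumes Ruby's standard-library URI.join as a stand-in for RFC3986 resolution; the constant and variable names are illustrative only.

```ruby
require 'uri'

# Base IRI used in the discussion of test 0076.
THINGS_BASE = 'http://example.com/api/things/1'

# A fragment-only reference keeps the base path and only sets the fragment.
fragment_only = URI.join(THINGS_BASE, '#foo').to_s
# A reference with a path segment replaces the last segment of the base path.
with_segment = URI.join(THINGS_BASE, '1#foo').to_s

puts fragment_only
puts with_segment
```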
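My own implementation uses remove_base, which is described here:
# Returns iri relative to base where possible, otherwise returns iri unchanged.
def remove_base(base, iri)
  return iri unless base
  # Cache the base (if it ends with '/') and each of its parents, nearest first.
  @base_and_parents ||= begin
    u = base
    iri_set = u.to_s.end_with?('/') ? [u.to_s] : []
    iri_set << u.to_s while (u = u.parent)
    iri_set
  end
  b = base.to_s
  # If iri is the base followed by a fragment or query, return just that suffix.
  return iri[b.length..-1] if iri.start_with?(b) && CONTEXT_BASE_FRAG_OR_QUERY.include?(iri[b.length, 1])
  # Otherwise, find the nearest parent that prefixes iri and climb to it with '../'.
  @base_and_parents.each_with_index do |bb, index|
    next unless iri.start_with?(bb)
    rel = "../" * index + iri[bb.length..-1]
    return rel.empty? ? "./" : rel
  end
  iri
end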
Basically, it has a rule to return the fragment or query appended to the base, if there is one; otherwise, it takes the base IRI's ipath-absolute minus any trailing isegment and adds '../' sequences as necessary.
This could be detailed in an erratum to the API spec, in appropriate algorithmic speak.
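For readers who want to try the approach without the surrounding library, here is a standalone sketch of the same two rules using plain strings. It makes several assumptions that are not in the snippet above: FRAG_OR_QUERY is a guess at what CONTEXT_BASE_FRAG_OR_QUERY holds, base_and_parents is a hypothetical string-based replacement for the parent lookup, and the base is assumed to be an absolute IRI with an authority and no query or fragment.

```ruby
# Assumed contents of the constant used above: characters starting a query or fragment.
FRAG_OR_QUERY = %w[? #].freeze

# Hypothetical helper: the base's directory and each ancestor directory, nearest first.
def base_and_parents(base)
  root = base[%r{\A[^/]+//[^/]*/}] # scheme, authority and the root "/"
  return [] unless root
  dirs = []
  dir = base[0..base.rindex('/')]
  loop do
    dirs << dir
    break if dir == root
    dir = dir[0..dir.chomp('/').rindex('/')]
  end
  dirs
end

def remove_base_sketch(base, iri)
  return iri unless base
  # Rule 1: iri is the base plus a fragment or query, so return only that suffix.
  if iri.start_with?(base) && FRAG_OR_QUERY.include?(iri[base.length, 1])
    return iri[base.length..-1]
  end
  # Rule 2: climb to the nearest shared directory with "../" and append the rest.
  base_and_parents(base).each_with_index do |dir, index|
    next unless iri.start_with?(dir)
    rel = '../' * index + iri[dir.length..-1]
    return rel.empty? ? './' : rel
  end
  iri
end

base_0066 = 'https://w3c.github.io/json-ld-api/tests/compact/0066-in.jsonld'
things = 'http://example.com/api/things/1'
puts remove_base_sketch(base_0066, 'https://w3c.github.io/absolute') # ../../../absolute
puts remove_base_sketch(things, things)                              # 1
puts remove_base_sketch(things, things + '#foo')                     # #foo
```

Under these assumptions the sketch reproduces the results discussed in this thread for tests 0066 and 0076, and it never emits an absolute-path reference such as "/absolute".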