w3c / json-ld-api

JSON-LD 1.1 Processing Algorithms and API Specification
https://w3c.github.io/json-ld-api/

How can compaction algorithms be used to transform a JSON-LD document with unknown `@profile` value(s) to a JSON-LD document with only known `@profile` value(s)? #610

Open TallTed opened 1 month ago

TallTed commented 1 month ago

Challenge arose in VCDM and Data Integrity

https://github.com/w3c/vc-data-model/blob/59ed20b1886d34976fa9e729c0b70556cd298e39/index.html#L4375-L4378

Applications MAY
use JSON-LD <a data-cite="JSON-LD11-API#compaction-algorithms">compaction
algorithms</a> to transform a document that uses an unknown JSON-LD context
to one that does not, so the new document's terms will match expectations.

I could not find instructions for the above-described transformation. I think such guidance should allow the application developer to make the described transformation without becoming a JSON-LD expert.

I think that the application developer must —

  1. expand the JSON-LD that has unknown JSON-LD context(s)
  2. remove the declaration of the unknown JSON-LD context(s)
  3. (maybe?) add declaration of known JSON-LD context(s)
  4. compact the JSON-LD
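The four steps above can be sketched as a deliberately simplified model in JavaScript, the language used elsewhere in this thread. It assumes flat documents whose contexts are inline objects mapping terms to IRIs, and it ignores `@vocab`, prefixes, nested objects, and value objects. It illustrates the idea rather than the JSON-LD 1.1 algorithms. `expand`, `compact`, and the example IRIs are hypothetical, and real code should call a conforming processor such as jsonld.js instead.

```javascript
// Hypothetical helpers: a simplified model (flat documents, inline contexts
// mapping term -> IRI), not the JSON-LD 1.1 algorithms themselves.
function termMap(ctx) {
  const terms = new Map();
  for (const c of [].concat(ctx)) {
    for (const [term, iri] of Object.entries(c)) terms.set(term, iri); // later contexts win
  }
  return terms;
}

const isAbsoluteIri = key => /^[a-z][a-z0-9+.-]*:/i.test(key);

// Steps 1 and 2: expand with the document's own context(s). The expanded
// form is keyed only by IRIs and carries no @context at all.
function expand(doc) {
  const terms = termMap(doc["@context"] ?? []);
  const expanded = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context") continue;
    if (terms.has(key)) expanded[terms.get(key)] = value;
    else if (isAbsoluteIri(key)) expanded[key] = value;
    // Any other key expands to no IRI and is dropped.
  }
  return expanded;
}

// Steps 3 and 4: compact against the known context(s) chosen by the caller.
function compact(expanded, knownContext) {
  const iriToTerm = new Map();
  for (const [term, iri] of termMap(knownContext)) iriToTerm.set(iri, term);
  const compacted = { "@context": knownContext };
  for (const [iri, value] of Object.entries(expanded)) {
    compacted[iriToTerm.get(iri) ?? iri] = value; // known IRI -> term, else keep IRI
  }
  return compacted;
}

const incoming = {
  "@context": { name: "https://schema.org/name", nick: "https://example.org/vocab#nickname" },
  name: "Alice",
  nick: "Al",
  stray: "no definition anywhere"
};
const known = { fullName: "https://schema.org/name" };

const expanded = expand(incoming);
const result = compact(expanded, known);
console.log(JSON.stringify(result));
```

Step 2 needs no separate code in this model: the expanded form is keyed only by IRIs, so the unknown context is simply not carried forward, and step 3 is the `knownContext` argument handed to `compact`.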

Seems best to me, to add such documentation to JSON-LD docs, which can then be cited by VCDM, Data Integrity, and others.

(attn: @msporny, @dlongley)

gkellogg commented 1 month ago

You're correct that compaction involves expansion. Expanding a JSON-LD document eliminates any contexts it uses, so there is no declaration of an unknown JSON-LD context. Compaction takes a specific (known) context to use for the compaction, and the result will include an @context element referencing the context used to do the compaction.

During discussion, the point seemed to be that if a document used a term that did not expand properly, such an entry would be dropped, although I don't see that concern explicitly in the text you cite.

I'm not sure what else needs to be added to make the suggested statement unambiguous.

TallTed commented 1 month ago

During discussion, the point seemed to be

I definitely expressed myself unclearly, as this point was not my intent at all.

I am thinking that something like the following should be added somewhere — could be any or all of JSON-LD, VCDM, or DI —

A JSON-LD document that declares an unknown `@context` value can be expanded
(which removes all `@context` declarations), and then re-compacted with one or
more known `@context` declarations. Properties whose IRIs are not mapped to a
term by that known `@context` will remain as absolute URIs in the new document,
while properties whose IRIs _are_ mapped by the known `@context(s)` will be
changed to the short terms defined there.

Examples of such before and after documents, showing how dummy "unknown" @context declarations and literal JSON keys change into dummy "known" @context declarations, literal JSON keys, and URI JSON keys, would probably help comprehension.
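A pair of dummy before and after documents of the kind requested here might look like the following. The contexts, IRIs, and the `recompact` helper are all hypothetical, and the helper is a toy model of expand-then-compact for flat documents with inline contexts, not a JSON-LD processor.

```javascript
// Dummy "before" document: two unknown contexts, given inline (hypothetical
// IRIs) so the sketch runs without fetching anything.
const before = {
  "@context": [
    { title: "https://vocab.example/title", color: "https://unknown-a.example/v#colour" },
    { rating: "https://unknown-b.example/v#stars" }
  ],
  title: "Example",
  color: "blue",
  rating: 5
};

// Dummy "known" context that the consuming code was written against.
const known = {
  title: "https://vocab.example/title",
  stars: "https://unknown-b.example/v#stars"
};

// Toy expand-then-compact for flat documents (not the JSON-LD 1.1 algorithms).
function recompact(doc, knownContext) {
  const toIri = new Map();
  for (const c of [].concat(doc["@context"] ?? [])) {
    for (const [term, iri] of Object.entries(c)) toIri.set(term, iri);
  }
  const toTerm = new Map(Object.entries(knownContext).map(([term, iri]) => [iri, term]));
  const after = { "@context": knownContext };
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context" || !toIri.has(key)) continue; // contexts removed; undefined keys dropped
    const iri = toIri.get(key);
    after[toTerm.get(iri) ?? iri] = value; // known IRI -> term, otherwise absolute IRI
  }
  return after;
}

const after = recompact(before, known);
console.log(JSON.stringify(after, null, 2));
```

In `after`, `title` stays a literal key (the same IRI is mapped in both contexts), `rating` becomes `stars` (same IRI, different term in the known context), and `color` becomes the URI key `https://unknown-a.example/v#colour`, because the known context has no term for that IRI.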

msporny commented 1 month ago

Yes, what @TallTed says above is much closer to the language I was hoping for. As to where this language goes, I'm a bit ambivalent, but it would be nice for the JSON-LD spec to speak directly to that notion.

dlongley commented 1 month ago

Just to be clear, this issue is not about "invalid or missing" term definitions. This is about receiving a document with an @context containing some previously unseen URL or object and re-compacting that document to one that uses a well-known context (i.e., one that uses term definitions that some code has been written against).

For example:

const incomingDoc = {
  "@context": "https://never-seen-before.example",
  // to code that doesn't know ^ this context, it must not
  // try to understand the meaning of the JSON key from
  // its literal string of characters
  "shouldBeConsideredOpaque": "some value"
};

const wellKnownContext = {
  "@context": {
    "cats": "https://cats.com#cats"
  }
};

const recompactedDocument = compact(incomingDoc, wellKnownContext);

if(recompactedDocument.cats !== undefined) {
  // do some cat stuff
} else {
  throw new Error("Sorry no cats!");
}
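The point about opaque keys can be checked with a runnable toy. This sketch assumes flat documents and inline contexts, uses a hypothetical `recompact` helper in place of a real `compact` call, and passes the inner term map rather than the `{ "@context": ... }` wrapper used in `wellKnownContext` above. The dog IRI and the `felines` term are made up: the same literal key `cats` means dogs under the sender's context, so it is not treated as cats after re-compaction, while a differently spelled key that maps to `https://cats.com#cats` is.

```javascript
// Hypothetical toy helper: expand with the sender's inline context(s), then
// compact with the consumer's context. Flat documents only.
function recompact(doc, knownContext) {
  const toIri = new Map();
  for (const c of [].concat(doc["@context"] ?? [])) {
    for (const [term, iri] of Object.entries(c)) toIri.set(term, iri);
  }
  const toTerm = new Map(Object.entries(knownContext).map(([term, iri]) => [iri, term]));
  const out = { "@context": knownContext };
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context" || !toIri.has(key)) continue;
    const iri = toIri.get(key); // meaning comes from the IRI, not the key's spelling
    out[toTerm.get(iri) ?? iri] = value;
  }
  return out;
}

const wellKnownContext = { cats: "https://cats.com#cats" };

// Same spelling as the consumer's term, but the sender's context says "dogs".
const misleading = { "@context": { cats: "https://dogs.example#dogs" }, cats: "Rex" };
// Different spelling, but it maps to the IRI the consumer understands.
const genuine = { "@context": { felines: "https://cats.com#cats" }, felines: "Tom" };

const a = recompact(misleading, wellKnownContext);
const b = recompact(genuine, wellKnownContext);
console.log(a.cats, b.cats); // undefined Tom
```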

If we need a more concrete use case, what was discussed over in the VCWG was a case where the incoming document used multiple contexts, one that defined international driver's license terms and one that defined US-only driver's license terms. The consumer only understood the international driver's license terms, so they could re-compact the document using only the international context, removing the US-only context:

const incomingDoc = {
  "@context": [
    "https://international-dl.example",
    "https://usa-dl.example"
  ],
  "international_dl_field": "some international value",
  "usa_dl_field": "some US value"
};

const wellKnownContext = "https://international-dl.example";
const documentLoader = url => {
  if(url === wellKnownContext) {
    return {
      // return `RemoteDocument` object with static context
      contextUrl: null,
      documentUrl: url,
      document: {
        "@context": {
          "international_dl_field": "https://international-dl.example/vocab#dl_field"
        }
      }
    };
  }
  return someDefaultNetworkDocumentLoader(url);
};

const recompactedDocument = compact(incomingDoc, wellKnownContext, documentLoader);
// Note: the US fields will be fully expanded now to URLs and ignored,
// and `recompactedDocument` looks like:
/*
{
  "@context": "https://international-dl.example",
  "international_dl_field": "some international value",
  "https://usa-dl.example/vocab#usa_dl_field": "some US value"
}
*/

if(recompactedDocument.international_dl_field !== undefined) {
  // do some international DL stuff
} else {
  throw new Error("Sorry no international DL stuff!");
}
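An offline, runnable approximation of this example follows. `someDefaultNetworkDocumentLoader` is replaced by a hypothetical canned lookup, the context contents are assumed (the US term IRI is taken from the expected output above), and `compact` is a synchronous toy for flat documents, whereas real processors such as jsonld.js are asynchronous and take the loader as an option.

```javascript
// Hypothetical stand-in for someDefaultNetworkDocumentLoader: a canned
// "network" so the sketch runs offline. Only the US context lives here.
const pretendNetwork = new Map([
  ["https://usa-dl.example", { usa_dl_field: "https://usa-dl.example/vocab#usa_dl_field" }]
]);
function someDefaultNetworkDocumentLoader(url) {
  if (!pretendNetwork.has(url)) throw new Error(`cannot load ${url}`);
  return { contextUrl: null, documentUrl: url, document: { "@context": pretendNetwork.get(url) } };
}

// Pinned copy of the well-known context, served locally like the static
// document in the example above.
const wellKnownContext = "https://international-dl.example";
const pinned = new Map([
  [wellKnownContext, { international_dl_field: "https://international-dl.example/vocab#dl_field" }]
]);
const documentLoader = url =>
  pinned.has(url)
    ? { contextUrl: null, documentUrl: url, document: { "@context": pinned.get(url) } }
    : someDefaultNetworkDocumentLoader(url);

// Resolve a context value (URL string, inline object, or array) to term -> IRI.
function termMap(ctx, loader) {
  const terms = new Map();
  for (const c of [].concat(ctx)) {
    const defs = typeof c === "string" ? loader(c).document["@context"] : c;
    for (const [term, iri] of Object.entries(defs)) terms.set(term, iri);
  }
  return terms;
}

// Toy compact(): expand with the incoming contexts, then compact with `ctx`.
function compact(doc, ctx, loader) {
  const toIri = termMap(doc["@context"] ?? [], loader);
  const toTerm = new Map([...termMap(ctx, loader)].map(([term, iri]) => [iri, term]));
  const out = { "@context": ctx }; // the output references the context as given
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context" || !toIri.has(key)) continue; // undefined keys are skipped
    const iri = toIri.get(key);
    out[toTerm.get(iri) ?? iri] = value;
  }
  return out;
}

const incomingDoc = {
  "@context": ["https://international-dl.example", "https://usa-dl.example"],
  international_dl_field: "some international value",
  usa_dl_field: "some US value"
};

const recompactedDocument = compact(incomingDoc, wellKnownContext, documentLoader);
console.log(JSON.stringify(recompactedDocument, null, 2));
```

Pinning `https://international-dl.example` in `pinned` means the well-known context is never fetched; removing it from the map makes the call throw, since the canned network does not know that URL.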
TallTed commented 1 month ago

To help eliminate confusion within this issue, @gkellogg, please <strike> or otherwise edit https://github.com/w3c/json-ld-api/issues/610#issuecomment-2248626555 such that only the first and last paragraphs remain in play. If the rest of that comment needs to be pursued further, I suggest that it go into another issue.

As far as how "to make the suggested statement unambiguous" — ambiguity is not my concern. Removing the current requirement that implementers of VCDM or DI fully grok JSON-LD is my concern; implementers of VCDM or DI should generally be able to follow only the algorithms/recipes therein, which are much simpler and more focused than those in JSON-LD.

gkellogg commented 1 month ago

Still not sure what needs to be added to the spec; could it be just a best practice?

The act of compacting a document always expands it first, which specifically is there to remove contexts, so (presuming that a document loader doesn't restrict it) an unknown context is used for the expansion, but the provided context (wellKnownContext) is used for compaction. This is just the way that JSON-LD work, and I don't see what adding any text would accomplish.

If you want to discuss a use case for re-compacting a JSON-LD document to eliminate unknown contexts, it would seem to be just "compact the document using the well-known context".
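One way to see why the incoming context does not matter for the result: because compaction begins with expansion, two documents that use unrelated contexts but expand to the same data compact to the same output. The sketch below is a toy model under simplifying assumptions (flat documents, inline contexts); the `compact` function and the terms `n` and `nombre` are made up for illustration.

```javascript
// Toy compaction that, like the real algorithm, starts by expanding its input
// with the input's own context(s). Simplified: flat documents, inline contexts.
function compact(doc, knownContext) {
  // Expansion: input terms -> IRIs; the input's @context is consumed here.
  const toIri = new Map();
  for (const c of [].concat(doc["@context"] ?? [])) {
    for (const [term, iri] of Object.entries(c)) toIri.set(term, iri);
  }
  const expanded = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key !== "@context" && toIri.has(key)) expanded[toIri.get(key)] = value;
  }
  // Compaction: only the caller's context is used from here on.
  const toTerm = new Map(Object.entries(knownContext).map(([term, iri]) => [iri, term]));
  const out = { "@context": knownContext };
  for (const [iri, value] of Object.entries(expanded)) out[toTerm.get(iri) ?? iri] = value;
  return out;
}

const known = { name: "https://schema.org/name" };
// Two senders, two unrelated contexts, same underlying data.
const fromA = { "@context": { n: "https://schema.org/name" }, n: "Alice" };
const fromB = { "@context": { nombre: "https://schema.org/name" }, nombre: "Alice" };

const a = compact(fromA, known);
const b = compact(fromB, known);
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```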

msporny commented 1 month ago

This is just the way that JSON-LD work, and I don't see what adding any text would accomplish.

Yes, you're right in "that's the way JSON-LD works". However, it seems like we need to say /something/ to avoid permathreads like this:

https://github.com/w3c/vc-data-integrity/issues/272

Granted, only part of that permathread is about this issue, but it's clear that people don't quite understand how basic JSON-LD compaction works (nor probably want to learn about how it works). Pointing them to the existing section in the JSON-LD specification on compaction and expansion didn't seem to help either. The guidance that @TallTed is asking for would probably be fine as a BCP, but I'm not sure if some in that thread would agree.

We just merged some text this weekend that made an attempt at some guidance here:

https://w3c.github.io/vc-data-integrity/#validating-contexts

Perhaps, ideally, we wouldn't have that section in the Data Integrity specification, but would rather put it in a JSON-LD WG specification. Whether that's in the core JSON-LD spec, or a BCP document, is up to the WG to decide.

gkellogg commented 1 month ago

I wouldn't be averse to adding some informative paragraphs, or a sub-section to the Compaction Algorithm that describes how compaction can be used to remove/replace unknown contexts with a well-known context along with some text that describes why you might want to do this. But, you guys are probably in the best position to create such a PR.

TallTed commented 1 month ago

The act of compacting a document always expands it first

So far as I have found, nothing has explicitly said that in such simple language, until this thread. Perhaps I've overlooked it.

This is just the way that JSON-LD work, and I don't see what adding any text would accomplish.

People who don't already know that "[this] is just the way that JSON-LD [works]" would benefit by having even just that sentence added to the spec, but I think a few more sentences would be better. I don't think it needs to be more than a few paragraphs, if that much.

I believe the JSON-LD algorithms express that "compacting a [JSON-LD] document always expands it first", but understanding those algorithms requires a fairly deep dive into technical lingo, and making one's brain pretend it's silicon for long enough to walk through the algorithm oneself, which should not be necessary for all readers nor all deployers of these technologies.

A developer of a tool that they're linking to a JSON-LD processing library should be able to just know (and even this may be more than they really need to know) that if their tool asks the library to compact a given JSON-LD document based on (for instance) their corporate standard @context declarations, then the JSON-LD library will (1) expand the original JSON-LD document based on the @context declarations it contains, (2) replace those original @context declarations with the one(s) provided as input to the JSON-LD library's compact (or re-compact) routine, and (3) return a compacted JSON-LD document that uses the @context declarations they provided and leaves out the original @context declarations.

It might be better for such a library to make available an interface or API that starts with "replace existing @context..." and then walks through submission of file specification(s), URI(s), and/or plaintext @context values, which are then used to perform the expansion-and-(re)compaction described above.
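Such a "replace existing @context" entry point could be sketched as below. `replaceContext`, `toDefinitions`, the `loadUrl` callback, and the corporate and partner URLs are hypothetical; file specifications are left out (reading a file would just yield plaintext JSON), plaintext is detected naively by a leading `{`, and expansion and compaction are reduced to a toy model for flat documents.

```javascript
// Hypothetical helper: turn one context input into a term -> IRI object.
// Accepts an inline object, plaintext JSON, or a URL resolved by `loadUrl`.
function toDefinitions(ctx, loadUrl) {
  if (typeof ctx !== "string") return ctx;
  const text = ctx.trim();
  if (text.startsWith("{")) {
    const parsed = JSON.parse(text);
    return parsed["@context"] ?? parsed;
  }
  return loadUrl(text);
}

// Hypothetical "replace existing @context" operation (toy, flat documents).
function replaceContext(doc, newContexts, loadUrl) {
  // (1) Expand with the contexts the document already declares.
  const toIri = new Map();
  for (const c of [].concat(doc["@context"] ?? [])) {
    for (const [term, iri] of Object.entries(toDefinitions(c, loadUrl))) toIri.set(term, iri);
  }
  // (2) Swap in the caller's contexts; plaintext is parsed, URLs stay URLs.
  const outputContexts = [].concat(newContexts).map(c =>
    typeof c === "string" && c.trim().startsWith("{") ? toDefinitions(c, loadUrl) : c);
  const toTerm = new Map();
  for (const c of outputContexts) {
    for (const [term, iri] of Object.entries(toDefinitions(c, loadUrl))) toTerm.set(iri, term);
  }
  // (3) Compact: IRIs known to the new contexts become terms, others stay IRIs.
  const out = { "@context": outputContexts.length === 1 ? outputContexts[0] : outputContexts };
  for (const [key, value] of Object.entries(doc)) {
    if (key === "@context" || !toIri.has(key)) continue;
    const iri = toIri.get(key);
    out[toTerm.get(iri) ?? iri] = value;
  }
  return out;
}

// Hypothetical corporate and partner contexts, served from a local table.
const contextsByUrl = new Map([
  ["https://corp.example/ctx/v1", { employeeId: "https://corp.example/vocab#employeeId" }],
  ["https://partner.example/ctx", {
    staffNo: "https://corp.example/vocab#employeeId",
    badge: "https://partner.example/vocab#badge"
  }]
]);
const loadUrl = url => {
  if (!contextsByUrl.has(url)) throw new Error(`unknown context URL: ${url}`);
  return contextsByUrl.get(url);
};

const incoming = { "@context": "https://partner.example/ctx", staffNo: "E-42", badge: "gold" };
const result = replaceContext(
  incoming,
  ["https://corp.example/ctx/v1", '{"@context": {"badgeLevel": "https://partner.example/vocab#badge"}}'],
  loadUrl
);
console.log(JSON.stringify(result, null, 2));
```

One design choice in this sketch is that URL inputs stay as URLs in the output `@context`, mirroring how compaction references the context it was given, while plaintext values are embedded as parsed objects.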

TallTed commented 1 month ago

I also suggest avoiding such TLAs (Three Letter Acronyms) as BCP, unless expansion is provided nearby. There are many possible interpretations of BCP, and it's not immediately clear whether "Best Current Practice(s)" is what was intended (though it seems likely).