psych-ds / psych-DS

Psych-DS is a specification for behavioral datasets: JSON-LD metadata, predictable directory structure, and machine-readable specifications for tabular datasets in behavioral research.
Creative Commons Attribution 4.0 International

JSON-LD Validation Scoping #33

Open bleonar5 opened 7 months ago

bleonar5 commented 7 months ago

We need to lay out some clear parameters for what we will be considering a "valid" JSON-LD object, and also possibly make some adjustments with respect to the Psych-DS specification as laid out in the Google Doc.

Here are some of the official requirements of JSON-LD, as found here:

(There are other bullets in their list, but they are all SHOULDs and MAYs, whereas we are most interested in the MUSTs.)

Guidelines for a linked data graph: (in the same doc as above)

^^^Part of my issue with this combination of requirements is that they seem to bottom out at "being valid JSON", because:

  1. Even though a JSON-LD document must be expressible as a linked data graph, the requirements for a linked data graph are all non-normative, all SHOULDs.
  2. The requirement that JSON constructs must have meaning refers to something intentional rather than technical. That is, from what I can tell, it's not saying that all JSON constructs must be linked to some informative IRI; it's saying that the user must not use values that don't mean anything.
  3. The requirement that arrays must be interpreted as unordered is a matter of interpretation, not computer validation.

In the world of JSON-LD there is an abundance of SHOULDs and barely any MUSTs. What we have to decide is whether to codify our own set of MUSTs for Psych-DS-specific JSON-LD, or to just keep valid JSON format as the only MUST and implement the variety of SHOULDs as warnings.
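To make the second option concrete, here is a minimal sketch of what "valid JSON is the only MUST, SHOULDs become warnings" could look like. The function name `validateMetadata`, the issue shape, and the issue codes are all hypothetical, not taken from the Psych-DS validator, and the SHOULD checks are just a few of the W3C best practices picked for illustration. It only looks for the literal `@type` and `@id` keywords and ignores any aliases a context might define.

```javascript
// Hypothetical policy: invalid JSON is the only hard error (MUST);
// every other finding is reported as a warning (SHOULD).
function validateMetadata(text) {
  const issues = [];
  let doc;
  try {
    doc = JSON.parse(text);
  } catch (err) {
    issues.push({ severity: 'error', code: 'INVALID_JSON', message: err.message });
    return issues; // nothing else can be checked without a parsed document
  }

  // Best Practice 2: use a top-level object.
  const isObject = doc !== null && typeof doc === 'object' && !Array.isArray(doc);
  if (!isObject) {
    issues.push({
      severity: 'warning',
      code: 'NO_TOP_LEVEL_OBJECT',
      message: 'Metadata should be a single top-level JSON object.',
    });
    return issues; // the remaining checks assume an object
  }

  // Best Practice 6: provide one or more types for JSON objects.
  if (!('@type' in doc)) {
    issues.push({ severity: 'warning', code: 'MISSING_TYPE', message: 'No "@type" found.' });
  }

  // Best Practice 7: identify objects with a unique identifier.
  if (!('@id' in doc)) {
    issues.push({ severity: 'warning', code: 'MISSING_ID', message: 'No "@id" found.' });
  }

  return issues;
}
```

If we instead decide to codify our own MUSTs, the same structure would work: promoting a check is just a matter of changing its `severity` from `'warning'` to `'error'`.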

For instance, we allow users to include non-schema.org keys (or rather, string keys that don't link to any IRI) within their metadata, which is allowed according to strict JSON-LD rules, but recommended against. Here are some questions:

There are other questions, but this set covers the gist of it. I'm including some misc. references below, such as Best Practices and the official "JSON-LD grammar":

Here are some "best practices" put forth by W3C:

  1. Publish data using developer friendly JSON
  2. Use a top-level object
  3. Use native values
  4. Assume arrays are unordered
  5. Use well-known identifiers when describing data
  6. Provide one or more types for JSON objects
  7. Identify objects with a unique identifier
  8. Things not strings
  9. Nest referenced inline objects
  10. When describing an inverse relationship, use a referenced property
  11. External references SHOULD use typed term
  12. Ordering of array elements
  13. Provide a representation of the entity related by URL
  14. Cache JSON-LD Contexts

JSON-LD Grammar

(Interesting point from the above grammar: unlinked keys in the JSON-LD MUST be ignored when processed. We may want to remind users that adding unlinked keys to their metadata does not technically add to its richness, since it will be ignored during any official processing on the web)
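As a rough illustration of how we could flag such keys for users, here is a sketch that lists the top-level keys an inline `@context` does not map to an IRI. The name `findUnlinkedKeys` is hypothetical, and the logic is a deliberate simplification of JSON-LD term resolution: it only inspects inline object contexts, only looks at top-level keys, and returns `null` for anything it cannot inspect locally (for example a remote context given as a URL string).

```javascript
// Returns the top-level keys that the document's inline @context does not
// map to an IRI, or null when the context cannot be inspected locally.
// Simplified on purpose: no remote contexts, no nested objects, no scoped contexts.
function findUnlinkedKeys(doc) {
  const ctx = doc['@context'] ?? {}; // a missing context defines no terms
  if (typeof ctx !== 'object' || Array.isArray(ctx)) {
    return null; // URL strings and context arrays are out of scope for this sketch
  }
  if (typeof ctx['@vocab'] === 'string') {
    return []; // @vocab gives every plain key a default IRI
  }
  return Object.keys(doc).filter((key) => {
    if (key.startsWith('@')) return false; // JSON-LD keyword
    if (key.includes(':')) return false;   // treated as an absolute or compact IRI
    const defined = Object.prototype.hasOwnProperty.call(ctx, key);
    return !defined || ctx[key] === null;  // undefined or explicitly unmapped term
  });
}
```

For a document whose context only defines `name`, a key such as `favoriteColor` would be reported, which is the kind of key we may want to warn about.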

additional MUSTs that we can glean from the grammar:

note: (screenshot not shown) This refers to the eventual deprecation of non-IRI keys in JSON-LD.

bleonar5 commented 7 months ago

After doing a deeper dive into the jsonld.js package, I can see that it does produce error messages that correspond directly to a lot of the MUSTs from the JSON-LD Grammar. These mostly seem to revolve around restricted usages for the various "@" keywords.

This is great, because it means we can offload a lot of this fine-grained syntactic validation of JSON-LD objects to the official package itself, funneling its error messages into our app's validation "issues" that get presented to the user. One nice thing about these error cases is that they only really arise when you begin to use some of JSON-LD's more complex features, so there's less of a worry that these checks will be prohibitive for beginners.
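A sketch of what that funneling might look like is below. The issue shape and the names `toIssue` and `collectJsonLdIssues` are hypothetical. It assumes that errors thrown by jsonld.js carry a `name`, a `message`, and a `details.code` field; that matches what I saw, but it should be confirmed against the package version we pin. The `expand` function is passed in as a parameter so the sketch has no hard dependency on the library; in the app it would presumably be `jsonld.expand`.

```javascript
// Convert a thrown error into a validator "issue".
// Assumption: jsonld.js errors expose `name`, `message`, and `details.code`.
function toIssue(err) {
  const code = err && err.details && err.details.code;
  return {
    severity: 'error',
    source: 'jsonld',
    code: code || (err && err.name) || 'unknown',
    message: (err && err.message) || String(err),
  };
}

// Run an injected, promise-returning `expand` function over the document
// and report any thrown error as a single issue.
async function collectJsonLdIssues(expand, doc) {
  try {
    await expand(doc);
    return [];
  } catch (err) {
    return [toIssue(err)];
  }
}
```

Usage would be along the lines of `const issues = await collectJsonLdIssues(jsonld.expand, metadata);`, with the result appended to whatever issue list the validator already builds.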

There's another category of JSON-LD MUSTs that result in ignored content rather than an error message. For instance, in the JSON-LD playground, using a key that resolves to a string instead of an IRI results in that key being dropped. We have to decide whether such violations ought to be errors or warnings.