JSON-LD Validation Scoping

We need to set clear parameters for what we will consider a "valid" JSON-LD object, and possibly adjust the Psych-DS specification as laid out in the Google Doc.

Here are some of the official requirements of JSON-LD, as found here:

A JSON-LD document MUST be able to express a linked data graph* (elaborated below) []
A JSON-LD document MUST be a valid JSON document [X]
All JSON constructs MUST have semantic meaning in a JSON-LD document: [X]
JSON arrays MUST NOT be interpreted as defining an object ordering. [X]

(There are other bullets in their list, but they are all SHOULDs and MAYs, and we are most interested in the MUSTs.)

Guidelines for a linked data graph: (in the same doc as above)

Subject, objects and edges all SHOULD be identified with IRIs

^^^Part of my issue with this combination of requirements is that they seem to bottom out with "being valid JSON", because:

Even though a JSON-LD document must be expressible as a linked data graph, the requirements for a linked data graph are all non-normative, all SHOULDs.
The requirement that JSON constructs must have meaning refers to something intentional rather than technical. That is, from what I can tell, it's not saying that all JSON constructs must be linked to some informative IRI; it's saying that the user must not use values that don't mean anything.
The requirement that arrays must be interpreted as unordered is a matter of interpretation, not computer validation.

In the world of JSON-LD there is an abundance of SHOULDs and barely any MUSTs. What we have to decide is whether to codify our own set of MUSTs for Psych-DS-specific JSON-LD, or to keep valid JSON format as the only MUST and implement the variety of SHOULDs as warnings.
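To make the second option concrete, here is a minimal sketch of a validator where invalid JSON is the only error and everything else is a warning. The names `Report` and `validate_metadata` and the two example warnings are hypothetical, chosen for illustration; they are not taken from the Psych-DS validator.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Report:
    errors: list = field(default_factory=list)    # violated MUSTs
    warnings: list = field(default_factory=list)  # violated SHOULDs

    @property
    def valid(self):
        # Only MUST violations make the metadata invalid.
        return not self.errors


def _reject_constant(name):
    # json.loads accepts NaN/Infinity by default, which strict JSON does not.
    raise ValueError(f"{name} is not valid JSON")


def validate_metadata(text):
    report = Report()
    try:
        doc = json.loads(text, parse_constant=_reject_constant)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        report.errors.append(f"not valid JSON: {exc}")
        return report
    if not isinstance(doc, dict):
        report.warnings.append("top-level value is not an object")
        return report
    # Example SHOULD-level checks (hypothetical selection).
    if "@context" not in doc:
        report.warnings.append("no @context, so plain keys are not linked to IRIs")
    if "@type" not in doc:
        report.warnings.append("no @type given for the top-level object")
    return report
```

Under this design `validate_metadata('{"name": "x"}')` is still valid but carries two warnings, while a truncated file such as `'{"name": "x"'` yields one error. Promoting a SHOULD to a Psych-DS MUST would then just mean appending to `errors` instead of `warnings`.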

For instance, we allow users to include non-schema.org keys (or rather, string keys that don't link to any IRI) within their metadata, which is allowed according to strict JSON-LD rules, but recommended against. Here are some questions:

Do we want to require that schema.org context MUST be included, and that the required terms of our spec such as "name" and "variableMeasured" MUST expand to their full schema.org IRIs?
Do we want to allow for expanded, contextless JSON-LDs as valid metadata files?
If we do choose to implement the full gamut of JSON-LD SHOULDs, are we prepared to present those recommendations to the user, at risk of overwhelming them?
Do we want to allow for namespaces other than schema.org in the context?
Do we want the validator to check that JSON-LD IRIs actually point to real web pages? [This has implications for our eventual python version, for which offline functionality is a desideratum]
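For the first and last questions, a rough offline sketch of checking that required terms expand to schema.org IRIs might look like the following. It is a deliberately tiny subset of term expansion, not the JSON-LD expansion algorithm: it ignores scoped contexts, `@base`, and unknown remote contexts. Treating the context string `https://schema.org` as `@vocab: http://schema.org/` is an assumption made so that no network access is needed, and all function names are hypothetical.

```python
SCHEMA_ORG_PREFIXES = ("http://schema.org/", "https://schema.org/")
REQUIRED_TERMS = ("name", "variableMeasured")  # the two examples named above


def load_context(doc):
    """Flatten @context into one dict without any network access."""
    raw = doc.get("@context", {})
    merged = {}
    for part in raw if isinstance(raw, list) else [raw]:
        if isinstance(part, dict):
            merged.update(part)
        elif isinstance(part, str) and part.rstrip("/") in (
            "http://schema.org", "https://schema.org"
        ):
            # Assumption: stand in for the remote schema.org context offline.
            merged["@vocab"] = "http://schema.org/"
    return merged


def expand_term(term, context):
    """Return the IRI a key expands to, or None if it stays unlinked."""
    definition = context.get(term)
    if isinstance(definition, dict):  # expanded term definition
        definition = definition.get("@id")
    value = definition if isinstance(definition, str) else term
    prefix, sep, suffix = value.partition(":")
    if sep:
        base = context.get(prefix)
        if isinstance(base, dict):
            base = base.get("@id")
        if isinstance(base, str) and not suffix.startswith("//"):
            return base + suffix  # compact IRI such as schema:name
        return value  # treated as an absolute IRI
    vocab = context.get("@vocab")
    return vocab + value if isinstance(vocab, str) else None


def check_required_terms(doc):
    context = load_context(doc)
    problems = []
    for term in REQUIRED_TERMS:
        if term not in doc:
            problems.append(f"{term}: missing")
            continue
        iri = expand_term(term, context)
        if iri is None or not iri.startswith(SCHEMA_ORG_PREFIXES):
            problems.append(f"{term}: expands to {iri!r}, not a schema.org IRI")
    return problems
```

Note that this sketch looks up the compact keys, so an expanded, contextless document (keys already written as full IRIs) would be reported as missing both terms. Whether that is the desired behaviour depends on how we answer the second question.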

There are other questions, but this set covers the gist of it. Including some misc. references below, such as Best Practices and the official "JSON-LD grammar":

Here are some "best practices" put forth by W3C:

Best Practice 1: Publish data using developer friendly JSON
Best Practice 2: Use a top-level object
Best Practice 3: Use native values
Best Practice 4: Assume arrays are unordered
Best Practice 5: Use well-known identifiers when describing data
Best Practice 6: Provide one or more types for JSON objects
Best Practice 7: Identify objects with a unique identifier
Best Practice 8: Things not strings
Best Practice 9: Nest referenced inline objects
Best Practice 10: When describing an inverse relationship, use a referenced property
Best Practice 11: External references SHOULD use typed term
Best Practice 12: Ordering of array elements
Best Practice 13: Provide a representation of the entity related by URL
Best Practice 14: Cache JSON-LD Contexts

JSON-LD Grammar

(Interesting point from the above grammar: unlinked keys in the JSON-LD MUST be ignored when processed. We may want to remind users that adding unlinked keys to their metadata does not technically add to its richness, since it will be ignored during any official processing on the web)
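If we do want to remind users about this, a warning could be driven by something like the sketch below. It assumes an inline object context passed in as a dict, only looks at top-level keys, and treats any key containing a colon as a compact or absolute IRI. The function name `unlinked_keys` is hypothetical.

```python
def unlinked_keys(node, context):
    """List top-level keys of `node` that `context` does not link to an IRI."""
    has_vocab = isinstance(context.get("@vocab"), str)
    unlinked = []
    for key in node:
        if key.startswith("@"):
            continue  # keywords such as @context, @type, @id
        if ":" in key:
            continue  # compact IRI (schema:name) or absolute IRI
        if key in context:
            if context[key] is None:
                unlinked.append(key)  # term explicitly mapped to null
            continue
        if has_vocab:
            continue  # plain terms are resolved against @vocab
        unlinked.append(key)
    return unlinked


node = {"@type": "Dataset", "name": "x", "labNotes": "y", "schema:author": "z"}
print(unlinked_keys(node, {"name": "http://schema.org/name"}))
```

One consequence worth noticing: as soon as the context sets `@vocab` (as in the second call one could make with `{"@vocab": "http://schema.org/"}`), no plain key counts as unlinked any more, even if the resulting IRI does not correspond to a real schema.org term.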

Additional MUSTs that we can glean from the grammar:

A JSON-LD document MUST be a single node object, a map consisting of only the entries @context and/or @graph, or an array of zero or more node objects.
The keys in objects MUST be unique.
A term MUST NOT equal any of the JSON-LD keywords, other than @type.
When used as the prefix in a Compact IRI, to avoid the potential ambiguity of a prefix being confused with an IRI scheme, terms SHOULD NOT come from the list of URI schemes as defined in [IANA-URI-SCHEMES]. Similarly, to avoid confusion between a Compact IRI and a term, terms SHOULD NOT include a colon (:) and SHOULD be restricted to the form of isegment-nz-nc as defined in [RFC3987].
To avoid forward-compatibility issues, a term SHOULD NOT start with an @ character followed exclusively by one or more ALPHA characters (see [RFC5234]) as future versions of JSON-LD may introduce additional keywords. Furthermore, the term MUST NOT be an empty string ("") as not all programming languages are able to handle empty JSON keys.
All of the aspects of context definitions are MUSTs.
All of the expanded term definition requirements apply as MUSTs.
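A few of these grammar MUSTs are mechanically checkable. The sketch below covers three of them: unique keys, the top-level shape (approximated as "an object, or an array of objects"), and term names in an inline context. The keyword sets are my transcription of the JSON-LD 1.1 keyword list and should be verified against the spec before use; `grammar_errors` and its helpers are hypothetical names.

```python
import json

# Assumption: transcribed from the JSON-LD 1.1 keyword list; verify before use.
KEYWORDS = {
    "@base", "@container", "@context", "@direction", "@graph", "@id",
    "@import", "@included", "@index", "@json", "@language", "@list",
    "@nest", "@none", "@prefix", "@propagate", "@protected", "@reverse",
    "@set", "@type", "@value", "@version", "@vocab",
}
# Keywords that may legitimately appear as entries of a context definition.
CONTEXT_ENTRIES = {
    "@base", "@direction", "@import", "@language", "@propagate",
    "@protected", "@type", "@version", "@vocab",
}


def _no_duplicate_keys(pairs):
    # Called by json.loads for every object, with its (key, value) pairs in order.
    seen = set()
    for key, _ in pairs:
        if key in seen:
            raise ValueError(f"duplicate key {key!r}")
        seen.add(key)
    return dict(pairs)


def _term_errors(context):
    errors = []
    for term in context:
        if term == "":
            errors.append("a term must not be the empty string")
        elif term in KEYWORDS and term not in CONTEXT_ENTRIES:
            errors.append(f"term {term!r} must not equal a JSON-LD keyword")
    return errors


def grammar_errors(text):
    try:
        doc = json.loads(text, object_pairs_hook=_no_duplicate_keys)
    except ValueError as exc:  # also covers json.JSONDecodeError
        return [f"invalid document: {exc}"]
    errors = []
    for node in doc if isinstance(doc, list) else [doc]:
        if not isinstance(node, dict):
            errors.append("top level must be a node object or an array of node objects")
            continue
        context = node.get("@context")
        if isinstance(context, dict):  # only inline object contexts are inspected
            errors.extend(_term_errors(context))
    return errors
```

The duplicate-key check has to happen at parse time because `json.loads` otherwise silently keeps the last value for a repeated key. The context definition and expanded term definition MUSTs are not covered here.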

Note: [Screenshot, 2023-11-29] This refers to the eventual deprecation of non-IRI keys in JSON-LD.


JSON-LD Validation Scoping #33