kdl-org / kdl

the kdl document language specifications
https://kdl.dev
Other
1.09k stars 61 forks source link

KDL Schema spec clarifications/typos #315

Open Gankra opened 1 year ago

Gankra commented 1 year ago

I've been working through implementing a KDL Schema Validator, and I've run into a few questions. I will probably find more over time but this is what I have so far.

TYPOS

  1. ✅ "one info node for that describes the schema itself."
  2. ✅ there is a format "irl" (iri?)
  3. ✅ there is a format "irl-reference" (iri-reference?)

ERRORS / QUESTIONS

  1. ✅ does "zero or more..." imply optionality? Yes?
  2. ⁉️ node.children.tag: suggests it's mandatory and multiple, but schema-schema says 0 or 1?
  3. ❓ prop.children.required: fails to specify default value (true?)
  4. ❌ can enum validations be an empty list? (implying nothing matches?)
    1. ✅ I will make the docs clarify that they must have values
  5. ❓ what are the possible values for type validations? (are these core types like "string" or rich types like "(i32)"?
  6. ✅ the node node has "zero or more children" but the children node is itself a list of children..?
    1. ✅ should just be "optional"
    2. ❓ the "defintions" node should be able to have multiple children though?
  7. % is... problematic with floating point. There are ways to do it, the rust json-schema crate handles it, but want to confirm this pandora's box is intentional.
    1. ✅ zkat proposes dropping multiple to avoid the question
zkat commented 1 year ago
Gankra commented 1 year ago

Dope, I'll file a PR for some of these when I have better clarity on them. Some more issues:

ISSUES

  1. ❓ schema-schema: creative commons link missing its "rel" (documentation?)
  2. ❓ node.tag: "Validations to apply to the tag of the node." -- shouldn't this be a tag node? (schema-schema says so)

I AM CONFUSED

  1. tag node: "The tag describes the tags allowed in a children block or toplevel document."
    1. children node doesn't use this
    2. ❓ why does this exist / is this different from the children node? or the node node?
    3. ok wait in the schema-schema children does have tag by virtue of inheriting all of document's nodes (with ref)
      1. but that also that grabs a pile of random crap it shouldn't (at very least info)
      2. ❓ was info and definitions supposed to be not-under-the-document-node so that this kind of recursive definition would work?
      3. ❓ or was document supposed to have children?
      4. ❓ actually maybe this inheritance works if it's inverted. document ref's in the contents of children?
  2. ❓ what is the meaning/purpose of rel in a url?
    1. ❓ self?
    2. ❓ documentation?
Gankra commented 1 year ago

On ref and id:

  1. ❓ completely overlays the matched node onto this node
    1. ❓ ...including id? that seems... wrong.
  2. ❓ replaces conflicts
    1. ❓ does this only apply to positional values?
    2. ❓ kdl technically supports conflicting copies of key=val, but in a way that is basically an overlay so it's moot..?
    3. ❓ you can always have multiple copies of a node with a given name, so unclear how conflicts could be defined
      1. ❓ I guess meaningful for nodes that are "at most 1"?
  3. ❓ is the behaviour of ref consistent across all node types?
    1. ❓ wording varies, unclear if the net effect is always the same
  4. ❓ should it be required that the source and target are the same kind of node?
    1. ❓ i.e. should values only be allowed to import values?
    2. currently I guess not, schema-schema puns document and children
  5. ❓ what should happen if there are multiple matches?
    1. ❓ error? (yes please)
    2. ❓ merge? (god no)
  6. ❓ are ids really global, or per-node-kind?
    1. ❓ what happens if ids aren't globally unique? error?
    2. ❓ this can't be expressed in the schema-schema, should it?
  7. ❓ ref is defined to be an arbitrary KQL string but it's unclear if there's any practical application of anything but [id="blah"]
    1. not needing to support all of KQL as a subsystem would be... much easier
Gankra commented 1 year ago

updated my formatting a bit so it's easier to refer to specific points and I can keep track of where my understanding is

Gankra commented 1 year ago

On Schema Referencing

(This is future-facing stuff, not necessarily problems in the current spec)

  1. ❓ There is currently no way for a schema to specify what schema-schema it follows

    1. ❓ or equivalently, what version of the KDL Schema Spec it follows
    2. ❓i think there should be a top-level kdl-schema node to this effect (see point 2 below for why not in info)
    3. ❓ json-schema has a "$schema": "https://..." key for this purpose
      1. Using a URL to specify the version is interesting... mixed on it.
      2. In principle it allows a modicum of forward-compat... except a new schema-schema would presumably be self-referential
      3. If you did specify a schema-schema in the syntax of an older schema-schema, you...
        1. would be able to validate schemas using the new schema-schema for correctness (assuming all constraints of interested could be encoded in the schema-schema, which is not the case with things like id uniqueness)
        2. wouldn't be able to use those schemas because we still wouldn't understand the semantics
        3. could assume the schema is essentially the same but just includes modular things like new validations and proceed while ignoring things we don't understand (more on this in a subsequent comment)
  2. ❓Microsoft has a non-standard extension/convention that any JSON that has a top-level "$schema": "https://..." key will be validated against that schema

    1. love this, means weirdo formats can immediately get IDE support if the KDL Language Server supports this
    2. it also supports querying https://www.schemastore.org/json/ based on file name or something? works well for things like package.json or Cargo.toml
Gankra commented 1 year ago

On forward-compat / extensibility

(This is future-facing stuff, not necessarily problems in the current spec)

Forward-compat and extensibility are two sides of the same coin: to what extent can things be designed to support New Or Non-Standard Features. However all the defaults of the KDL Schema seem tuned heavily against this. In particular (unlike json-schema) everything defaults to "only what I've mentioned is allowed". I get the inclination but it's pretty hazardous to the goals of this section.

In my prototype I've identified a few places where extensibility could be supported:

  1. If a node has "all validations" in its children set I parse all unknown node kinds as Validation::Unknown(KdlNode) and assume I will provide some kind of hook for custom validators
  2. similarly there is Validation::UnknownFormat(String) for format "???"
  3. I don't currently do this for link rel="???", maybe I should

Features that would be useful for forward-compat:

  1. default values for optional things
    1. makes the schema more like documentation
    2. potentially allows applications to apply the defaults
    3. potentially allows future schemas with more complex rules to "bootstrap" semantics with actual values
  2. ability to declare an enum as open/non-exhaustive
    1. arguably useless in a vacuum (would negate the entire function), but if enum values could include descriptions this would again gain more documentation (IDE tooltips!)
    2. hmm or at this point should you just be using the proper values node? wait no each values entry is just specifying min/max len and then deferring to validations...