kdl-org / kdl

the kdl document language specifications
https://kdl.dev

Better json-in-kdl microsyntax #281

Closed: LemmaEOF closed this issue 1 year ago

LemmaEOF commented 2 years ago

The current JSON-in-KDL (JiK) syntax, while functional, feels very clunky to me. The original syntax was created before type hints, and while the 2.0.0 specification takes advantage of them, it doesn't quite feel right. I'd like to propose a new version of JiK that uses type hints to their full potential and makes JiK easier to both read and write. This isn't a full PR or specification yet, because I'd like to hear what other folks think first.

TL;DR:

Examples

JSON:

{
    "foo": 5,
    "bar": [
        true,
        false,
        null
    ],
    "baz": {
        "qux": "thud"
    }
}

Proposed JiK:

(object)- {
    foo 5
    (array)bar {
        - true
        - false
        - null
    }
    (object)baz {
        qux "thud"
    }
}
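To make the proposed explicit-tag encoding concrete, here is a rough Python sketch of a JSON-to-JiK serializer: objects get `(object)`, arrays get `(array)` with unnamed `-` items, and object keys become node names. The helper names (`to_jik`, `kdl_name`, `kdl_value`) are hypothetical, key quoting is a simplified stand-in for KDL's real identifier rules, strings are quoted with `json.dumps` (fine for ordinary text, but its `\u00XX` escapes for control characters are not KDL syntax), and `true`/`false`/`null` are written bare as in the examples above; adjust them for the KDL version you target.

```python
import json
import re

# Simplified identifier check (an assumption): KDL's real identifier rules are broader.
_BARE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*\Z")
_KEYWORDS = {"true", "false", "null"}


def kdl_name(key: str) -> str:
    """Use a JSON key as a node name, quoting it unless it is a plain identifier."""
    if _BARE_NAME.match(key) and key not in _KEYWORDS:
        return key
    return json.dumps(key, ensure_ascii=False)


def kdl_value(value) -> str:
    """Render a JSON primitive as a KDL argument (bool is checked before int)."""
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return repr(value)
    return json.dumps(value, ensure_ascii=False)  # strings


def to_jik(value, name: str = "-", depth: int = 0) -> list:
    """Serialize a JSON value to JiK lines using explicit (object)/(array) tags."""
    pad = "    " * depth
    if isinstance(value, dict):
        lines = [f"{pad}(object){name} {{"]
        for key, child in value.items():
            lines += to_jik(child, kdl_name(key), depth + 1)
        return lines + [pad + "}"]
    if isinstance(value, list):
        lines = [f"{pad}(array){name} {{"]
        for item in value:
            lines += to_jik(item, "-", depth + 1)  # array items have no name: "-"
        return lines + [pad + "}"]
    return [f"{pad}{name} {kdl_value(value)}"]


doc = {"foo": 5, "bar": [True, False, None], "baz": {"qux": "thud"}}
print("\n".join(to_jik(doc)))
```

Run on the example object, this prints the proposed JiK block above line for line.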
zkat commented 2 years ago

/cc @tabatkins this seems neat!

bgotink commented 2 years ago

I recently spent some time replacing angular.json with a KDL format (#283) and came up with an alternative as well. JiK is currently very verbose, and its use of tags/types vs. node names is the other way around from how I use them in regular KDL.

My "JiK" format is a lot more implicit. I chose implicit type detection over explicit (array)/(object) tags because only parts of the KDL files I'm using are JSON values, and those parts are surrounded by "regular" KDL. Requiring explicit types in the JSON parts would make it confusing which parts of the file need these types and which don't.

The type is decided as follows:

  1. If the value has properties, it's an object.
  2. If the value has children that are all named -, it's an array.
  3. If the value has children, it's an object.
  4. If the value has multiple arguments, it's an array.
  5. Otherwise, it's a primitive value.

The top-level value can be handled the same way, so there's no need for a -{ /* ... */ } wrapper.
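The five rules, plus treating the document's top-level nodes like the children of a node, can be sketched in Python as follows. `Node` is a hypothetical stand-in for whatever a KDL parser returns, and rejecting nodes that mix arguments with properties or children (or that carry no value at all) is an added assumption, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for a parsed KDL node."""
    name: str
    args: list = field(default_factory=list)
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def decode(node: Node) -> Any:
    """Infer a JSON value from a node using the five heuristic rules, in order."""
    if node.props:  # 1. properties -> object
        return decode_object(node)
    if node.children:
        if node.args:  # assumption: mixing arguments and children is an error
            raise ValueError(f"{node.name!r}: arguments mixed with children")
        if all(child.name == "-" for child in node.children):  # 2. all "-" -> array
            return [decode(child) for child in node.children]
        return decode_object(node)  # 3. other children -> object
    if len(node.args) > 1:  # 4. several arguments -> array
        return list(node.args)
    if len(node.args) == 1:  # 5. a single argument -> primitive
        return node.args[0]
    raise ValueError(f"{node.name!r}: no value (not covered by the rules)")


def decode_object(node: Node) -> dict:
    """Properties and named children both become object keys."""
    if node.args:  # assumption: an object can't also carry positional values
        raise ValueError(f"{node.name!r}: object nodes cannot have arguments")
    obj = dict(node.props)
    for child in node.children:
        if child.name in obj:
            raise ValueError(f"{node.name!r}: duplicate key {child.name!r}")
        obj[child.name] = decode(child)
    return obj


def decode_document(nodes: list) -> Any:
    """Top-level nodes are decoded like the children of an implicit node."""
    return decode(Node("-", children=nodes))


doc = [
    Node("foo", [5]),
    Node("bar", children=[Node("-", [True]), Node("-", [False]), Node("-", [None])]),
    Node("baz", children=[Node("qux", ["thud"])]),
]
print(decode_document(doc))
```

Because the rules are applied in order, a node's name only matters when its parent checks whether all children are named `-`.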

I believe there's only one case where an explicit (object)/(array) tag is needed, and that's for an object with a single property called -.

A downside to this approach is that there are often multiple ways to encode the same data, for example:

foo 5
bar {
    - true
    - false
    - null
}
baz {
    qux "thud"
}

is identical to

foo 5
bar true false null
baz qux="thud"
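Under these rules, both spellings decode to the same data. A compact, self-contained check, again using a hypothetical `Node` type in place of a real parser's output and a trimmed-down decoder that skips validation:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for a parsed KDL node."""
    name: str
    args: list = field(default_factory=list)
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def decode(node: Node) -> Any:
    """Props or named children -> object; "-" children or several args -> array."""
    if node.props or any(c.name != "-" for c in node.children):
        # No duplicate-key or mixed-form validation in this sketch.
        return {**node.props, **{c.name: decode(c) for c in node.children}}
    if node.children:
        return [decode(c) for c in node.children]
    if len(node.args) > 1:
        return list(node.args)
    return node.args[0]  # primitive; assumes exactly one argument


def decode_document(nodes: list) -> Any:
    return decode(Node("-", children=nodes))


# foo 5 / bar { - true; - false; - null } / baz { qux "thud" }
long_form = [
    Node("foo", [5]),
    Node("bar", children=[Node("-", [True]), Node("-", [False]), Node("-", [None])]),
    Node("baz", children=[Node("qux", ["thud"])]),
]
# foo 5 / bar true false null / baz qux="thud"
short_form = [
    Node("foo", [5]),
    Node("bar", [True, False, None]),
    Node("baz", props={"qux": "thud"}),
]
print(decode_document(long_form) == decode_document(short_form))
```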
LemmaEOF commented 2 years ago

oooh yeah, that's a good heuristic for array vs object! The current JiK version has a few different ways to encode data, so that's not a problem at all.

tabatkins commented 1 year ago

Yeah, I've never been super happy with how JiK turned out. (Unlike XiK, which I think is great, due to the closeness of the data models.)

Your heuristics look pretty interesting. I'm a little confused by the array nodes, tho - it looks like the node name is just ignored (unless it's a child of an object, in which case it's the key)? Similarly for primitives - so long as it has a single argument, the node name doesn't do anything?

I think there's another case that needs an explicit tag, fwiw: a single-element array using arguments. Your heuristics would recognize `- 5` as the primitive 5, while `- 5 6` would be recognized as [5, 6], right? So you'd need `(array)- 5` if you didn't want to use children. Is this right?

tabatkins commented 1 year ago

Oh, hm, maybe the nodename feeds into your heuristics, actually? It's not mentioned, but if `-` is used solely for primitives, and any other name indicates an array/object according to the heuristics, then indeed we're left with just the single ambiguous case of an object containing a single item whose key is "-". We'd require that to be tagged as (array) or (object), and it would be a syntax error if untagged.

(Then we'd just say that array and object are the canonical nodenames used when auto-converting from JSON, but handwritten KDL can use whatever.)

tabatkins commented 1 year ago

Ah, no, never mind: since the child of an object uses the nodename as its key, we can't rely on the nodename for the heuristics at all. {foo: 5} would be written as `- { foo 5 }`.

LemmaEOF commented 1 year ago

I think using node names as type information kinda goes against the current state of KDL; that's what type hints are for. I proposed `-` for outer objects/arrays and array elements as a way of saying "this element has no given name", and using them as a heuristic is purely happenstance.

bgotink commented 1 year ago

Indeed, the "-" would mean "this item has no name". Since the names of array items don't matter, we can require `-` for all array items. That lets us detect arrays vs. objects automatically instead of always requiring explicit tags.

There's one ambiguous case between objects and arrays. For example:

- {
  - true
}

Is this [true] or {"-": true}?

If there were more than one - node, it would definitely be an array because objects can't have duplicate keys. If there were children with other names, it has to be an object as arrays don't have named children. If the node had values, it would definitely be an array because objects can't have values. If the node had properties, it would definitely be an object because arrays can't have properties.

I believe it makes most sense to interpret this as [true], because it's easier to reason "if all children are named -, then it's an array" without the exception "unless there's only one child". I also think that arrays with a single item are more common than objects with a single property named -.

So how do you write {"-": true} in JiK? There are two options:

// Use the `-` property, which only works if the value is a literal
- -=true

// Add an explicit tag/type to mark it as object
(object)- {
  - true
}

Similarly, there's an ambiguity for arrays containing a single primitive value. I can write [true] as

- {
  - true
}

and [true, false] as

- true false

but I can't write [true] as

- true

because that encodes the literal value true. Here, too, an explicit type/tag would let me write

(array)- true
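Here is a sketch of how an optional `(array)`/`(object)` type annotation could override the heuristics, covering the ambiguous cases from this comment: a lone `-` child decodes as an array, and a tag forces the other reading. As before, `Node` (with its `tag` field) is a hypothetical model of a parsed node, not a real library's API, and the error checks are assumptions drawn from the "objects can't have values, arrays can't have properties" reasoning above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed KDL node."""
    name: str
    args: list = field(default_factory=list)
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    tag: Optional[str] = None  # type annotation, e.g. "array" in (array)- true


def kind(node: Node) -> str:
    """An explicit tag wins; otherwise fall back to the heuristics."""
    if node.tag in ("array", "object"):
        return node.tag
    if node.props:
        return "object"
    if node.children:
        # A lone "-" child still counts as an array: - { - true } is [true].
        return "array" if all(c.name == "-" for c in node.children) else "object"
    return "array" if len(node.args) > 1 else "primitive"


def decode(node: Node) -> Any:
    k = kind(node)
    if k == "object":
        if node.args:
            raise ValueError(f"{node.name!r}: objects cannot have arguments")
        obj = dict(node.props)
        for child in node.children:
            obj[child.name] = decode(child)
        return obj
    if k == "array":
        if node.props:
            raise ValueError(f"{node.name!r}: arrays cannot have properties")
        # (array)- true has no children, so its arguments become the items.
        return [decode(c) for c in node.children] if node.children else list(node.args)
    if len(node.args) != 1:
        raise ValueError(f"{node.name!r}: expected exactly one value")
    return node.args[0]


examples = [
    ("- { - true }", Node("-", children=[Node("-", [True])])),                  # [true]
    ("- -=true", Node("-", props={"-": True})),                                 # {"-": true}
    ("(object)- { - true }", Node("-", children=[Node("-", [True])], tag="object")),  # {"-": true}
    ("- true false", Node("-", [True, False])),                                 # [true, false]
    ("- true", Node("-", [True])),                                              # true
    ("(array)- true", Node("-", [True], tag="array")),                          # [true]
]
for source, node in examples:
    print(source, "->", json.dumps(decode(node)))
```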
bgotink commented 1 year ago

Note that I don't think we should require `-` as the name of the root node; instead, we should allow any name for it.

Using - will still make sense for a lot of documents as the name doesn't matter, but allowing any name would make it easy to embed JiK inside a proper KDL document. For example, here's a hypothetical document describing a JSON HTTP request:

// this node is KDL, note the combination of value and property
request "/api/cart" method="PUT" {
    // this is a JiK node, the content is JSON
    body {
        coupon "cuddle"
        items {
            - id=1234 amount=1
            - id=2341 amount=2 {
                options {
                    color "red"
                    size "XXL"
                }
            }
        }
    }
}
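Because the heuristic decoder never looks at the name of the node it starts from, embedding works naturally: an application can read the surrounding KDL (the `request` path argument and its `method` property) as usual and hand only the `body` node to a JiK decoder. A sketch, assuming a hypothetical `Node` model built by hand instead of parsed, and a decoder implementing the heuristic rules proposed earlier in the thread (properties and named children are merged into one object):

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for a parsed KDL node."""
    name: str
    args: list = field(default_factory=list)
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def decode_jik(node: Node) -> Any:
    """Decode a JiK node; its own name is ignored, so any root name works."""
    if node.props or any(c.name != "-" for c in node.children):
        # Properties and named children are merged into one object.
        return {**node.props, **{c.name: decode_jik(c) for c in node.children}}
    if node.children:
        return [decode_jik(c) for c in node.children]
    # Several arguments -> array; otherwise assumes exactly one primitive argument.
    return list(node.args) if len(node.args) > 1 else node.args[0]


def child(node: Node, name: str) -> Node:
    """Return the first child with the given name."""
    return next(c for c in node.children if c.name == name)


# The request document from the comment above.
request = Node("request", ["/api/cart"], {"method": "PUT"}, [
    Node("body", children=[
        Node("coupon", ["cuddle"]),
        Node("items", children=[
            Node("-", props={"id": 1234, "amount": 1}),
            Node("-", props={"id": 2341, "amount": 2}, children=[
                Node("options", children=[Node("color", ["red"]), Node("size", ["XXL"])]),
            ]),
        ]),
    ]),
])

# Plain KDL: read the path and method directly.
path, method = request.args[0], request.props["method"]
# JiK: decode only the body node.
payload = decode_jik(child(request, "body"))
print(method, path, json.dumps(payload))
```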
LemmaEOF commented 1 year ago

ooh yeah, that's smart! embeddability is a great use-case that I forgot about, I'm very much in favor!

tabatkins commented 1 year ago

Okay, so that second-level part is JSON-encodable to:

{
  "coupon": "cuddle",
  "items": [
    {"id": 1234, "amount": 1},
    {"id": 2341, "amount": 2, "options": {"color": "red", "size": "XXL"}}
  ]
}

Right?