ietf-wg-httpapi / mediatypes

Other
5 stars 4 forks source link

What is the rationale behind allowing YAML alias nodes as fragment ids? #72

Closed MikeRalphson closed 1 year ago

MikeRalphson commented 1 year ago

https://github.com/ietf-wg-httpapi/mediatypes/blob/0d2cef8bf8e658b2d34ac8e48c3ff999ca827305/draft-ietf-httpapi-yaml-mediatypes.md?plain=1#L119-L124

Again, raising here as suggested by @dret, but please direct me to existing issue/comments.

My concern is one of complexity and tooling support, with regard to parsing YAML to an in-memory object form, often the alias node identifiers are lost (replaced with object identity).

YAML alias nodes effectively exist only in the textual representation of a YAML resource, not the parsed version most applications are likely to be dealing with.

This may mean that for some languages a YAML parser which can output an AST representation of the YAML object tree, or some other mechanism such as an adjacency list may be required.

See #71 for concerns over different subtypes of application/yaml having different fragment resolution rules.

ioggstream commented 1 year ago

Hi @MikeRalphson!

We had a long discussion on this topic. See #47 Shortly:

the alias node identifiers are lost (replaced with object identity).

you mean anchor names are not preserved, right? That's right, and this is highlighted here Nonetheless, for the YAML community/developers anchor names are the primary way to reference nodes.

This may mean that for some languages a YAML parser which can output an AST representation of the YAML object tree, or some other mechanism such as an adjacency list may be required.

Can you please clarify with a little example?

Let us know, R.

MikeRalphson commented 1 year ago

Hi @ioggstream thanks for the welcome.

you mean anchor names are not preserved, right?

Correct.

Can you please clarify with a little example?

This is getting a bit implementation-specific but using @eemeli's yaml package, the best I can do is:

function parseWithAliases(str) {
  const aliases = new Map();
  const ast = yaml.parseDocument(str);
  const walker = new AList(ast); // fast recurse through every object in the AST
  for (let [value,metadata] of walker) {
    if (yaml.isAlias(value)) {
      aliases.set(value.source,value.resolve(ast));
    }
  }
  return { data: ast.toJS(), aliases };
}

which, given the following input:

hello: &a
  world:
    message: Hello, world
outputs:
  *a

gives the following output:

{
  hello: { world: { message: 'Hello, world' } },
  outputs: { world: { message: 'Hello, world' } }
}

where hello and outputs have object identity, and the alias metadata looks like this:

Map(1) {
  'a' => YAMLMap {
    items: [
      Pair {
        key: Scalar {
          value: 'world',
          range: [ 13, 18, 18 ],
          source: 'world',
          type: 'PLAIN'
        },
        value: YAMLMap {
          items: [
            Pair {
              key: Scalar {
                value: 'message',
                range: [ 24, 31, 31 ],
                source: 'message',
                type: 'PLAIN'
              },
              value: Scalar {
                value: 'Hello, world',
                range: [ 33, 45, 46 ],
                source: 'Hello, world',
                type: 'PLAIN'
              }
            }
          ],
          range: [ 24, 46, 46 ]
        }
      }
    ],
    range: [ 13, 46, 46 ],
    anchor: 'a'
  }
}

As you can maybe see, the problem is the metadata available about the YAML aliases still refers to objects within the AST, and doesn't share any objects with the output JS object.

If what I consider the best Javascript YAML library doesn't provide this functionality (or I can't find it) and many other YAML parsers don't even have an abstraction above the output native object, then where is the universal utility in allowing YAML aliases as fragments?

ioggstream commented 1 year ago

@MikeRalphson

wrt the code

Let me check if I understand the issue, and please (cc: @eemeli @perlpunk) correct me if I didn't get it:

  1. I parse the YAML Stream to the YAML Serialization Tree
  2. both the hello key and the output key reference the same node, labeled with the a anchor
  3. if I have to retrieve hello.yaml#*a, reading from anchor & aliases I 'd get the first occurrence of the anchor in the tree
  4. in any case, there's no guarantee that this node is JSON-serializable

Note: even using JSON Pointers on YAML (e.g. what.yaml#/what/a) could end in a non-JSON serializable node

# what.yaml
what: &a
  a: &wonderful
    world: *a

I hope I didn't miss the point though (it's midnight now in Italy, after all)

I think @eemeli / @perlpunk can provide some insight. I am happy to extend the security/interoperability considerations sections!

wrt the spec

Here I will share my 2¢ that go a bit beyond current implementations: I think we tried to create a common discussion ground between different communities (YAML, JSON, OAS, ...).

on OAS:

on YAML:

Thanks for reading this long answer!

MikeRalphson commented 1 year ago

@ioggstream thanks for providing the long answer!

there was a discussion whether +yaml media types should inherit automatically from application/yaml, and we identified a

Would love to know where this sentence was going to end up.

By guidance, do you mean fragment id resolution rules are optional or non-normative?

eemeli commented 1 year ago

My understanding of where we ended up is that for application/yaml, a fragment identifier like *foo ends up referring to a YAML anchor &foo defined somewhere within the file, but that for the +yaml suffix, no fragment identifier syntax is specified.

To me, this strikes the right balance for compatibility. If you're working with application/yaml or something based on it, you are explicitly working with YAML, which does feature e.g. anchors and aliases. If you're working with something that uses +yaml, you're working with data/configuration/content that isn't really tied to the serialisation format, but is expressible in other formats as well.

Now, as for e.g. my JS yaml library that was used as an example above, the criticism of its available APIs for resolving a part of a document is somewhat valid. With the example given above, at least the following is possible:

import { Alias, parseDocument } from 'yaml'

const doc = parseDocument(`
hello: &a
  world:
    message: Hello, world
outputs:
  *a
`)

new Alias('a').resolve(doc).toJSON()
// { world: { message: 'Hello, world' } }

To be fair, this only sort of works, as any further aliases within the resolved tree are not properly resolved, and duplicated aliases do not behave as specified for application/yaml. To fix this, I think it would be appropriate and relatively straightforward to add a .toJS(doc) method on each node to provide the fully resolved value, and an anchor/alias unique-ifier to effectively implement an API for the first-alias matching.

MikeRalphson commented 1 year ago

Thanks @eemeli! Do you want me to raise an issue on yamlto track this?

eemeli commented 1 year ago

If you could, as two separate issues? Already working on the toJS() method (the API is relatively straightforward, though the implementation is a little delicate), but providing a solution for this bit will need a separate fix: https://github.com/ietf-wg-httpapi/mediatypes/blob/0d2cef8bf8e658b2d34ac8e48c3ff999ca827305/draft-ietf-httpapi-yaml-mediatypes.md?plain=1#L153-L154

darrelmiller commented 1 year ago

@MikeRalphson Can we consider this issue as closed, based on on @eemeli rationale?

ioggstream commented 1 year ago

@MikeRalphson thanks for raising this issue, and :clap: to @eemeli for his support. I think that type of exchanges are great for interoperability. :rocket: