YAML fragment: is switching from named anchor to alias nodes seamless?

ioggstream commented 2 years ago

Question

Currently the spec supports fragment identifiers based on "named anchors", e.g. foo.yaml#ant references one.

# This is foo.yaml
one: &ant one

Today, alias node syntax is just * followed by a named anchor,

one: &named_anchor one
alias_node: *named_anchor

but in the future alias nodes could support "pathlike" expressions, e.g. *foo/bar.

Can "named anchors" fragment identifiers be extended in the future in a backward compatible way to support "alias nodes" syntax?

Pathlike alias nodes are complex

Consider the following examples

/ is a valid character for a named anchor, so using / as a pathlike separator might be non-trivial. Currently many YAML parsers only support [a-zA-Z0-9-_] though, so a future YAML spec might restrict the syntax of named anchors.

- par: &ant/ani ciao
- rap: *ant/ani

Keys might not be strings (e.g. 1 vs "1"). We need to know how pathlike elements are serialized in order to identify a human-readable encoding (e.g. JSON Pointer).

- &foo
  bar:
    1: "integer"
    "1": "string"
    '"1"': '"1"'
- baz: 
  - *foo/bar/1
  - *foo/bar/"1"
  - *foo/bar/"\"1\""
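To make the ambiguity concrete, here is a minimal Python sketch. It assumes the node anchored as &foo loads into a dict with an integer key 1, a string key "1", and a string key '"1"', which is how a YAML 1.2 core-schema loader would typically resolve them. `resolve_naive` is a hypothetical helper, not part of any YAML library.

```python
# Hypothetical stand-in for the node anchored as &foo, written as a
# plain dict so the sketch needs no YAML library.
foo = {"bar": {1: "integer", "1": "string", '"1"': '"1"'}}


def resolve_naive(node, path):
    """Walk `path`, matching each segment against str(key).

    Returns every value whose key renders to the segment text, which
    exposes segments that cannot be resolved unambiguously.
    """
    candidates = [node]
    for segment in path.split("/"):
        candidates = [
            value
            for current in candidates
            if isinstance(current, dict)
            for key, value in current.items()
            if str(key) == segment
        ]
    return candidates


print(resolve_naive(foo, "bar/1"))    # two matches: int 1 and str "1"
print(resolve_naive(foo, 'bar/"1"'))  # one match: the key '"1"'
```

A textual segment such as 1 matches both the integer and the string key unless the serialization carries type information, which is the gap an encoding like JSON Pointer would have to address.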

Another example: a non-human-readable encoding such as base64url would solve the issue, since we could just decode the string. This could be supported when pathlike expressions are messy, e.g. file.yaml#:Zml6ei8iYnV6ei9iYXo=:

- &fizz
  "buzz/baz": "a"
  "buzz":
    "baz": "b"
- roc: *fizz/buzz/baz
- rov: *fizz/"buzz/baz"
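A rough Python sketch of the base64url idea, using only the standard library. The `:...:` delimiters follow the example fragment above; the function names and the padding handling are assumptions made for illustration, not a proposed syntax.

```python
import base64


def encode_path(path: str) -> str:
    """Encode a path expression as base64url text (padding kept)."""
    return base64.urlsafe_b64encode(path.encode("utf-8")).decode("ascii")


def decode_fragment(fragment: str) -> str:
    """Decode a fragment of the hypothetical form ':<base64url>:'."""
    payload = fragment.strip(":")
    # Restore padding in case a producer omitted it.
    payload += "=" * (-len(payload) % 4)
    return base64.urlsafe_b64decode(payload).decode("utf-8")


print(decode_fragment(":Zml6ei8iYnV6ei9iYXo=:"))
print(encode_path('fizz/"buzz/baz"'))
```

Decoding the example string as written yields fizz/"buzz/baz, without a closing quote, so the sample value may have been cut short before it was encoded.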

ioggstream commented 2 years ago

Feedback from @cabo

I think PDF’s #page=12 is better than a #12 would have been, and I would prefer YAML’s to be #a=bar instead of #bar, so application/foo+yaml can still define #x=4711 and let #a=bar stand from the general fragment identifier considerations.

Maybe we could use something like https://ciao.yaml#*alias_name

In [8]: urlparse("https://ciao.yaml#*alias_name")
Out[8]: ParseResult(scheme='https', netloc='ciao.yaml', path='', params='', query='', fragment='*alias_name')

eemeli commented 2 years ago

The easiest solution might be to initially restrict valid characters in the fragment identifier to [a-zA-Z0-9_-]+. This would mean that while not all anchors are addressable, most would be. It would also allow relatively easy extension e.g. by making page=12 invalid as an application/yaml fragment identifier, and therefore usable by a schema that extends it.

Not sure how to express it, but it might be good to still reserve / for future use by later YAML versions?
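A minimal Python sketch of the restriction proposed above, for illustration only. The character set is the one from this comment; the function name is hypothetical.

```python
import re

# Character set proposed in this comment; anything outside it
# (including "/" and "=") stays free for future or subtype use.
RESTRICTED_FRAGMENT = re.compile(r"[a-zA-Z0-9_-]+")


def is_plain_anchor_fragment(fragment: str) -> bool:
    """True if the whole fragment uses only the restricted characters."""
    return RESTRICTED_FRAGMENT.fullmatch(fragment) is not None


for candidate in ["ant", "named_anchor", "ant/ani", "page=12", "*foo"]:
    print(candidate, is_plain_anchor_fragment(candidate))
```

Under this check, ant and named_anchor are addressable, while ant/ani, page=12 and *foo are rejected and so remain available for later extension.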

cabo commented 2 years ago

Or you could do the inverse and just keep = for application-specific fragment identifier schemes. BTW, why don't you include JSON pointer (RFC6901) in the standard set? That is quite useful for the majority of YAML applications (certainly the JSON compatible ones).

ioggstream commented 2 years ago

WRT sub-mediatypes like openapi and jsonschema, we are discussing in https://github.com/ietf-wg-httpapi/mediatypes/issues/2

just keep = for application-specific fragment identifier schemes

jsonschema supports two fragment identifiers: JSON Pointers and Plain names. Plain names are just strings, e.g. "foo". We could ask @jdesrosiers if there is some space to tweak the usage of plain names as fragment identifiers.

include JSON pointer (RFC6901) in the standard set

I think it would be useful, but I think it's the YAML community's choice (aka @eemeli). Probably the easiest way to do it is:

use JSON Pointers if the fragment starts with #/
use alias names if the fragment starts with #*

This will leave further space for subtypes, e.g.

use JSONPath if the fragment starts with #$
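The prefix-based dispatch in the list above can be sketched in Python as follows. The prefix table comes from that list; the labels and the function name are hypothetical.

```python
from urllib.parse import urlparse

# First fragment character -> syntax, as listed above.
# The label strings are illustrative, not registered names.
PREFIXES = {"/": "json-pointer", "*": "alias", "$": "jsonpath"}


def classify_fragment(url: str) -> str:
    """Pick a fragment syntax by looking only at the first character."""
    fragment = urlparse(url).fragment
    return PREFIXES.get(fragment[:1], "unrecognized")


print(classify_fragment("foo.yaml#/one/two"))
print(classify_fragment("foo.yaml#*ant"))
print(classify_fragment("foo.yaml#$.one"))
print(classify_fragment("foo.yaml#ant"))
```

A bare name such as #ant falls through to "unrecognized" here, which is exactly the unstructured case questioned in the next paragraph.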

I am not sure that using unstructured strings as fragment identifiers is an interoperable choice at all.

What do you think?

eemeli commented 2 years ago

What's the problem that we're trying to solve here? If we're talking about fragment identifiers for application/yaml, I don't see why we need to consider other specs at all. If we're talking about fragment identifiers for +yaml, we currently have this: https://github.com/ietf-wg-httpapi/mediatypes/blob/4ff8647f1fda1bbfc678854623fe8e98d0c6e42a/draft-ietf-httpapi-yaml-mediatypes.md?plain=1#L240-L243

Hence I think the proposal to limit application/yaml fragment identifiers to \w- makes sense, as that'll allow targeting nearly all real-world YAML anchors while keeping future YAML-spec compatibility good. A later YAML spec may bring in "native" support for something like JSON Pointers, which this mediatype spec should not try to pre-empt.

For +yaml cases, many formats need to be embeddable in JSON as well and include some form of internal anchor syntax. Those may or may not want to make the YAML anchors addressable by fragment identifiers as well, but they get to make that choice independently, much like they do already for +json.

ioggstream commented 2 years ago

Q1. What's the problem that we're trying to solve here?

Identify a fragment identifier syntax that is compatible with existing formats like JSONPath and JSON Pointers.

Q2. Why?

Because implementers are accustomed to using JSON Pointers and could, for example, just ignore our syntax.

Q3. Shouldn't implementers use JSON Pointers only when using openapi+yaml ?

Theoretically yes. In practice, since openapi+yaml is not yet registered, implementers just use application/yaml + JSON Pointers. Using * will at the same time:

enable support for alias nodes (provided that a suitable encoding is defined in a future YAML version)
enable implementations to distinguish between JSON Pointers and alias nodes just by looking at the first character.

Q4. What happens if we don't find a workaround for JSON Pointers?

There's the possibility that implementers will continue to use JSON Pointers for processing application/yaml fragment identifiers anyway.

Q5. Is there some interest in using JSON Pointers in conjunction with application/yaml?

While I thought there wasn't, iiuc @cabo wrote that there could be some interest for that (but please correct me if I'm wrong).

jdesrosiers commented 2 years ago

jsonschema supports two fragment identifiers: JSON Pointers and Plain names. Plain names are just strings, e.g. "foo". We could ask @jdesrosiers if there is some space to tweak the usage of plain names as fragment identifiers.

There's no reason we couldn't constrain how plain name fragments work in future dialects of JSON Schema, but we still wouldn't be able to change the media type definition because then it wouldn't support the existing dialects that don't have those constraints.

ioggstream commented 2 years ago

it wouldn't support the existing dialects

Since with #47 plain names do not overlap with application/yaml fragment syntax, application/yaml implementers could decide to process fragments where fragment.get(0, "") =~ /^[a-zA-Z]/ as plain names and still be interoperable with YAML.

This behavior won't be interoperable with other +yaml media types that redefine fragment syntax though.

Question: should we specify that supporting further syntaxes in conjunction with application/yaml could create interoperability issues with future implementations? cc: @cabo

ioggstream commented 2 years ago

Closed in #47

ietf-wg-httpapi / mediatypes

YAML fragment: is switching from named anchor to alias nodes seamless? #41