Define fragment identifiers for application/yaml

eemeli commented 2 years ago

Closes #21

This defines application/yaml fragment identifiers to be parsed as YAML aliases, which currently means that they must point to an explicitly defined anchor in the document, a feature natively supported by YAML.

The definition intentionally allows for changes in later editions of the YAML spec to be automatically supported, e.g. as we're working towards supporting something like JSON pointers as well.

The language in the +yaml fragment identifier section seems a bit complex, and I'm not sure if it should be updated as well. Formats xxx/yyy+yaml should be allowed to define their own rules for fragment identifiers. Is this currently the case?

ioggstream commented 2 years ago

@eemeli Can you please add an example YAML URI with fragment identifier, so that we could find a way to provide it?

eemeli commented 2 years ago

> Can you please add an example YAML URI with fragment identifier, so that we could find a way to provide it?

Struggling a bit to figure out just what you're looking for here. For an example of how this would work, let's presume that we have file.yaml with the following contents:

```yaml
%YAML 1.2
---
one: &foo scalar
two:
  - some
  - sequence
  - &bar items
```

Then, `path/to/file.yaml#foo` would be pointing at the node with the value `scalar`, while `path/to/file.yaml#bar` would point to the node with the value `items`.
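To make the lookup concrete, here is a minimal Python sketch. It assumes a YAML parser has already collected the anchored nodes of `file.yaml` into a plain dict. The `ANCHORS` table and the `resolve` function are hypothetical illustrations, not part of any library. Resolving a fragment then amounts to evaluating the alias `*foo` or `*bar` against that table.

```python
from urllib.parse import unquote, urldefrag

# Hypothetical table a YAML parser might build for file.yaml:
# anchor name -> value of the node carrying that anchor.
ANCHORS = {
    "foo": "scalar",
    "bar": "items",
}


def resolve(uri: str) -> object:
    """Resolve a file.yaml#<anchor> URI as if the fragment were the alias *<anchor>."""
    _, fragment = urldefrag(uri)  # split "path/to/file.yaml#foo" into URL and fragment
    if not fragment:
        raise ValueError(f"{uri!r} has no fragment identifier")
    anchor = unquote(fragment, errors="strict")  # undo any percent-encoding
    if anchor not in ANCHORS:
        raise KeyError(f"no node is anchored as &{anchor}")
    return ANCHORS[anchor]


print(resolve("path/to/file.yaml#foo"))  # scalar
print(resolve("path/to/file.yaml#bar"))  # items
```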

Do you want this sort of example to be included in the RFC?

ioggstream commented 2 years ago

@eemeli it could probably be useful to add either an example section or a normative section that makes that explicit. An alternative could be to add this information to the YAML spec. WDYT?

I imagine something brief like https://datatracker.ietf.org/doc/html/rfc6901#section-6, which includes some considerations on percent-encoding and some examples. A couple of questions, for example:

YAML can be encoded in UTF-8, UTF-16, or UTF-32. Can anchor/alias node identifiers be non-ASCII or non-UTF-8 encoded?

ioggstream commented 2 years ago

> The language in the +yaml fragment identifier section seems a bit complex, and I'm not sure if it should be updated as well. Formats xxx/yyy+yaml should be allowed to define their own rules for fragment identifiers. Is this currently the case?

Let's discuss this topic in issue #21.

eemeli commented 2 years ago

> @eemeli it could probably be useful to add either an example section or a normative section that makes that explicit. An alternative could be to add this information to the YAML spec. WDYT?

The YAML spec already includes sections on Node Anchors and Alias Nodes, which then include some examples of them in use. The intent here is to defer to that spec's definition of alias nodes.

> YAML can be encoded in UTF-8, UTF-16, or UTF-32. Can anchor/alias node identifiers be non-ASCII or non-UTF-8 encoded?

At the point where the YAML spec defines anchors and aliases, it's treating its input as a sequence of Unicode code points, i.e. it doesn't care about their encoding. The YAML 1.2 set of acceptable characters for these is tbh far too wide, as it allows for nearly all printable Unicode code points.
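As an illustration of how wide that set is, the following sketch transcribes the YAML 1.2 productions for anchor names into Python: `ns-anchor-name` is one or more `ns-anchor-char`, i.e. any printable non-whitespace code point except the byte order mark and the flow indicators `,[]{}`. It works on decoded `str` values, i.e. on code points, so the byte encoding of the source document never enters into it. The function names are illustrative only.

```python
FLOW_INDICATORS = frozenset(",[]{}")


def is_ns_anchor_char(ch: str) -> bool:
    """ns-anchor-char: c-printable minus white space, line breaks, BOM and flow indicators."""
    cp = ord(ch)
    printable_non_space = (
        0x21 <= cp <= 0x7E               # printable ASCII without space (x20 is s-white)
        or cp == 0x85                    # NEL is c-printable and not a b-char in YAML 1.2
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    return printable_non_space and cp != 0xFEFF and ch not in FLOW_INDICATORS


def is_anchor_name(name: str) -> bool:
    """ns-anchor-name: one or more ns-anchor-char."""
    return len(name) > 0 and all(is_ns_anchor_char(ch) for ch in name)


for candidate in ["foo", "però", "\U0001F642", "a b", "a,b", ""]:
    print(repr(candidate), is_anchor_name(candidate))
```

Tabs, line feeds and carriage returns fall outside every range above, so they are rejected along with spaces.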

ioggstream commented 2 years ago

> it's treating its input as a sequence of Unicode code points

```yaml
foo: &però ciao
bar: *però
```

Reading https://www.rfc-editor.org/rfc/rfc3986#section-3.5, if I understand correctly I need to percent-encode the `però` string, right? In that case I'm not sure how this should work with UTF-8, UTF-16, and UTF-32. Can you give some examples?
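One way to answer this, assuming the usual IRI-to-URI convention of encoding code points as UTF-8 before percent-encoding (RFC 3987), is that the document's own encoding does not matter: a parser first decodes UTF-8, UTF-16 or UTF-32 bytes into code points, and the fragment is built from those. The Python sketch below shows this for `però`. `anchor_to_fragment` and `fragment_to_anchor` are hypothetical helper names, and the naive anchor extraction only works for this two-line snippet.

```python
from urllib.parse import quote, unquote

# Besides ALPHA / DIGIT / "-._~" (which quote() never escapes), RFC 3986
# section 3.5 allows sub-delims, ":", "@", "/" and "?" verbatim in a fragment.
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"


def anchor_to_fragment(anchor: str) -> str:
    """Encode the anchor's code points as UTF-8, then percent-encode what is not allowed."""
    return quote(anchor, safe=FRAGMENT_SAFE, encoding="utf-8")


def fragment_to_anchor(fragment: str) -> str:
    """Inverse mapping; rejects percent-encoded bytes that are not valid UTF-8."""
    return unquote(fragment, encoding="utf-8", errors="strict")


source = "foo: &però ciao\nbar: *però\n"

for codec in ("utf-8", "utf-16", "utf-32"):
    stored = source.encode(codec)  # the bytes on disk differ per encoding...
    text = stored.decode(codec)    # ...but decode to the same code points
    anchor = text.split("&", 1)[1].split()[0]  # naive: fine for this snippet only
    fragment = anchor_to_fragment(anchor)
    print(codec, fragment)  # per%C3%B2 for every codec
```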

ioggstream commented 2 years ago

## Fragment identification {#application-yaml-fragment}

This section describes how to use
named anchors (see Section 3.2.2.2 of [YAML])
as fragment identifiers to designate a node.

A YAML named anchor can be represented in a URI fragment identifier
by encoding it into octets using UTF-8 {{!UTF-8==RFC3629}},
while percent-encoding those characters not allowed by the fragment rule
in {{Section 3.5 of URI}}.

If multiple nodes would match a fragment identifier,
the first such match is selected.

Users concerned with interoperability of fragment identifiers:

- SHOULD limit named anchors to a set of characters
  that do not require encoding 
  to be expressed as URI fragment identifiers:
  this is always possible since named anchors are a serialization
  detail;
- SHOULD NOT use a named anchor that matches multiple nodes.

In the example resource below, the URL `file.yaml#foo`
references the anchor `foo` pointing to the node with value `scalar`,
whereas the URL `file.yaml#bar` references the anchor `bar` pointing to the node
with value `[ some, sequence, items ]`.

~~~ example
%YAML 1.2
---
one: &foo scalar
two: &bar
  - some
  - sequence
  - items
~~~
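The proposed rules can be checked with a short Python sketch. It assumes a parser has reported every anchored node of the example resource, in document order, as `(anchor name, node value)` pairs; `ANCHORED_NODES`, `select_node`, `needs_encoding` and `repeated_anchors` are hypothetical names. `select_node` applies the first-match rule, while the other two helpers flag anchors that the SHOULD / SHOULD NOT recommendations advise against.

```python
from collections import Counter
from urllib.parse import quote, unquote, urldefrag

# Characters RFC 3986 allows verbatim in a fragment, beyond those quote() always keeps.
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"

# Hypothetical parser output for the example resource: anchored nodes in
# document order as (anchor name, node value) pairs.
ANCHORED_NODES = [
    ("foo", "scalar"),
    ("bar", ["some", "sequence", "items"]),
]


def select_node(uri, anchored_nodes):
    """Return the node designated by the URI fragment, applying the first-match rule."""
    _, fragment = urldefrag(uri)
    anchor = unquote(fragment, encoding="utf-8", errors="strict")
    for name, value in anchored_nodes:
        if name == anchor:
            return value  # first match in document order wins
    raise KeyError(f"no node is anchored as &{anchor}")


def needs_encoding(anchor):
    """True if the anchor cannot appear verbatim in a URI fragment."""
    return quote(anchor, safe=FRAGMENT_SAFE) != anchor


def repeated_anchors(anchored_nodes):
    """Anchor names carried by more than one node."""
    counts = Counter(name for name, _ in anchored_nodes)
    return sorted(name for name, n in counts.items() if n > 1)


print(select_node("file.yaml#foo", ANCHORED_NODES))  # scalar
print(select_node("file.yaml#bar", ANCHORED_NODES))  # ['some', 'sequence', 'items']
```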

ioggstream commented 2 years ago

Merging and moving the discussion to #41.

ietf-wg-httpapi / mediatypes

Define fragment identifiers for application/yaml #38