yaml / libyaml

Canonical source repository for LibYAML
http://pyyaml.org/wiki/LibYAML
MIT License

%TAG prefix does not accept all characters in ns-uri-char production #253

Open gkellogg opened 1 year ago

gkellogg commented 1 year ago

As noted in https://github.com/yaml/yaml-spec/issues/268#issuecomment-1208565027, Psych does not accept a %TAG prefix including a #, which seems to be due to the following code:

https://github.com/yaml/libyaml/blob/f8f760f7387d2cc56a2fc7b1be313a3bf3f7f58c/src/scanner.c#L2603-L2627

According to the YAML 1.2 spec, ns-uri-char includes #, but the scanner does not accept it.

[39] ns-uri-char ::=
    (
      '%'
      [ns-hex-digit](https://yaml.org/spec/1.2.2/#rule-ns-hex-digit){2}
    )
  | [ns-word-char](https://yaml.org/spec/1.2.2/#rule-ns-word-char)
  | '#'
  | ';'
  | '/'
  | '?'
  | ':'
  | '@'
  | '&'
  | '='
  | '+'
  | '$'
  | ','
  | '_'
  | '.'
  | '!'
  | '~'
  | '*'
  | "'"
  | '('
  | ')'
  | '['
  | ']'
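
To make the production concrete, here is a minimal sketch of a checker for `ns-uri-char+`, transcribed from the rule above. The names `NS_URI_CHAR` and `is_ns_uri` are hypothetical, and this is not libyaml's scanner code; it only illustrates what the grammar permits, including `#`.

```python
import re

# ns-word-char is an ASCII letter, a decimal digit, or '-'.
# ns-uri-char is a %XX escape, an ns-word-char, or one of the listed
# punctuation characters (note that '#' is in the list).
NS_URI_CHAR = r"(?:%[0-9A-Fa-f]{2}|[0-9A-Za-z\-#;/?:@&=+$,_.!~*'()\[\]])"
NS_URI_CHARS = re.compile(NS_URI_CHAR + r"+")


def is_ns_uri(text: str) -> bool:
    """Return True if text is one or more ns-uri-char and nothing else."""
    return NS_URI_CHARS.fullmatch(text) is not None


if __name__ == "__main__":
    # Both the literal '#' and its escaped form satisfy the grammar.
    print(is_ns_uri("http://www.w3.org/2001/XMLSchema#"))
    print(is_ns_uri("http://www.w3.org/2001/XMLSchema%23"))
```

Under this reading of the grammar, a prefix ending in `#` is valid as written, so no escaping should be needed.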

This prevents creating a TAG line such as the following:

%TAG ! http://www.w3.org/2001/XMLSchema#

gkellogg commented 1 year ago

As a workaround, %TAG ! http://www.w3.org/2001/XMLSchema%23 works, but is not ideal, and shouldn't be required based on the grammar.

gkellogg commented 1 year ago

The scanning issue extends to inline tags as well. If you parse the following:

%TAG !xsd! http://www.w3.org/2001/XMLSchema%23
---
date: !xsd!date 2022-08-08

and re-serialize without the %TAG directive, you'll get the following:

date: !<http://www.w3.org/2001/XMLSchema%23date> 2022-08-08
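
The `%23` in that output follows from how a tag shorthand is expanded: the handle is replaced by its `%TAG` prefix and the suffix is appended, with no decoding of escapes. A simplified model of that expansion, assuming a plain dictionary of handles (`expand_tag` and `handles` are hypothetical names, not part of libyaml or Psych):

```python
def expand_tag(shorthand: str, handles: dict[str, str]) -> str:
    """Expand a tag shorthand such as '!xsd!date' into verbatim form."""
    # Try longer handles first so a named handle like '!xsd!' is
    # preferred over the primary handle '!'.
    for handle in sorted(handles, key=len, reverse=True):
        if shorthand.startswith(handle):
            suffix = shorthand[len(handle):]
            # The prefix is copied through unchanged, so an escaped
            # '%23' in the directive stays escaped in the result.
            return f"!<{handles[handle]}{suffix}>"
    raise ValueError(f"no %TAG handle matches {shorthand!r}")


if __name__ == "__main__":
    handles = {"!xsd!": "http://www.w3.org/2001/XMLSchema%23"}
    print(expand_tag("!xsd!date", handles))
```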

Per the grammar, you should also be able to parse the following:

date: !<http://www.w3.org/2001/XMLSchema#date> 2022-08-08

However, this fails in the same way as reported for %TAG. In this case, it is c-verbatim-tag, which is defined in terms of ns-uri-char+, where the # is again rejected.

Working around this requires a pre-parsing step to replace these characters as appropriate before parsing, and the reverse replacement after serializing.
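
A rough sketch of such a pre-parsing and post-serializing step is shown here. It only touches `%TAG` prefixes and verbatim tags, so that a `#` starting a comment is left alone. The function names and regular expressions are assumptions for illustration; the regexes are deliberately naive and would also match tag-like text inside scalars or comments, and the reverse step cannot tell an original `%23` from one it introduced.

```python
import re

# "%TAG <handle> <prefix>" at the start of a line; group 2 is the prefix.
TAG_DIRECTIVE = re.compile(r"^(%TAG[ \t]+\S+[ \t]+)(\S+)", re.MULTILINE)
# A verbatim tag such as !<http://example.org/ns#name>.
VERBATIM_TAG = re.compile(r"!<([^>\s]*)>")


def _rewrite(yaml_text: str, old: str, new: str) -> str:
    """Swap old for new inside %TAG prefixes and verbatim tags only."""
    text = TAG_DIRECTIVE.sub(
        lambda m: m.group(1) + m.group(2).replace(old, new), yaml_text
    )
    return VERBATIM_TAG.sub(
        lambda m: "!<" + m.group(1).replace(old, new) + ">", text
    )


def escape_hashes(yaml_text: str) -> str:
    """Run before parsing: turn '#' into '%23' in tag URIs."""
    return _rewrite(yaml_text, "#", "%23")


def unescape_hashes(yaml_text: str) -> str:
    """Run after serializing: turn '%23' back into '#' in tag URIs."""
    return _rewrite(yaml_text, "%23", "#")
```

The escaped text can then be handed to the parser, and `unescape_hashes` applied to whatever the emitter produces.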

This was tested using Ruby Psych version 4.0.4, which wraps libyaml, and the issues seem to be entirely within libyaml.