Closed quicksketch closed 10 years ago
Hey @quicksketch, this looks good. It seems like there is some deviation in almost every YAML implementation, but I haven't found a clear answer to the correct behaviour.
Spyc now loads this:
php > var_dump(Spyc::YAMLLoad('- "string with literal \n"'));
array(1) {
[0]=>
string(21) "string with literal
"
}
php > var_dump(Spyc::YAMLLoad('- "string with literal \\n"'));
array(1) {
[0]=>
string(21) "string with literal
"
}
Ruby's YAML parser does pretty much the same thing:
irb(main):002:0> YAML.load '- "string with literal \\n"'
=> ["string with literal \n"]
irb(main):003:0> YAML.load '- "string with literal \n"'
=> ["string with literal \n"]
PyYAML gets a little funky, but its using the libyaml
C library:
>>> yaml.load('- "string with literal \n"')
['string with literal ']
>>> yaml.load('- "string with literal \\n"')
['string with literal \n']
The YAML spec does mention that two backslashes \\
represent a literal backslash. So I'm confused on which behaviour is right.
I think what we have in Spyc is actually fine now with this pull - I am wondering what anyone else's thoughts are on what the most correct behaviour is.
Thanks for both the merge @mustangostang and the comments @ryanuber. I'm pretty sure we're treating new lines correctly now, but I think @ryanuber is right that backslashes are not interpreted correctly.
From looking at the spec, my understanding is that escaping with backslashes is only needed when inside quotes. If unquoted, a single backslash does not need to be escaped. I think this summary covers it very explicitly: http://yaml.org/spec/1.0/#id2566934
So in short, we have a new issue with backslashes. The first of @ryanuber's examples is correct, but the second one is not. From the second example, this YAML:
- "string with literal \\n"'
Is equal to this YAML:
- string with literal \n
Both should be parsed as:
array(
0 => 'string with literal \n',
)
Well, we need a way to have to have \n
characters in a quoted string, so \\n
actually seems fine. I'll draft some support for that.
Does YAML specification say something about \r
and \t
, or is \n
the only case we should support?
Does YAML specification say something about \r and \t, or is \n the only case we should support?
There's a whole bucket of special characters listed here: http://yaml.org/spec/1.0/#id2568446. So it includes not only \r, \t, and \n but also \a, \b, \e, \f... there are 20 total. I'm not sure if PHP interprets all of those characters when inside of double quotes, but it's Unicode standard, so I would assume it would do a good enough job.
For all of these characters, they should be able to be escaped with double-backslashes in quoted text, or printed as-is for unquoted text:
- "string with new lines \\r\\n and \\t tabs"
- string with new lines \r\n and \t tabs
Should both be parsed as:
array(
0 => 'string with new lines \r\n and \t tabs',
)
Fixes https://github.com/mustangostang/spyc/issues/27, quoted
"\n"
characters should be converted to new lines when parsing. Likewise, unquoted\n
characters should not be converted to new lines.