jeremypepper / snakeyaml

Automatically exported from code.google.com/p/snakeyaml
Apache License 2.0

Problem parsing YAML from RedCar ruby gem #105

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm attaching the YAML. I expect it to fail to parse with SnakeYAML in any 
form, but I have only tested it via JRuby's wrapper around SnakeYAML.

The resulting error follows:

<pre>
~/projects/jruby ➔ jruby --1.9 -ryaml -e "YAML.load File.read 'lib/ruby/gems/1.8/gems/RedCloth-4.2.2-universal-java/lib/redcloth/formatters/latex_entities.yml'"
while parsing a block node
expected the node content, but found FlowEntry
 in "<reader>", line 183, column 9:
    ldquor: ,,
            ^

    at org.yaml.snakeyaml.parser.ParserImpl.parseNode(ParserImpl.java:486)
    at org.yaml.snakeyaml.parser.ParserImpl.parseBlockNodeOrIndentlessSequence(ParserImpl.java:374)
    at org.yaml.snakeyaml.parser.ParserImpl.access$2300(ParserImpl.java:119)
    at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingValue.produce(ParserImpl.java:594)
    at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:163)
    at org.yaml.snakeyaml.parser.ParserImpl.getEvent(ParserImpl.java:173)
    at org.jruby.ext.psych.PsychParser.parse(PsychParser.java:107)
</pre>

Original issue reported on code.google.com by headius%...@gtempaccount.com on 18 Jan 2011 at 7:50

GoogleCodeExporter commented 9 years ago
I don't know what should be there, but my guess is "Unicode Character 'DOUBLE 
LOW-9 QUOTATION MARK' (U+201E)".

I think the parser sees the 2 commas and interprets them as a FlowEntry. Is it 
intentionally 2 commas and not \u201e?

I ask because "Unicode Character 'DOUBLE HIGH-REVERSED-9 QUOTATION MARK' 
(U+201F)" on line #200 is represented as 2 ` characters, so I assume it really 
is 2 commas there. Maybe quoting the value as ",," would fix parsing.

Original comment by alexande...@gmail.com on 18 Jan 2011 at 8:49

GoogleCodeExporter commented 9 years ago
Indeed, the 2 characters in line 183 have code 0x2C, which is just a comma in 
UTF-8 (since no BOM is used, UTF-8 is assumed).

Original comment by py4fun@gmail.com on 18 Jan 2011 at 9:09
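
For reference, the byte-level check described above can be sketched in Ruby. This is only an illustration: the constants and the `hex_bytes` helper are hypothetical names, and the strings are typed in by hand rather than read from the attached file.

```ruby
# Two ASCII commas, as reported for line 183 of the attached file.
TWO_COMMAS = ",,"
# U+201E DOUBLE LOW-9 QUOTATION MARK, the character the key name suggests.
LOW_QUOTE = "\u201E"

# Hypothetical helper: render each byte of a string as a hex literal.
def hex_bytes(text)
  text.bytes.map { |b| format('0x%02X', b) }
end

puts hex_bytes(TWO_COMMAS).inspect  # two single-byte commas
puts hex_bytes(LOW_QUOTE).inspect   # one character, three UTF-8 bytes
```

If the value in the file were really U+201E, a hex dump would show a three-byte sequence rather than two 0x2C bytes.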

GoogleCodeExporter commented 9 years ago
Very interesting! So are you saying that the file contains a non-UTF-8 sequence 
in U+201F, and as a result the parser is treating it as a different type of 
entry? And that the bug is actually in the source YAML? That conclusion is 
acceptable to me, but the discrepancy is that libyaml appears to accept this 
document...

I'm happy to take the issue back to the RedCloth maintainers, but I'd like to 
know why it fails for JRuby+SnakeYAML and not for Ruby+libyaml...

Original comment by headius%...@gtempaccount.com on 18 Jan 2011 at 10:01

GoogleCodeExporter commented 9 years ago
No, it does not. See comment #1.

What I think is that SnakeYAML treats ,, (2 commas) as a FlowEntry, not as a 
string containing 2 commas. Putting ",," instead of just ,, (2 commas) in 
the source YAML may fix the problem.

Please correct me if I am wrong.

Original comment by alexande...@gmail.com on 18 Jan 2011 at 10:42
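
A minimal sketch of the suggestion above, using Ruby's standard `yaml` library (Psych over libyaml) rather than SnakeYAML. The one-line documents are hypothetical reductions of the failing entry, not the original 50k file, and `try_load` is an illustrative helper.

```ruby
require 'yaml'

# Hypothetical one-line reductions of the entry on line 183.
UNQUOTED = "ldquor: ,,\n"
QUOTED   = "ldquor: \",,\"\n"

# Returns the parsed value, or the exception if the parser rejects the input.
def try_load(source)
  YAML.load(source)
rescue Psych::SyntaxError => e
  e
end

puts "unquoted: #{try_load(UNQUOTED).class}"
puts "quoted:   #{try_load(QUOTED).inspect}"
```

With the value in double quotes the commas are read as an ordinary string, so the same change should also work for a parser that follows the same rules.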

GoogleCodeExporter commented 9 years ago
I did not quite follow comment #3. A comma is a normal UTF-8 character, but in 
YAML it is an indicator used in flow collections. It can be escaped by 
enclosing it in double quotes, but I am not sure this is what you expect.
Could you provide a short file which fails? A few bytes are easier to test 
than 50k. 
You can also check the document's validity with PyYAML:
http://instantyaml.appspot.com/

Original comment by py4fun@gmail.com on 18 Jan 2011 at 11:07

GoogleCodeExporter commented 9 years ago
instantyaml does appear to reject this file. The failure then may be expected 
for strict parsing. I will look into it.

Original comment by head...@gmail.com on 20 Jan 2011 at 3:37

GoogleCodeExporter commented 9 years ago
The libyaml version of Ruby's YAML parser also kicks this file out:

~/projects/ruby/ext/psych ➔ ruby1.9 -I. -Ilib -rpsych -ryaml -e "YAML.parse(File.read('/Users/headius/Downloads/latex_entities.yml'))"
/Users/headius/projects/ruby/ext/psych/lib/psych.rb:148:in `parse': couldn't parse YAML at line 182 column 9 (Psych::SyntaxError)
    from /Users/headius/projects/ruby/ext/psych/lib/psych.rb:148:in `parse_stream'
    from /Users/headius/projects/ruby/ext/psych/lib/psych.rb:119:in `parse'
    from -e:1:in `<main>'

I'm close to saying this is not a bug.

Original comment by head...@gmail.com on 20 Jan 2011 at 9:12

GoogleCodeExporter commented 9 years ago
Off topic: it looks like the Psych parser counts lines starting from 0 (it 
says 182 instead of 183).
Hopefully, once Mark is implemented in Psych, we will see the same error message.

Shall I close the issue?

Original comment by aso...@gmail.com on 20 Jan 2011 at 11:33
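
One way to check how a given Psych build numbers lines is to read the position off the exception. This is a sketch only: `SAMPLE` and `error_position` are hypothetical, and the number printed depends on the Psych version in use, so no expected value is given here.

```ruby
require 'yaml'

# Hypothetical three-line document; the bad entry is on the third line
# when counting from 1.
SAMPLE = "a: 1\nb: 2\nldquor: ,,\n"

# Returns the position reported by Psych, or nil if the input parses.
def error_position(source)
  YAML.load(source)
  nil
rescue Psych::SyntaxError => e
  { line: e.line, column: e.column }
end

p error_position(SAMPLE)
```

Comparing the reported line with the known position of the bad entry shows whether that build counts from 0 or from 1.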

GoogleCodeExporter commented 9 years ago
Yes, close the issue for now.

I will also file a bug against Psych for the line being off by one.

Original comment by head...@gmail.com on 21 Jan 2011 at 12:00

GoogleCodeExporter commented 9 years ago

Original comment by aso...@gmail.com on 21 Jan 2011 at 5:07

GoogleCodeExporter commented 9 years ago
FYI, I filed an issue with Ruby here:

http://redmine.ruby-lang.org/issues/show/4301

And with RedCar, the source of the bad YAML, here:

https://redcar.lighthouseapp.com/projects/25090/tickets/464-redcar-ships-a-yaml-file-that-does-not-parse-with-libyaml-or-19s-wrapper-psych

Original comment by head...@gmail.com on 23 Jan 2011 at 9:10