I don't know what should be there. But my guess is "Unicode Character 'DOUBLE
LOW-9 QUOTATION MARK' (U+201E)".
I think the parser sees 2 commas and interprets them as a FlowEntry. Are they
intentionally 2 commas and not \u201e?
I ask because "Unicode Character 'DOUBLE HIGH-REVERSED-9 QUOTATION
MARK' (U+201F)" on line #200 is represented as 2 ` characters.
So I assume there really are 2 commas there. Maybe using ",," would fix the parsing.
Original comment by alexande...@gmail.com
on 18 Jan 2011 at 8:49
Indeed, the 2 characters on line 183 have code 0x2C, which is just a comma in UTF-8
(since no BOM is used, UTF-8 is assumed).
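For anyone who wants to check this locally, here is a minimal Ruby sketch that compares the bytes of two ASCII commas with the UTF-8 encoding of U+201E. The string literals are illustrative stand-ins, not the actual content of line 183.

```ruby
# Two plain ASCII commas, as reported for line 183.
two_commas = ",,"
# The single character that was probably intended:
# DOUBLE LOW-9 QUOTATION MARK (U+201E).
low9_quote = "\u201E"

# Render each byte of a string as a hex literal such as "0x2C".
def hex_bytes(str)
  str.bytes.map { |b| format("0x%02X", b) }
end

comma_bytes = hex_bytes(two_commas)
quote_bytes = hex_bytes(low9_quote)

puts "two commas: #{comma_bytes.join(' ')}"
puts "U+201E:     #{quote_bytes.join(' ')}"
```

Two 0x2C bytes on that line mean the file really contains commas, while U+201E would show up as a three-byte sequence.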
Original comment by py4fun@gmail.com
on 18 Jan 2011 at 9:09
Very interesting! So are you saying that the file contains a non-UTF-8 sequence
in U+201F, and as a result the parser is treating it as a different type of
entry? And that the bug is actually in the source YAML? That conclusion is
acceptable to me, but the discrepancy is that libyaml appears to accept this
document...
I'm happy to take the issue back to the RedCloth maintainers, but I'd like to
know why it fails for JRuby+SnakeYAML and not for Ruby+libyaml...
Original comment by headius%...@gtempaccount.com
on 18 Jan 2011 at 10:01
No, it does not. See comment #1.
What I think is that SnakeYAML treats ,, (2 commas) as a FlowEntry, not as a
string containing 2 commas. Putting ",," instead of just ,, (2 commas) in
the source YAML may fix the problem.
Please correct me if I am wrong.
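A minimal Ruby sketch of that idea, using the stdlib YAML (Psych on top of libyaml) rather than SnakeYAML. The key name `bdquo` and both one-line documents are assumptions made up for illustration; the real line from latex_entities.yml is not quoted in this thread.

```ruby
require 'yaml'

# Hypothetical fragments: an unquoted value of two commas, and the same
# value wrapped in double quotes.
UNQUOTED = "bdquo: ,,\n"
QUOTED   = "bdquo: \",,\"\n"

# Return the parsed document, or the syntax error if parsing fails.
def try_load(source)
  YAML.load(source)
rescue Psych::SyntaxError => e
  e
end

unquoted_result = try_load(UNQUOTED)
quoted_result   = try_load(QUOTED)

puts "unquoted: #{unquoted_result.class}"
puts "quoted:   #{quoted_result.inspect}"
```

With the unquoted form the commas are read as flow entry indicators, so the parser should reject the document; with the quoted form the value is an ordinary string of two commas.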
Original comment by alexande...@gmail.com
on 18 Jan 2011 at 10:42
I did not quite catch comment #3. A comma is a normal UTF-8 character, but it
is a flow indicator in YAML. It can be escaped with double quotes, but I
am not sure this is what you expect.
Can you maybe provide a short file which fails? A few bytes are easier to test
than 50k.
You can also try to check the document validity with PyYAML:
http://instantyaml.appspot.com/
Original comment by py4fun@gmail.com
on 18 Jan 2011 at 11:07
instantyaml does appear to reject this file. The failure then may be expected
for strict parsing. I will look into it.
Original comment by head...@gmail.com
on 20 Jan 2011 at 3:37
The libyaml version of Ruby's YAML parser also kicks this file out:
~/projects/ruby/ext/psych ➔ ruby1.9 -I. -Ilib -rpsych -ryaml -e "YAML.parse(File.read('/Users/headius/Downloads/latex_entities.yml'))"
/Users/headius/projects/ruby/ext/psych/lib/psych.rb:148:in `parse': couldn't parse YAML at line 182 column 9 (Psych::SyntaxError)
    from /Users/headius/projects/ruby/ext/psych/lib/psych.rb:148:in `parse_stream'
    from /Users/headius/projects/ruby/ext/psych/lib/psych.rb:119:in `parse'
    from -e:1:in `<main>'
I'm close to saying this is not a bug.
Original comment by head...@gmail.com
on 20 Jan 2011 at 9:12
Off topic: it looks like the Psych parser counts lines starting from 0 (it
says 182 instead of 183).
Hopefully when Mark is implemented in Psych we will see the same error message.
Shall I close the issue?
Original comment by aso...@gmail.com
on 20 Jan 2011 at 11:33
Yes, close the issue for now.
I will also file a bug against Psych for the line being off by one.
Original comment by head...@gmail.com
on 21 Jan 2011 at 12:00
Original comment by aso...@gmail.com
on 21 Jan 2011 at 5:07
FYI, I filed an issue with Ruby here:
http://redmine.ruby-lang.org/issues/show/4301
And with RedCar, the source of the bad YAML, here:
https://redcar.lighthouseapp.com/projects/25090/tickets/464-redcar-ships-a-yaml-file-that-does-not-parse-with-libyaml-or-19s-wrapper-psych
Original comment by head...@gmail.com
on 23 Jan 2011 at 9:10
Original issue reported on code.google.com by
headius%...@gtempaccount.com
on 18 Jan 2011 at 7:50