jcoyne closed this issue 8 years ago
Seems likely due to a change here: https://github.com/ruby-rdf/rdf-turtle/commit/c005f414a5552249389b73ad71df3e4b496de898
This happens when I send an ASCII-8BIT encoded string to RDF::Turtle::Reader.new. Perhaps it should just ensure the encoding is UTF-8 or raise an ArgumentError there?
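A minimal sketch of what "ensure the encoding is UTF-8 or raise an ArgumentError" could look like. The helper name `ensure_utf8` is hypothetical and is not part of RDF::Turtle or EBNF. It assumes that ASCII-8BIT input actually holds UTF-8 bytes and only needs relabeling, and that any other encoding can be transcoded.

```ruby
# Hypothetical helper (not an RDF::Turtle or EBNF API): return a UTF-8
# string for the given input, or raise ArgumentError if that is impossible.
def ensure_utf8(input)
  candidate =
    case input.encoding
    when Encoding::UTF_8
      input
    when Encoding::ASCII_8BIT
      # Assumption: binary input carries UTF-8 bytes, so relabel a copy
      # instead of transcoding. dup keeps the caller's string untouched.
      input.dup.force_encoding(Encoding::UTF_8)
    else
      # Assumption: other encodings are transcoded; this may raise its own
      # Encoding errors for bytes that cannot be converted.
      input.encode(Encoding::UTF_8)
    end

  unless candidate.valid_encoding?
    raise ArgumentError, "input is not valid UTF-8 (tagged #{input.encoding})"
  end
  candidate
end

binary = "caf\xC3\xA9".b          # same bytes as "café", tagged ASCII-8BIT
text   = ensure_utf8(binary)      # UTF-8 copy; binary itself is unchanged
```

Whether a reader should silently relabel binary input or reject it outright is a design choice; this sketch relabels and only raises when the bytes are not valid UTF-8.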
Hard to see how a change in that commit would have made a difference. Both the old and new parsers use EBNF::LL1::Lexer, which in turn uses EBNF::LL1::Scanner. The scanner encodes everything to UTF-8, using force_encoding(Encoding::UTF_8). The location of the error indicates that the data in the scan buffer wasn't in UTF-8, which is odd, since it's always encoded when it's added to the buffer.
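For reference, here is a small plain-Ruby illustration of what force_encoding does. It is independent of the scanner's actual code. It shows that the call only changes the encoding label on a string: the bytes are neither transcoded nor validated. The variable names are illustrative only.

```ruby
# Bytes for "café" in UTF-8, but tagged as ASCII-8BIT (binary).
raw = "caf\xC3\xA9".b

# force_encoding relabels the string; the bytes themselves do not change.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)

same_bytes  = utf8.bytes == raw.bytes   # true: nothing was transcoded
raw_length  = raw.length                # 5: binary strings count bytes
utf8_length = utf8.length               # 4: "é" is now one character

# force_encoding does not validate: an invalid byte stays invalid.
bad = "\xFF".b.force_encoding(Encoding::UTF_8)
bad_is_valid = bad.valid_encoding?      # false
```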
Can you provide me with an example illustrating the problem?
I did see that EBNF::Scanner wasn't encoding string arguments, only IO arguments. I fixed this on the develop branch of ebnf, which I'll release soon, once code coverage is improved. I still couldn't reproduce the problem, though. A failing test would be useful.
Sorry @gkellogg, I also looked at it for a while, but I was unable to write a succinct test for it. I only saw it "in the wild".