ruby-rdf / rdf-turtle

Turtle reader/writer for Ruby
http://rubygems.org/gems/rdf-turtle
The Unlicense

Encoding error when upgrading from 1.1.7 to 1.1.8 #13

Closed jcoyne closed 8 years ago

jcoyne commented 9 years ago
    Encoding::CompatibilityError:
       incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ebnf-0.3.9/lib/ebnf/ll1/scanner.rb:96:in `scan'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ebnf-0.3.9/lib/ebnf/ll1/scanner.rb:96:in `scan'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ebnf-0.3.9/lib/ebnf/ll1/lexer.rb:264:in `block in match_token'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ebnf-0.3.9/lib/ebnf/ll1/lexer.rb:262:in `each'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ebnf-0.3.9/lib/ebnf/ll1/lexer.rb:262:in `match_token'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ebnf-0.3.9/lib/ebnf/ll1/lexer.rb:224:in `recover'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/rdf-turtle-1.1.8/lib/rdf/turtle/reader.rb:519:in `rescue in prod'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/rdf-turtle-1.1.8/lib/rdf/turtle/reader.rb:557:in `prod'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/rdf-turtle-1.1.8/lib/rdf/turtle/reader.rb:258:in `read_statement'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/rdf-turtle-1.1.8/lib/rdf/turtle/reader.rb:145:in `each_statement'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ldp-0.4.0/lib/ldp/resource/rdf_source.rb:59:in `response_as_graph'
     # /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ldp-0.4.0/lib/ldp/resource/rdf_source.rb:31:in `graph'
jcoyne commented 9 years ago

Seems likely due to a change here: https://github.com/ruby-rdf/rdf-turtle/commit/c005f414a5552249389b73ad71df3e4b496de898

jcoyne commented 9 years ago

This happens when I pass an ASCII-8BIT-encoded string to RDF::Turtle::Reader.new. Perhaps it should just ensure the encoding is UTF-8, or raise an ArgumentError there?
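To make the failure mode concrete, here is a minimal sketch in plain Ruby that does not touch rdf-turtle or ebnf at all. The pattern, the sample bytes, and the `match_error` and `ensure_utf8` helpers are all hypothetical; they only illustrate how a UTF-8 regexp can fail against a binary-labelled string, and what a "relabel or raise" guard might look like.

```ruby
# Hypothetical pattern; the \u escape fixes the regexp's encoding to UTF-8.
pattern = /caf\u00E9/

# Bytes that are valid UTF-8 content but labelled ASCII-8BIT (binary),
# as can happen with a raw HTTP response body.
binary = "caf\xC3\xA9".b

# Returns the error message if the match raises, or nil if it does not.
def match_error(pattern, string)
  string =~ pattern
  nil
rescue Encoding::CompatibilityError => e
  e.message
end

# Hypothetical guard: relabel a copy as UTF-8, and refuse input whose
# bytes are not valid UTF-8 instead of failing later inside the lexer.
def ensure_utf8(input)
  str = input.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "input is not valid UTF-8" unless str.valid_encoding?
  str
end

error = match_error(pattern, binary) # non-nil: the match is rejected
fixed = ensure_utf8(binary)          # same bytes, now labelled UTF-8
```

Whether the reader should relabel silently or raise is a design choice; the sketch does both, relabelling when the bytes are valid and raising only when they are not.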

gkellogg commented 8 years ago

It's hard to see how a change in that commit would have made a difference. Both the old and new parsers use EBNF::LL1::Lexer, which in turn uses EBNF::LL1::Scanner. The scanner encodes everything as UTF-8, using force_encoding(Encoding::UTF_8). The location of the error indicates that the data in the scan buffer wasn't UTF-8, which is odd, since it's always encoded when it's added to the buffer.
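As background on `force_encoding`, the following sketch shows what it does and does not do in core Ruby. The sample byte strings are made up for illustration. `force_encoding` only changes the label on the string; it never converts bytes, so bytes that are not UTF-8 to begin with remain invalid after relabelling.

```ruby
# Valid UTF-8 bytes carrying a binary label.
utf8_bytes = "caf\xC3\xA9".b
relabelled = utf8_bytes.dup.force_encoding(Encoding::UTF_8)
same_bytes = relabelled.bytes == utf8_bytes.bytes # relabelling keeps the bytes

# ISO-8859-1 bytes carrying a binary label: relabelling as UTF-8 does not
# repair them, it just produces a string with an invalid encoding.
latin1_bytes = "caf\xE9".b
mislabelled  = latin1_bytes.dup.force_encoding(Encoding::UTF_8)

# Transcoding needs the correct source label first, then String#encode.
transcoded = latin1_bytes.dup
                         .force_encoding(Encoding::ISO_8859_1)
                         .encode(Encoding::UTF_8)
```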

Can you provide me with an example illustrating the problem?

gkellogg commented 8 years ago

I did see that EBNF::LL1::Scanner wasn't encoding string arguments, only IO arguments. I fixed this on the develop branch of ebnf, which I'll release soon, once code coverage is improved. I still couldn't reproduce the problem, though. A failing test would be useful.
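The String-versus-IO distinction can be sketched with a small stand-in class. `SketchScanner` is hypothetical and is not the ebnf gem's scanner; it wraps the standard library's `StringScanner` and simply applies the same UTF-8 label to both kinds of input before scanning.

```ruby
require 'strscan'
require 'stringio'

# Hypothetical stand-in for illustration only, NOT the ebnf implementation.
class SketchScanner
  def initialize(input)
    # Accept either an IO-like object (anything responding to #read)
    # or a plain String, and treat both the same way from here on.
    data = input.respond_to?(:read) ? input.read : input.to_s
    # Relabel a copy so the caller's string is left untouched.
    @scanner = StringScanner.new(data.dup.force_encoding(Encoding::UTF_8))
  end

  # Delegates to StringScanner#scan; returns the matched text or nil.
  def scan(pattern)
    @scanner.scan(pattern)
  end
end

binary = "caf\xC3\xA9".b # valid UTF-8 bytes with a binary label

from_string = SketchScanner.new(binary).scan(/caf\u00E9/)
from_io     = SketchScanner.new(StringIO.new(binary)).scan(/caf\u00E9/)
```

A failing test for the original report would presumably pass a binary-labelled String like `binary` above to the reader, though the thread does not confirm the exact input that triggered it.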

jcoyne commented 8 years ago

Sorry @gkellogg, I also looked at it for a while, but I was unable to write a succinct test for it. I only saw it "in the wild".