Closed. FranklinChen closed this issue 11 years ago.
You will need to tell us more. What does this even do? What should it do? What are you expecting? How did you get there? What version of Ruby are you using? What version of parslet are you using?
This is part of a parser I wrote for email addresses. I thought I'd give just the part that fails; the failure comes from how `Regexp.new` is called in https://github.com/kschiess/parslet/blob/master/lib/parslet/atoms/re.rb. I expect the construct to be legal when the parser is instantiated and used, e.g., `EmailValidator::FancyParser.new.email.parse("franklinchen@franklinchen.com")` should not raise `RegexpError`.
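The same error can be reproduced without parslet. This sketch assumes, as described above, that parslet's regexp atom ends up passing the pattern string to `Regexp.new`. The pattern is tagged UTF-8 explicitly, which is what a plain string literal in a Ruby 2.0 source file gets by default.

```ruby
# Minimal reproduction of the RegexpError without parslet.
# Assumption: parslet's regexp atom hands the pattern string to Regexp.new.

# The pattern text is pure ASCII; force UTF-8 to mimic a Ruby 2.0 source file.
pattern = '[\x80-\xff]'.dup.force_encoding(Encoding::UTF_8)

error =
  begin
    Regexp.new(pattern)
    nil
  rescue RegexpError => e
    e
  end

# \x80 on its own is not a valid UTF-8 byte sequence, so compilation fails.
puts error ? "#{error.class}: #{error.message}" : "compiled fine"
```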
Version of parslet: 1.5.0; version of Ruby: MRI ruby-2.0.0-p195.
```ruby
# A fancy email address parser, based on
# http://davidcel.is/blog/2012/09/06/stop-validating-email-addresses-with-regex/
class EmailValidator::FancyParser < Parslet::Parser
  rule(:qtext) { match['^\x0d\"\\\x80-\xff'] }
  rule(:dtext) { match['^\x0d\[\\\]\x80-\xff'] }
  rule(:atom) { match['^\x00- \"\(\)\,\.\:\;\<\>\@\[\\\]\x7f-\xff'].repeat(1) }
  rule(:quoted_pair) { str('\\') >> match['\x00-\x7f'] }
  rule(:domain_literal) { str('\[') >>
    (dtext | quoted_pair).repeat >>
    str('\]') }
  rule(:quoted_string) { str('\"') >>
    (qtext | quoted_pair).repeat >>
    str('\"') }
  rule(:domain_ref) { atom }
  rule(:sub_domain) { domain_ref | domain_literal }
  rule(:word) { atom | quoted_string }
  rule(:domain) { sub_domain >> (str('\.') >> sub_domain).repeat }
  rule(:local_part) { word >> (str('\.') >> word).repeat }
  rule(:email) { local_part >> str('@') >> domain }
end
```
Can you try this instead for the `:atom` line? `rule(:atom) { match[%Q{^\x00- \"\(\),.:;<>@\\[\\\]\x7f-\xff}].repeat(1) }`
And you don't need to escape the `.` in the `str` calls (nor the `"`), since `str` does not use regexps.
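To make the escaping point concrete: inside a single-quoted Ruby string only `\\` and `\'` are escape sequences, so `'\.'` is two characters, a backslash and a dot. Since `str` matches its argument literally, `str('\.')` would expect a backslash in the input. The sketch uses plain `String#include?` as a stand-in for literal matching; it does not call parslet. `str('\\')` in the `quoted_pair` rule is fine as written, because `'\\'` is a single backslash.

```ruby
# Single quotes keep "\." as two characters: a backslash and a dot.
escaped_dot = '\.'
plain_dot   = '.'

puts escaped_dot.length # 2
puts plain_dot.length   # 1

# Literal matching (String#include? stands in for parslet's str here).
address = "franklinchen@franklinchen.com"
puts address.include?(escaped_dot) # false: the address contains no backslash
puts address.include?(plain_dot)   # true

# The same holds for the other escaped str() arguments in the grammar.
puts '\"'.length # 2
puts '\['.length # 2

# '\\' is the one case where single quotes collapse to a single character.
puts '\\'.length # 1
```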
Disclaimer: I'm running this on 1.9.3; it might be that in Ruby 2.0.0 the Regexp needs to be created using the `n` directive.
Your change allows the code to run on ruby-1.9.3-p392 but still does not run on ruby-2.0.0-p195. I will switch back to 1.9.3 for now.
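This difference is consistent with Ruby 2.0 having changed the default source encoding from US-ASCII to UTF-8: without a magic comment, string literals are UTF-8 in 2.0, and a `\x80`-style escape in a pattern built from a UTF-8 string is rejected. When building a `Regexp` from a string, the `n` directive mentioned above corresponds to the `Regexp::NOENCODING` option. This is a plain-Ruby sketch; whether parslet 1.5.0 offers a way to pass such an option is not established in this thread.

```ruby
# Pattern text tagged UTF-8, as a Ruby 2.0 string literal would be by default.
pattern = '[\x80-\xff]'.dup.force_encoding(Encoding::UTF_8)

# Regexp::NOENCODING is the option behind the /n flag: the pattern is compiled
# as ASCII-8BIT, so \x80-\xff refers to raw byte values.
binary_re = Regexp.new(pattern, Regexp::NOENCODING)

p binary_re.encoding == Encoding::BINARY          # true
p((binary_re.options & Regexp::NOENCODING) != 0) # true

# Byte-oriented matching: ASCII text has no high bytes, the bytes of "é" do.
p(binary_re =~ "abc")          # nil
p(binary_re =~ "\u00e9".b)     # 0
```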
```ruby
s = '[^\x0d\"\\\x80-\xff]'
s.force_encoding 'ASCII-8BIT'
Parslet.match(s)
```
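The binary-encoding idea behind the snippet above can be checked in plain Ruby. This sketch uses four backslashes before `\x80`: inside single quotes they reach the regexp engine as `\\`, a literal backslash, so the negated class excludes CR, `"`, backslash and the bytes 0x80 to 0xff. `String#b` (Ruby 2.0+) returns a copy tagged ASCII-8BIT, which avoids mutating the original string. I would expect a regexp compiled this way to reject non-ASCII UTF-8 input with an `Encoding::CompatibilityError` rather than match it, so it likely only suits ASCII input.

```ruby
# Negated class: not CR, not double quote, not backslash, not a high byte.
qtext_source = '[^\x0d"\\\\\x80-\xff]'

# Tag a copy of the ASCII-only pattern text as binary before compiling.
qtext_re = Regexp.new(qtext_source.b)

p qtext_re.encoding == Encoding::BINARY # true

# Each probe is a single ASCII character.
["a", "@", "\r", '"', "\\"].each do |char|
  matched = !(qtext_re =~ char).nil?
  puts "#{char.inspect}: #{matched}"
end
```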
A good solution to this problem would be welcome, either as a patch or as a textual description. I am not sure I can even state the problem clearly yet. :(
Unsure whether this is even a parslet issue.
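One alternative that avoids binary encodings altogether (not something proposed in this thread): if the `\x80-\xff` part is only meant to exclude non-ASCII input (RFC 822, whose rule names the grammar follows, defines CHAR as ASCII 0 to 127), each class can be written as an intersection with `\x00-\x7f` using `&&`, which Ruby's regexp engine supports inside character classes. The pattern then stays valid in UTF-8, and non-ASCII characters simply fail to match. The sketch builds the regexps with `Regexp.new` from single-quoted strings. It assumes `match[...]` wraps its argument in `[` and `]`, in which case the same bodies could presumably be passed to `match[...]` unchanged; that part is not verified here.

```ruby
# Class bodies as they might appear inside match[...] (single-quoted).
# \x00-\x7f limits each class to ASCII; "&&[^...]" removes the excluded characters.
# In single quotes, '\\\\' becomes '\\', which the regexp reads as one backslash.
QTEXT_BODY = '\x00-\x7f&&[^\x0d"\\\\]'
DTEXT_BODY = '\x00-\x7f&&[^\x0d\[\]\\\\]'
ATOM_BODY  = '\x00-\x7f&&[^\x00- "(),.:;<>@\[\\\\\]\x7f]'

# Wrap each body in brackets, as match[...] is assumed to do.
QTEXT = Regexp.new("[#{QTEXT_BODY}]")
DTEXT = Regexp.new("[#{DTEXT_BODY}]")
ATOM  = Regexp.new("[#{ATOM_BODY}]+") # + plays the role of .repeat(1)

p QTEXT.match?("a")      # true
p QTEXT.match?("\u00e9") # false: non-ASCII simply does not match
p DTEXT.match?("[")      # false
p ATOM.match("franklinchen@franklinchen.com")[0] # "franklinchen"
```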
I am closing this for lack of feedback - this looks like an issue of ruby strings and encodings as much as it might be a parslet issue.
I wrote a parser with `match['^\x0d\"\\\x80-\xff']` but that results in `RegexpError: invalid multibyte escape`.
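Separately from the encoding error, the backslashes in that literal may not mean what was intended. In a single-quoted Ruby string only `\\` and `\'` are escapes, so `\\\x80` becomes `\\x80`, which the regexp engine reads as a literal backslash followed by `x`, `8` and a range starting at `0`. The sketch compiles both versions in binary mode (to sidestep the UTF-8 error) and compares them; the four-backslash version is an assumption about the intent.

```ruby
issue_body = '^\x0d\"\\\x80-\xff'   # as written in the issue
fixed_body = '^\x0d\"\\\\\x80-\xff' # four backslashes: one literal backslash for the regexp

puts issue_body # ^\x0d\"\\x80-\xff
puts fixed_body # ^\x0d\"\\\x80-\xff

# Compile as binary (String#b) so the \x80-\xff escapes are accepted.
issue_re = Regexp.new("[#{issue_body}]".b)
fixed_re = Regexp.new("[#{fixed_body}]".b)

# issue_re's class contains the range 0-\xff, so even "a" is excluded.
p issue_re.match?("a") # false
p fixed_re.match?("a") # true
```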