ruby / prism

Prism Ruby parser
https://ruby.github.io/prism/
MIT License

`Parser::Translator` is accepting certain regexp flags where `parser` would raise #2957

Closed by Earlopain 1 month ago

Earlopain commented 1 month ago

With the plain `parser` gem, the following raises an error:

RuboCop::AST::ProcessedSource.new('/あ/n', 3.3)
# => 'String#encode': U+3042 from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)

Prism translation does not raise for the `n` flag (it reports diagnostics but returns no AST):

RuboCop::AST::ProcessedSource.new('/あ/n', 3.3, parser_engine: :parser_prism)
#<RuboCop::AST::ProcessedSource:0x00007619a8cf6560
 @ast=nil,
 @buffer=#<Parser::Source::Buffer (string)>,
 @comments=[],
 @diagnostics=
  [#<Prism::Translation::Parser::PrismDiagnostic:0x00007619a9ae4668
    @arguments={},
    @highlights=[],
    @level=:error,
    @location=#<Parser::Source::Range (string) 4...4>,
    @message="regexp encoding option 'n' differs from source encoding 'UTF-8'",
    @reason=:regexp_encoding_option_mismatch>,
   #<Prism::Translation::Parser::PrismDiagnostic:0x00007619a9ae4618
    @arguments={},
    @highlights=[],
    @level=:error,
    @location=#<Parser::Source::Range (string) 4...4>,
    @message="/.../n has a non escaped non ASCII character in non ASCII-8BIT script: /あ/",
    @reason=:regexp_non_escaped_mbc>],
 @parser_engine=:parser_prism,
 @parser_error=nil,
 @path=nil,
 @raw_source="/あ/n",
 @ruby_version=3.3,
 @tokens=
  [#<RuboCop::AST::Token:0x00007619a8ea8ea8 @pos=#<Parser::Source::Range (string) 0...1>, @text="/", @type=:tREGEXP_BEG>,
   #<RuboCop::AST::Token:0x00007619a8ea8e80 @pos=#<Parser::Source::Range (string) 1...2>, @text="あ", @type=:tSTRING_CONTENT>,
   #<RuboCop::AST::Token:0x00007619a8ea8e58 @pos=#<Parser::Source::Range (string) 2...3>, @text="/", @type=:tSTRING_END>,
   #<RuboCop::AST::Token:0x00007619a8ea8e30 @pos=#<Parser::Source::Range (string) 3...4>, @text="n", @type=:tREGEXP_OPT>]>

There's an open pull request in rubocop-ast so that this does not raise during parsing (https://github.com/rubocop/rubocop-ast/pull/305), but it is still a behaviour difference.

parser has the following code to construct a regexp. Maybe it just needs to be emulated? https://github.com/whitequark/parser/blob/570e06520b81a107948d10fadaea89bd612b9a8d/lib/parser/builders/default.rb#L2249-L2267
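
For illustration, here is a minimal sketch of how a raise like the one above can come about, assuming the regexp source is re-encoded with `String#encode` according to its encoding flag (the error message points at `String#encode`). The `FLAG_ENCODINGS` table and the helpers `encode_regexp_source` and `try_encode` are hypothetical names made up for this example; they are not part of parser, Prism, or rubocop-ast.

```ruby
# Hypothetical mapping from regexp encoding flags to target encodings.
# This is an assumption for illustration, not code copied from parser.
FLAG_ENCODINGS = {
  "u" => Encoding::UTF_8,
  "e" => Encoding::EUC_JP,
  "s" => Encoding::Windows_31J,
  "n" => Encoding::BINARY
}.freeze

# Re-encode the regexp source according to the first encoding flag found.
# Without an encoding flag the source is returned unchanged.
def encode_regexp_source(source, flags)
  flag = flags.each_char.find { |char| FLAG_ENCODINGS.key?(char) }
  return source unless flag

  source.encode(FLAG_ENCODINGS[flag])
end

# Report :ok, or the exception class if the conversion is undefined.
def try_encode(source, flags)
  encode_regexp_source(source, flags)
  :ok
rescue Encoding::UndefinedConversionError => e
  e.class
end

p try_encode("a", "n")      # ASCII-only source converts cleanly
p try_encode("\u3042", "n") # "あ" has no ASCII-8BIT representation
p try_encode("\u3042", "u") # the source is already UTF-8
```

In this sketch the exception escapes from the encoding step itself, which is why it surfaces as a raised error rather than as a diagnostic.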

kddnewton commented 1 month ago

It seems very odd that you would explicitly want an encoding error, as opposed to going through the normal diagnostics flow. @koic, is this the desired behavior here?

Earlopain commented 1 month ago

On second thought, you are right. I should have reported this to the parser gem instead; emulating this behaviour doesn't make much sense.