oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/

Ripper: incompatibility for uppercase UTF-8 constant names in aliases #3457

Open noahgibbs opened 6 months ago

noahgibbs commented 6 months ago

In aliases and many other cases, CRuby's Ripper emits different lexer tokens depending on the symbol's name. For instance, a name starting with an uppercase letter emits :@const instead of :@ident.

TruffleRuby does this correctly for 7-bit ASCII constants like "A", but not for Unicode uppercase constants like "Ñ".
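The comparison can be scripted rather than read off irb output. The following is a minimal sketch, not part of the original report: the helper name `alias_target_token_type` is hypothetical, and it assumes the sexp shape shown in the transcripts in this report. It writes "Ñ" as the escape `\u00D1` so the input string is UTF-8 regardless of the script's source encoding.

```ruby
require "ripper"

# Hypothetical helper: returns the scanner event type (such as :@ident or
# :@const) that Ripper reports for the second symbol in `alias :foo :<name>`.
def alias_target_token_type(name)
  sexp = Ripper.sexp_raw("alias :foo :#{name}")
  # Assumed shape: [:program, [:stmts_add, [:stmts_new], [:alias, first, second]]]
  alias_node = sexp[1][2]
  # second: [:symbol_literal, [:symbol, [<event>, <name>, [line, column]]]]
  alias_node[2][1][1][0]
end

# "\u00D1" is "Ñ"; the escape forces a UTF-8 string.
["a", "A", "\u00D1"].each do |name|
  puts "#{name.inspect} => #{alias_target_token_type(name).inspect}"
end
```

Running the same script on both implementations and diffing the output should show the mismatch on the last line only, if the behavior matches this report.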

CRuby:

irb(main):001:0> require "ripper"
=> false
irb(main):002:0> Ripper.sexp_raw("alias :foo :Ñ")
=>
[:program,
 [:stmts_add,
  [:stmts_new],
  [:alias, [:symbol_literal, [:symbol, [:@ident, "foo", [1, 7]]]], [:symbol_literal, [:symbol, [:@const, "Ñ", [1, 12]]]]]]]

TruffleRuby:

irb(main):001:0> require "ripper"
=> false
irb(main):002:0> Ripper.sexp_raw("alias :foo :Ñ")
=>
[:program,
 [:stmts_add,
  [:stmts_new],
  [:alias, [:symbol_literal, [:symbol, [:@ident, "foo", [1, 7]]]], [:symbol_literal, [:symbol, [:@ident, "Ñ", [1, 12]]]]]]]

eregon commented 6 months ago

Thanks for the report. We use the same C code as CRuby for Ripper, so this is probably a bug in id_type, rb_str_symname_type, rb_enc_symname_type, sym_type, or something similar.

Possibly related to #3407, which is also about identifier types, but probably not, because rb_enc_symname_type seems to be implemented in C (code from CRuby).

In general, the Ripper C extension uses far too many internals and is quite slow, with tons of upcalls, so we'd like to get rid of it and replace it with Prism::RipperCompat :) I think it's best not to use Ripper on TruffleRuby in the Prism test suite. If there is a difference from CRuby, it's almost surely a bug, and we'd want the same behavior as CRuby for Prism::RipperCompat.
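One way a test suite could follow that suggestion is to gate Ripper comparisons on the running engine. This is only a sketch under assumptions: the helper name `compare_against_ripper?` is hypothetical and is not taken from the Prism test suite; only the built-in `RUBY_ENGINE` constant is relied on.

```ruby
# Hypothetical helper: true when Ripper output should be used as the
# reference in a comparison test. Ripper is skipped on TruffleRuby,
# following the suggestion above.
def compare_against_ripper?(engine = RUBY_ENGINE)
  engine != "truffleruby"
end

if compare_against_ripper?
  require "ripper"
  # Run the Ripper-based comparison here.
else
  warn "Skipping Ripper comparison on #{RUBY_ENGINE}"
end
```

Taking the engine as a parameter keeps the check easy to exercise without switching Ruby implementations.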