Open RazrFalcon opened 2 years ago
I can't reproduce it locally:
$ ruby -v bin/ruby-parse --32 -E -e '"\\u{D800}"'
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
"\\u{D800}"
^~~~~~~~~~~ tSTRING "\\u{D800}" expr_end [0 <= cond] [0 <= cmdarg]
"\\u{D800}"
^ false "$eof" expr_end [0 <= cond] [0 <= cmdarg]
(str "\\u{D800}")
$ ruby -ve 'p "\\u{D800}"'
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
"\\u{D800}"
Is it related to an old version of Ruby? Could you try it on a version of Ruby that is still supported (i.e. at least 2.7)?
My hunch is that old Ruby has old Unicode support that doesn't know about these codepoints.
This is the default Ruby on macOS. I'm not sure whether you support it.
No, Ruby 2.6 is deprecated since 2022-04-12. We do run tests for 2.6.10 on CI, and at least this version works well. You can use rbenv/RVM or whatever is popular these days to install a newer version of Ruby.
I'm closing it, but feel free to reopen it if the error appears again for you with maintained Ruby versions (>= 2.7).
Am I still doing something wrong?
> ruby -v
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
> /opt/homebrew/lib/ruby/gems/3.1.0/gems/parser-3.1.2.0/bin/ruby-parse --32 -E -e '"\\u{D800}"'
Failed on: (fragment:0)
/opt/homebrew/lib/ruby/gems/3.1.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)
...
Same, but using current master:
> ruby -v bin/ruby-parse --32 -E -e '"\\u{D800}"'
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
Failed on: (fragment:0)
/opt/homebrew/lib/ruby/gems/3.1.0/gems/parser-3.1.2.0/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)
Sorry, bash escaping issue, I should've checked this code in a separate file. My bad.
$ /bin/cat test.rb
"\u{D800}"
$ ruby -v test.rb
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
test.rb:1: invalid Unicode codepoint
"\u{D800}"
$ ruby -v bin/ruby-parse --32 test.rb
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
Failed on: test.rb
/Users/ilyabylich/Work/parser/lib/parser/lexer.rb:17506:in `chr': invalid codepoint 0xD800 in UTF-8 (RangeError)
...
stacktrace
...
This is a bug and it should be fixed, reopening.
The error comes from this line: codepoint is "D800".to_i(16) == 55296, and so Ruby gives an error on converting a codepoint to a character:
=> "D800".to_i(16).chr(Encoding::UTF_8)
RangeError (invalid codepoint 0xD800 in UTF-8)
I'm pretty sure we need to catch a RangeError and emit it as an :invalid_unicode_escape diagnostic (that's what the Ruby parser does).
I'll fix it next week, thanks for reporting.
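A minimal sketch of the proposed fix, not the actual lexer code. The helper name `decode_unicode_escape` and the plain array used to collect diagnostics are hypothetical stand-ins for illustration; only `Integer#chr` raising RangeError and the :invalid_unicode_escape name come from the discussion above.

```ruby
# Hypothetical helper, not part of the parser gem's lexer.
# Converts the hex digits of a \u{...} escape to a UTF-8 string.
# On an invalid codepoint it records a diagnostic and returns nil
# instead of letting the RangeError escape.
def decode_unicode_escape(hex_digits, diagnostics)
  hex_digits.to_i(16).chr(Encoding::UTF_8)
rescue RangeError
  # Integer#chr raises RangeError for surrogates such as 0xD800.
  # The array of pairs is an assumed, simplified diagnostic format.
  diagnostics << [:invalid_unicode_escape, hex_digits]
  nil
end

diagnostics = []
decode_unicode_escape("41", diagnostics)   # valid: returns "A"
decode_unicode_escape("D800", diagnostics) # invalid: returns nil
diagnostics # now holds one :invalid_unicode_escape entry for "D800"
```

Rescuing at the conversion site keeps the failure local, so the caller can report a diagnostic with a source location instead of crashing with a stack trace.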
Sure, no problem. I was running it in Fish and didn't even think about shell escaping differences.
I would assume that U+D800...U+DFFF should be ignored.
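A quick way to check which codepoints are affected, assuming "ignored" here means "rejected as invalid". The helper name `utf8_codepoint?` is hypothetical; it only wraps the same `Integer#chr` call shown earlier in the thread.

```ruby
# Hypothetical helper: true when Integer#chr accepts the codepoint
# for UTF-8, false when it raises RangeError.
def utf8_codepoint?(codepoint)
  codepoint.chr(Encoding::UTF_8)
  true
rescue RangeError
  false
end

# The whole surrogate block U+D800..U+DFFF is rejected,
# while its immediate neighbours are accepted.
surrogates_rejected = (0xD800..0xDFFF).none? { |cp| utf8_codepoint?(cp) }
neighbours_accepted = utf8_codepoint?(0xD7FF) && utf8_codepoint?(0xE000)

puts "surrogates rejected: #{surrogates_rejected}"
puts "neighbours accepted: #{neighbours_accepted}"
```

Surrogates are reserved for UTF-16 pairs and are not valid Unicode scalar values, which is why UTF-8 conversion refuses them.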