Open kirs opened 3 years ago
Pack and format seem, in general, not to understand anything beyond the most basic encodings.
I looked into more corner cases on MRI here.
puts RUBY_DESCRIPTION
ascii = 'ascii'.encode('us-ascii')
utf16 = 'utf16'.encode('utf-16le')
win1251 = 'win1251'.encode('windows-1251')
puts "us-ascii + win-2151 = "
puts format('%s %s', ascii, win1251).encoding.inspect
puts "utf16 = "
puts format('%s', utf16).encoding.inspect
puts "ascii + utf-16="
puts format('%s %s', ascii, utf16).encoding.inspect
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-darwin19]
us-ascii + win-2151 =
#<Encoding:UTF-8>
utf16 =
#<Encoding:UTF-16LE>
ascii + utf-16=
Traceback (most recent call last):
1: from demo.rb:16:in `<main>'
demo.rb:16:in `format': incompatible character encodings: UTF-8 and UTF-16LE (Encoding::CompatibilityError)
It does some kind of math where ascii + win-1251 results in UTF-8 and utf16 on its own stays UTF-16LE, but ascii + utf-16 is not compatible.
That's defined in rb_enc_compatible (https://github.com/ruby/ruby/blob/8a4472fb6d2df0f6407cef24df6a038be90d1462/encoding.c#L1172-L1185), which returns either nothing (when the two encodings are incompatible) or whichever of the two encodings is a superset.
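For anyone who wants to poke at this from Ruby rather than C, here is a small sketch using Encoding.compatible?, which is the Ruby-level view of the same negotiation. It reuses the three strings from the demo above; the variable names for the results are only illustrative.

```ruby
ascii   = 'ascii'.encode('us-ascii')
utf16   = 'utf16'.encode('utf-16le')
win1251 = 'win1251'.encode('windows-1251')

# Both strings contain only 7-bit characters and both encodings are
# ASCII-compatible, so an encoding is returned instead of nil.
ascii_win = Encoding.compatible?(ascii, win1251)
puts ascii_win.inspect

# UTF-16LE is not ASCII-compatible, so there is no common encoding.
ascii_utf16 = Encoding.compatible?(ascii, utf16)
puts ascii_utf16.inspect

# The same holds for a non-empty UTF-8 string paired with UTF-16LE,
# which is the pair named in the error message above.
utf8_utf16 = Encoding.compatible?('ascii '.encode('utf-8'), utf16)
puts utf8_utf16.inspect
```

A nil result here corresponds to the case where format raises Encoding::CompatibilityError.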
I'm guessing that the result string of format should do logic similar to rb_enc_compatible and calculate result encoding that would be compatible with all inputs.
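A rough sketch of that idea in plain Ruby, to make the guess concrete. The helper name negotiate_result_encoding is hypothetical (it is not part of Ruby, MRI's sprintf, or TruffleRuby), and it only looks at the string arguments, ignoring the literal text of the format string. It assumes that folding Encoding.compatible? over the inputs, the way concatenation would, is close enough to what format needs.

```ruby
# Hypothetical helper: fold the inputs left to right, negotiating an
# encoding at each step the same way string concatenation would.
def negotiate_result_encoding(strings)
  result = strings.reduce(String.new) do |acc, str|
    enc = Encoding.compatible?(acc, str)
    if enc.nil?
      raise Encoding::CompatibilityError,
            "incompatible character encodings: #{acc.encoding} and #{str.encoding}"
    end
    # The concatenation carries the negotiated encoding forward.
    acc + str
  end
  result.encoding
end

ascii   = 'ascii'.encode('us-ascii')
utf16   = 'utf16'.encode('utf-16le')
win1251 = 'win1251'.encode('windows-1251')

puts negotiate_result_encoding([utf16]).inspect
puts negotiate_result_encoding([ascii, win1251]).inspect

begin
  negotiate_result_encoding([ascii, utf16])
rescue Encoding::CompatibilityError => e
  puts e.message
end
```

Building the concatenated string is wasteful for a real implementation, but it keeps the sketch honest about non-ASCII content seen so far, which a fold over bare Encoding objects would lose.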
Yes that sounds likely, but figuring out when it should do it is often tricky.
Also compare with what JRuby does: if their behavior is correct, their code can sometimes be easier to understand.
Related JRuby commit: https://github.com/jruby/jruby/commit/bb90d3b7644316f8ae6b92e02defdf3838854fb5
We have NegotiateCompatibleEncodingNode (and a couple of nodes using it) to find an Encoding compatible with 2 encodings. That's used for Encoding.compatible? in Ruby, and should be the same as rb_enc_compatible().
Thanks!
Do you have an idea why PrintfCompiler is using FormatEncoding (which only supports ASCII and UTF)? Should I change that to use common Encoding?
I'm not sure. Yes, using the JCodings Encoding instead would be best.
I think it was possibly done to make the format package independent of the rest of TruffleRuby.
(discovered this while pairing with @chrisseaton on https://github.com/oracle/truffleruby/pull/2308)
On MRI:
On TruffleRuby: