pg sets the encoding on PG::Connection objects

larskanis commented 6 years ago

Original report by Benoit Daloze (Bitbucket: eregon, GitHub: eregon).

pg sets the encoding with ENCODING_SET_INLINED/rb_enc_set_index on PG::Connection objects. Pg uses a macro PG_ENCODING_SET_NOCHECK() in pg.h.

This seems not supported in latest ruby trunk, as Koichi Sasada explicitly added a check against it in https://github.com/ruby/ruby/commit/d0fb73a0f0b82ebc0d3eb931c28686a2b9c40fef , and as a result the program crashes when rb_enc_set_index() is called on a non-encoding capable object.

To observe this, we need to change PG_ENCODING_SET_NOCHECK() to just call rb_enc_set_index() as ENCODING_SET_INLINED() doesn't contain the check currently, but likely has unsupported semantics and only works by luck that there are < 128 encodings.

Then with ruby-trunk we can see (the program assume a postgres db named "test" exists):

$ ruby -Ilib -rpg -e 'p PG.connect(dbname: "test")'

/home/eregon/code/ruby-pg/lib/pg.rb:56: [BUG] enc_set_index: not capable object
ruby 2.6.0dev (2018-07-16 trunk 63979) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0020 e:000019 CFUNC  :initialize
c:0004 p:---- s:0017 e:000016 CFUNC  :new
c:0003 p:0016 s:0012 e:000011 METHOD /home/eregon/code/ruby-pg/lib/pg.rb:56
c:0002 p:0013 s:0007 e:000005 EVAL   -e:1 [FINISH]
c:0001 p:0000 s:0003 E:001c50 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
-e:1:in `<main>'
/home/eregon/code/ruby-pg/lib/pg.rb:56:in `connect'
/home/eregon/code/ruby-pg/lib/pg.rb:56:in `new'
/home/eregon/code/ruby-pg/lib/pg.rb:56:in `initialize'

-- C level backtrace information -------------------------------------------
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_vm_bugreport+0xcf9) [0x55cbdfb9a1b9] vm_dump.c:704
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_bug+0xd0) [0x55cbdfb8d200] error.c:595
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_enc_set_index+0xfe) [0x55cbdfb77a4e] encoding.c:822
/home/eregon/code/ruby-pg/lib/pg_ext.so(pgconn_init+0x168) [0x7ff1d355d2c8]
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_call0_body.constprop.314+0x288) [0x55cbdfaf0648] vm_eval.c:87
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_call0+0xdf) [0x55cbdfaf168f] vm_eval.c:60
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_funcallv+0x2b) [0x55cbdfaf1d7b] vm_eval.c:595
/home/eregon/prefix/ruby-trunk/bin/ruby(rb_class_s_new+0x21) [0x55cbdf9e9ad1] object.c:2153
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_call_cfunc+0xfe) [0x55cbdfaea6ce] vm_insnhelper.c:1934
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_call_method+0xe3) [0x55cbdfaedae3] vm_insnhelper.c:2424
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_exec_core+0x13e) [0x55cbdfaf511e] /home/eregon/code/ruby/insns.def:779
/home/eregon/prefix/ruby-trunk/bin/ruby(vm_exec+0x96) [0x55cbdfaec266] vm.c:1807
/home/eregon/prefix/ruby-trunk/bin/ruby(ruby_exec_internal+0xc4) [0x55cbdf967c84] eval.c:261
/home/eregon/prefix/ruby-trunk/bin/ruby(ruby_run_node+0x2f) [0x55cbdf96c1ef] eval.c:325
/home/eregon/prefix/ruby-trunk/bin/ruby(main+0x5f) [0x55cbdf9670df] ./include/ruby/intern.h:295

I think one solution is to store the encoding in a separate field, for instance in t_pg_connection. If you think Ruby should keep supporting rb_enc_set_index() on non-encoding-capable objects, please file an issue at https://bugs.ruby-lang.org/

I'm just reporting this as I noticed the change by Koichi and knew from trying to run pg with TruffleRuby that pg uses rb_enc_set_index() on the PG::Connection object.

larskanis commented 6 years ago

Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).

Thank you @eregon for notifying us and for your analysis!

Storing the encoding in the metadata of a T_DATA object was a handy trick to save some space in the connection and the result. However if it's in the way of some future improvements in MRI, I think we can find an alternative solution for it. I'll check this and will make ruby-pg compatible with ruby-trunk.

Regarding TruffleRuby, please let me know when it is ready to be supported by ruby-pg. We already had some Rubinius compatibility in the past for parts that are too MRI specific. -- Kind Regards, Lars

larskanis commented 5 years ago

Original comment by Chris Bandy (Bitbucket: cbandy, GitHub: cbandy).

Is this still an issue? I am able to ruby -rpg -e 'p PG.connect(ENV["DATABASE_URL"])' on MRI ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux].

larskanis commented 5 years ago

Original comment by Benoit Daloze (Bitbucket: eregon, GitHub: eregon).

Yes. The fix should be:

Use rb_enc_set_index() instead of PG_ENCODING_SET_NOCHECK, which is intrusive and bypasses sanity checks.
Then we should notice invalid cases where the encoding is not set on a encoding-capable object (only String/Regexp/IO/Symbol), as the backtrace above.

larskanis / importtest

pg sets the encoding on PG::Connection objects #280