redis-rb / redis-client

Simple low level client for Redis 6+
MIT License
124 stars 60 forks source link

Fix connection corruption when rb_thread_fd_select() raises #158

Closed XrXr closed 9 months ago

XrXr commented 9 months ago

Previously, when rb_thread_fd_select() raises an exception and longjmps away, the connection object is left with the default reader parser functions, which don't return valid Ruby objects.

In v0.19.0, this resulted in SEGVs when reading from the connection, because a redisReply was returned from the _read method. Use the following to recreate the problem:

require 'redis-client'
require 'hiredis-client'
redis = RedisClient.new
redis.disable_reconnection { }
redis.close

main = Thread.current
Thread.new { sleep ARGV.first.to_f; main.raise "ha" }
begin
  p redis.call(:GET, "asdf")
rescue => e
  p e
else
  puts "no exception delivered. Use strace or a breakpoint to expand window"
  exit
end
p redis.call(:GET, "asdf")
$ ruby test.rb 0.5
"1000"
no exception delivered. Use strace or a breakpoint to expand window

$ strace -e trace=pselect6 -e inject=pselect6:delay_enter=1000000:when=2..3 ruby test.rb 0.5
pselect6(6, NULL, [5], NULL, {tv_sec=1, tv_nsec=0}, NULL) = 1 (out [5], left {tv_sec=0, tv_nsec=999998238})
pselect6(6, [5], NULL, NULL, {tv_sec=1, tv_nsec=0}, NULL) = 1 (in [5], left {tv_sec=0, tv_nsec=999996130}) (DELAYED)
pselect6(6, NULL, [5], NULL, {tv_sec=1, tv_nsec=0}, NULL) = 1 (out [5], left {tv_sec=0, tv_nsec=999997186}) (DELAYED)
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_TKILL, si_pid=1097337, si_uid=1000} ---
pselect6(6, [5], NULL, NULL, {tv_sec=1, tv_nsec=0}, NULL) = 1 (in [5], left {tv_sec=0, tv_nsec=999998725})
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
/home/spin/.gem/ruby/3.3.0/gems/hiredis-client-0.19.0/lib/redis_client/hiredis_connection.rb:90: [BUG] Segmentation fault at 0x0000000000000000

With GH-157, the default functions no longer lead to a crash, but rather the reading with the connection always giving nil. This is due to reading from an empty stack in hiredis_read_internal() instead of directly returning the reply of redisGetReplyFromReader(). There is also a leak each time, because the default functions allocates.

Another instance of the problem can be seen passing 0 to the script, causing the exception earlier in hiredis_reconnect_nogvl() instead.

Fix by using rb_protect() and by clearing the context from the connection before hiredis_reconnect_nogvl().


The sleep(0) case doesn't need strace and could be turned into a test, I just ran out of time today.

byroot commented 9 months ago

Holy 🐮 , that is a great find. Thank you Alan!