Open rabotyaga opened 1 year ago
I can reproduce the issue. My Ruby backtrace was different, showing the problem in read, not initialize. Here's the Ruby backtrace:
#<Thread:0x00000eb3af6a3a80 t.rb:8 run> terminated with exception (report_on_exception is true):
t.rb:10:in `read': buffer error (Zlib::BufError)
from t.rb:10:in `block (2 levels) in <main>'
from t.rb:9:in `loop'
from t.rb:9:in `block in <main>'
t.rb:15:in `wakeup': killed thread (ThreadError)
from t.rb:15:in `block in <main>'
from t.rb:14:in `loop'
from t.rb:14:in `<main>'
This was run under gdb with a breakpoint on raise_zlib_error. Here's the backtrace for that:
#0 raise_zlib_error (err=-5, msg=0x0) at ../../../../ext/zlib/zlib.c:323
#1 0x00000eb353370144 in zstream_run_try (value_arg=16165695284096) at ../../../../ext/zlib/zlib.c:1148
#2 0x00000eb2f4c9fc5b in rb_ensure () from /usr/local/lib/libruby32.so
#3 0x00000eb35336fc9c in zstream_run_synchronized (value_arg=16164904909080) at ../../../../ext/zlib/zlib.c:1186
#4 0x00000eb2f4c9fc5b in rb_ensure () from /usr/local/lib/libruby32.so
#5 0x00000eb353371785 in zstream_run (z=0xeb381e59500, src=0x0, len=<optimized out>, flush=2)
at ../../../../ext/zlib/zlib.c:1203
#6 gzfile_read_more (gz=0xeb381e59500, outbuf=4) at ../../../../ext/zlib/zlib.c:2838
#7 0x00000eb3533714bd in gzfile_read_all (gz=0xeb381e59500) at ../../../../ext/zlib/zlib.c:2953
#8 0x00000eb35336e106 in rb_gzreader_read (argc=0, argv=0xeb3de866f10, obj=<optimized out>)
at ../../../../ext/zlib/zlib.c:4017
#9 0x00000eb2f4ea2548 in vm_call_cfunc_with_frame () from /usr/local/lib/libruby32.so
#10 0x00000eb2f4ea4f73 in vm_sendish () from /usr/local/lib/libruby32.so
#11 0x00000eb2f4e829d1 in vm_exec_core () from /usr/local/lib/libruby32.so
#12 0x00000eb2f4e96b09 in rb_vm_exec () from /usr/local/lib/libruby32.so
#13 0x00000eb2f4eaa68c in invoke_block_from_c_bh () from /usr/local/lib/libruby32.so
#14 0x00000eb2f4ea98ef in loop_i () from /usr/local/lib/libruby32.so
#15 0x00000eb2f4c9f82a in rb_vrescue2 () from /usr/local/lib/libruby32.so
#16 0x00000eb2f4c9f69e in rb_rescue2 () from /usr/local/lib/libruby32.so
#17 0x00000eb2f4ea2548 in vm_call_cfunc_with_frame () from /usr/local/lib/libruby32.so
#18 0x00000eb2f4ea4f73 in vm_sendish () from /usr/local/lib/libruby32.so
#19 0x00000eb2f4e829d1 in vm_exec_core () from /usr/local/lib/libruby32.so
#20 0x00000eb2f4e96b09 in rb_vm_exec () from /usr/local/lib/libruby32.so
#21 0x00000eb2f4e94364 in rb_vm_invoke_proc () from /usr/local/lib/libruby32.so
#22 0x00000eb2f4e439c4 in thread_do_start_proc () from /usr/local/lib/libruby32.so
#23 0x00000eb2f4e43109 in thread_start_func_2 () from /usr/local/lib/libruby32.so
#24 0x00000eb2f4e427fc in thread_start_func_1 () from /usr/local/lib/libruby32.so
#25 0x00000eb2f3208755 in _rthread_start (v=<optimized out>) at /usr/src/lib/librthread/rthread.c:96
#26 0x00000eb3323d781a in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:86
The failure in zstream_run_try is after this code:
err = (int)(VALUE)rb_thread_call_without_gvl(zstream_run_func, (void *)args,
zstream_unblock_func, (void *)args);
Looking at zstream_unblock_func, it has the comment:
* There is no safe way to interrupt z->run->func().
Based on the comment, my guess is that there is no safe way to interrupt zstream inflation/deflation, and Thread#wakeup causes an interrupt, so it is not possible to support what you want. At best, we could document that it is not supported. However, I'm not a zlib expert, so it's possible I'm misunderstanding things. Hopefully someone with more experience in this area could confirm or correct my understanding.
Thank you very much for looking into this!
My Ruby backtrace was different, showing the problem in read, not initialize
Sometimes it's read, sometimes initialize.
Based on the comment, my guess is that there is no safe way to interrupt zstream inflation/deflation
An interesting thing, then, would be
The error doesn't happen, however, if we change Zlib::GzipReader.new(StringIO.new(gzipped)).read to Zlib.gunzip(gzipped), but still happens with Zlib.gzip(content).
I got this error.
/* retry if no exception is thrown */
if (err == Z_OK && args->interrupt) {
args->interrupt = 0;
goto loop;
}
MRI calls zstream_unblock_func and sets args->interrupt. This means that zstream_run_func is interrupted (and will be canceled). But sometimes it is okay to ignore the interrupt, and the code above retries zstream_run_func.
HOWEVER, it is possible for the task (e.g. deflate) to complete before it is canceled, so that on retry there is no data left to deflate (for example). This is why BufError was raised (no data).
So the above retrying code should be:
/* retry if no exception is thrown */
if (err == Z_OK && args->interrupt && not_completed(z)) {
args->interrupt = 0;
goto loop;
}
I'm not sure how to implement not_completed, so I'm only leaving this memo.
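To make the reasoning above concrete, here is a toy Ruby model of the retry logic. It is not zlib code: FakeStream, run_with_retry, and the :ok / :buf_error symbols are hypothetical stand-ins, and the sketch assumes that a run with nothing left to process reports a buffer error, as described above. It only illustrates why retrying after the work has already completed can yield an error, and how a completion check would avoid that.

```ruby
# Toy model only. FakeStream and run_with_retry are hypothetical names,
# not part of Ruby's zlib extension.
class FakeStream
  def initialize(pending)
    @pending = pending # units of input still to be processed
  end

  # Simulates one run: consumes all pending input and reports :ok,
  # or reports :buf_error when there is nothing left to do.
  def run
    return :buf_error if @pending.zero?

    @pending = 0
    :ok
  end

  def completed?
    @pending.zero?
  end
end

# `interrupted` simulates the interrupt flag having been set during the run.
# `guard` toggles the proposed "only retry if not completed" check.
def run_with_retry(stream, interrupted:, guard:)
  err = stream.run
  if err == :ok && interrupted && (!guard || !stream.completed?)
    err = stream.run # retry, as the existing code does
  end
  err
end

unguarded = run_with_retry(FakeStream.new(1), interrupted: true, guard: false)
guarded   = run_with_retry(FakeStream.new(1), interrupted: true, guard: true)
puts "without guard: #{unguarded}" # the retry finds nothing to do
puts "with guard:    #{guarded}"   # the retry is skipped
```

In this model the unguarded retry returns :buf_error and the guarded one returns :ok. How a real not_completed check would inspect the z_stream state remains open, as noted above.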
BufError
r sporadically get in prod but no such luck. The patch is still worthwhile since it does seem to fix certain interrupt failures.
Minimal reproducible script:
leads to
The error doesn't happen, however, if we change Zlib::GzipReader.new(StringIO.new(gzipped)).read to Zlib.gunzip(gzipped), but still happens with Zlib.gzip(content).
Probably related to https://github.com/ruby/zlib/issues/49
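For reference, the two decompression paths being compared can be sketched as follows. This is not the original reproduction script (it has no threads and no Thread#wakeup, so it should not trigger the error). The content value is an arbitrary placeholder, and the sketch only shows that both calls produce the same bytes in the normal case.

```ruby
require "zlib"
require "stringio"

# Placeholder payload; the original script's content is not shown in this thread.
content = "hello world " * 1000
gzipped = Zlib.gzip(content)

# Path 1: streaming reader over an IO-like object.
via_reader = Zlib::GzipReader.new(StringIO.new(gzipped)).read

# Path 2: one-shot helper operating on the whole string.
via_gunzip = Zlib.gunzip(gzipped)

puts via_reader.bytes == content.bytes
puts via_gunzip.bytes == content.bytes
```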