There is a critical bug in RPCZ which causes a crash in certain scenarios. Given that it is intermittent it' hard for me provide a reprosteps, but the problem is quite straightforward. To explain, I need three pieces of code.
Assume that the threads were scheduled so that the sequence was as follows:
channel_->call_method(..., &rpc, ...);
responsecontext.rpc->set_status(status::OK);
rpc.wait();
responsecontext.rpc->syncevent->signal();
Now, the last piece of code --- rpc.wait() method:
int rpc::wait() {
status_code status = get_status();
CHECK_NE(status, status::INACTIVE) << "Request must be sent before calling wait()";
if (status != status::ACTIVE) return get_status(); // THIS LINE SHOULD BE REMOVED
syncevent->wait();
return 0;
}
If threads was scheduled as I described, the status of rpc will be OK, and the rpc::wait() will exit immediately. Therefore, by the moment when handle_client_response() reaches the syncevent->signal() the sync_event might be disposed, and we get a crash.
I believe the fix is just to remove the following line in rpc::wait() method:
if (status != status::ACTIVE) return get_status();
There is a critical bug in RPCZ which causes a crash in certain scenarios. Given that it is intermittent it' hard for me provide a reprosteps, but the problem is quite straightforward. To explain, I need three pieces of code.
// Any stub method, generated by RPCZ compiler. void XXXStub::XXXMethod(...) { ::rpcz::rpc rpc; channel_->call_method(..., &rpc, ...); // (1) rpc.wait(); // (3) }
// File rpcz\rpc_channel_impl.cc void rpc_channel_impl::handle_client_response(...) { responsecontext.rpc->set_status(status::OK); // (2) responsecontext.rpc->syncevent->signal(); // (4) }
Assume that the threads were scheduled so that the sequence was as follows:
Now, the last piece of code --- rpc.wait() method: int rpc::wait() { status_code status = get_status(); CHECK_NE(status, status::INACTIVE) << "Request must be sent before calling wait()"; if (status != status::ACTIVE) return get_status(); // THIS LINE SHOULD BE REMOVED syncevent->wait(); return 0; }
If threads was scheduled as I described, the status of rpc will be OK, and the rpc::wait() will exit immediately. Therefore, by the moment when handle_client_response() reaches the syncevent->signal() the sync_event might be disposed, and we get a crash.
I believe the fix is just to remove the following line in rpc::wait() method: if (status != status::ACTIVE) return get_status();