Open ovanes opened 7 years ago
I am having the same issue. Any workaround?
What I found out, and this flow isn't documented, is that one needs to unbind the endpoints from the socket first. I ended up writing a `socket_wrapper` class with the following functionality:
```cpp
struct socket_wrapper : boost::noncopyable
{
    void unbind_all() noexcept
    {
        for(auto const& endpoint : bound_endpoints_)
        {
            try
            {
                socket_.unbind(endpoint);
            }
            catch(...)
            {
                // seems like ZMQ can't unbind from all bound endpoints,
                // but only from the first or last one
                WARNING_AC
                    << "ZeroMQ seems to have a bug. It can't unbind an endpoint"
                       " from the socket which it was previously bound to: '"
                    << endpoint << "'"
                    ;
            }
        }
        bound_endpoints_.clear();
    }

    void close() noexcept
    {
        auto handle = native_socket();
        if(!handle) return; // already closed

        TRACE_AC << "ZMQ Shutdown setting socketopt ZMQ_LINGER=0";
        int linger_value = 0;
        if(0 != zmq_setsockopt(handle, ZMQ_LINGER, &linger_value, sizeof(linger_value)))
            WARNING_AC << "ZMQ Shutdown failed to set the ZMQ_LINGER=0 option "
                          "for the socket.";

        unbind_all();
        try
        {
            TRACE_AC << "calling ZMQ socket_t::close()";
            socket_.close();
        }
        catch(...)
        {
            ERROR_AC << "ZMQ socket close failed, might be a bug in ZMQ";
        }
    }

private:
    zmq::socket_t socket_;
    std::vector<std::string> bound_endpoints_;
};
```
And finally, to close everything including the context, I call the following shutdown function (it should give you an idea of the flow...):
```cpp
// from a class which manages the context and all the sockets...
void shutdown()
{
    if(interrupted_) return;
    // release, not acquire: a store cannot use acquire ordering
    interrupted_.store(true, std::memory_order_release);

    for(auto& socket_wrapper : sockets_)
        socket_wrapper.close();

    auto native_context = static_cast<void*>(*context_ptr_);
    if(0 != zmq_ctx_set(native_context, ZMQ_BLOCKY, 0))
    {
        // Failing assertion as ZMQ_BLOCKY does not seem to be supported
        // assert(0==result && "unable to set the ZMQ_BLOCKY to false");
        WARNING_AC
            << "ZMQ Shutdown: ZMQ_BLOCKY was not set, seems to be a bug in ZMQ"
            ;
    }
    TRACE_AC << "closing the context";
    context_ptr_->close();
}
```
Please provide a minimal example that reproduces the problem.
Your assumption that it is necessary to unbind sockets is not correct. However, you need to close all sockets. Context termination will block until all sockets have been closed.
@ovanes could you confirm if you still see this issue on osx, please?
> Please provide a minimal example that reproduces the problem.

Hello! Does this sample have the same kind of hang?
```cpp
#include <string>
#include <zmq.hpp>

int main()
{
    const std::string text("hello");
    zmq::context_t context;
    zmq::socket_t socket(context, ZMQ_PUSH);
    zmq::message_t message(text.data(), text.size());
    socket.connect("tcp://localhost:6666");
    socket.setsockopt(ZMQ_SNDTIMEO, 100);
    socket.send(message, zmq::send_flags::dontwait);
    return 0;
}
```
If it provides any context (forgive the pun): if I call `context.close()` in a class destructor, where the context is a member, it also hangs, even though calling the same shutdown sequence manually, before the class the context lives in is destroyed, works fine.

I guarantee all sockets are closed, because closing them is the last action of each worker thread, and I join all worker threads before closing the context. All threads join, and then the context close is called. If I call this stop function before the object goes out of scope and the destructor runs, it works fine. If I let the destructor do it, it hangs.

EDIT

I think in my situation it is because this is loaded as a shared library and later unloaded, and the unloading application terminates threads aggressively in a way that prevents the context from shutting down properly, since it is waiting on the reaper thread, which has already been killed by the parent application.
> Please provide a minimal example that reproduces the problem.

> Hello! Does this sample have the same kind of hang?

Yes, the above code sample has the same hang, with a backtrace like this:

```
(gdb) bt
#0  0x00007ffff6efa7e1 in poll () from /lib64/libc.so.6
#1  0x00007ffff7b8093d in zmq::signaler_t::wait(int) () from /opt/phoenix/lib64/libzmq.so.5
#2  0x00007ffff7b6789c in zmq::mailbox_t::recv(zmq::command_t*, int) () from /opt/phoenix/lib64/libzmq.so.5
#3  0x00007ffff7b59c61 in zmq::ctx_t::terminate() () from /opt/phoenix/lib64/libzmq.so.5
#4  0x00007ffff7b9b93a in zmq_ctx_term () from /opt/phoenix/lib64/libzmq.so.5
#5  0x00000000004014d1 in zmq::context_t::close (this=0x7fffffffdf28) at cppzmq/zmq.hpp:670
#6  0x00000000004014a6 in zmq::context_t::~context_t (this=0x7fffffffdf28, __in_chrg=<optimized out>) at cppzmq/zmq.hpp:661
#7  0x000000000040120a in main () at test1.cpp:7
```
I am getting this same issue in one module of FreeSWITCH, mod_event_zmq.
Currently I see ZeroMQ/cppzmq/libzmq hanging after I try to exit the process.

I am using libzmq 4.2.2 on Mac OS X Sierra (10.12.6) and thought that this behaviour reflects the bug described here: https://github.com/zeromq/libzmq/issues/1279, which is considered to be resolved.

I post it here because I use the cppzmq bindings, but I see cppzmq cause an abort in this line:

This is my code to shut down the context + socket(s):

Closing the socket here causes the process to abort with SIGABRT. When not calling `close()` on the socket, the process just hangs forever. I am also not able to create a non-blocking context: whenever I call `zmq_ctx_set(native_context, ZMQ_BLOCKY, 0);` I receive -1 as the return value.