zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0

Assertion failure in epoll.cpp due to failing AF_UNIX bind() on Windows 10 1803 #4084

Open barometz opened 3 years ago

barometz commented 3 years ago

Issue description

When I run the libzmq test suite, many tests (such as test_reqrep_tcp.exe) abort with an assertion in epoll.cpp. I eventually tracked this down to a bind() call in ip.cpp that fails while the context starts up (on first socket creation), with the Winsock error set to WSAEINVAL.

The failing bind call originates at https://github.com/zeromq/libzmq/blob/1d2af8d38842427feba909b2f47275120d104ec8/src/ip.cpp#L564

This error occurs only on Windows 10 1803, which is EOL for consumers and rapidly approaching EOL for businesses, so I don't expect this to be fixed, but if someone searches for it, at least they'll have something. I suspect something is wrong with the AF_UNIX support that Windows newly added in 1803, because I couldn't get Microsoft's own example from https://devblogs.microsoft.com/commandline/windowswsl-interop-with-af_unix/ to work either.
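For anyone who lands here from a search: a quick way to check whether the operating system's AF_UNIX bind() works at all, independently of libzmq, is a small probe like the one below. It is a sketch, not anything from libzmq. probe_af_unix_bind is a hypothetical helper, and it is written against POSIX headers for brevity. On Windows you would build the same calls against Winsock (<winsock2.h> plus <afunix.h>, after WSAStartup()) and read the error with WSAGetLastError() instead of errno.

```cpp
// Standalone probe (not part of libzmq): does bind() on an AF_UNIX socket work
// on this machine? POSIX headers for brevity; on Windows the same calls come
// from <winsock2.h> + <afunix.h>, with closesocket()/WSAGetLastError().
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Returns 0 if socket() and bind() succeed, otherwise the errno value of the
// failing call (ENAMETOOLONG if the path does not fit in sun_path).
int probe_af_unix_bind(const std::string &path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return errno;

    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    if (path.size() >= sizeof(addr.sun_path)) {
        close(fd);
        return ENAMETOOLONG;
    }
    std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);

    int err = 0;
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == -1)
        err = errno;  // the issue reports WSAEINVAL at the equivalent call on 1803
    close(fd);
    if (err == 0)
        unlink(path.c_str());  // remove only the socket file our own bind() created
    return err;
}
```

A nonzero result on a machine where the libzmq tests abort would suggest the problem sits below libzmq, which fits the suspicion about 1803's AF_UNIX support.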

Environment

Minimal test code / Steps to reproduce the issue

Either:

  1. Run test_reqrep_tcp.exe or many other tests

Or:

  1. Create a socket:

    #include <zmq.hpp> // not pure libzmq, I know, but I'm already putting more time into this than makes any sense

    int main(int, char**)
    {
        zmq::context_t ctx;
        zmq::socket_t sock(ctx, zmq::socket_type::server); // draft socket type: needs ZMQ_BUILD_DRAFT_API
        return 0;
    }


What's the actual result?

Assertion message from option 1 (running test_reqrep_tcp.exe):

    Z:\Debug>test_reqrep_tcp.exe
    Bad file descriptor (C:\Code\libzmq\src\epoll.cpp:100)

    Z:\Debug>echo %errorlevel%
    1073741845


Visual Studio call stack from option 2 (the minimal program):

    SystemHealth.Qualification.exe!zmq::zmq_abort(const char * errmsg_) Line 84 C++
    SystemHealth.Qualification.exe!zmq::epoll_t::add_fd(unsigned int fd_, zmq::i_poll_events * events_) Line 100 C++
    SystemHealth.Qualification.exe!zmq::reaper_t::reaper_t(zmq::ctx_t * ctx_, unsigned int tid_) Line 50 C++
    SystemHealth.Qualification.exe!zmq::ctx_t::start() Line 430 C++
    SystemHealth.Qualification.exe!zmq::ctx_t::create_socket(int type_) Line 490 C++
    SystemHealth.Qualification.exe!zmq_socket(void * ctx_, int type_) Line 262 C++
    SystemHealth.Qualification.exe!zmq::socket_t::socket_t(zmq::context_t & context_, int type_) Line 1564 C++
    SystemHealth.Qualification.exe!zmq::socket_t::socket_t(zmq::context_t & context_, zmq::socket_type type_) Line 1575 C++

What's the expected result?

Passing tests, or a socket that is created without failing an assertion.

remoteBranch commented 3 years ago

I have seen this sort of stack trace too, although Google didn't send me here until after I had debugged it at length. In my case, it appears that the bind(AF_UNIX ...) calls made by the signaler_t constructor for the reaper_t's mailbox were failing. This snowballs into the reaper_t constructor, which gets an invalid file handle from the mailbox, passes it on to the poller, and finally ends up in zmq_abort.
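The last step of that chain is presumably why the assertion text reads "Bad file descriptor": registering an invalid descriptor with epoll fails with EBADF, and an assertion on that return value then aborts. The sketch below demonstrates the error path with Linux epoll. Windows has no native epoll, so libzmq's epoll_t there runs on an emulation layer; treat this as an illustration of the error, not of the Windows code. try_add_fd is a hypothetical helper standing in for the poller's add_fd.

```cpp
// Illustration (Linux-only, not libzmq code): an invalid descriptor handed to
// epoll makes epoll_ctl() fail with EBADF, whose strerror() text is
// "Bad file descriptor".
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/epoll.h>
#include <unistd.h>

// Registers fd with a fresh epoll instance, roughly as a poller's add_fd would.
// Returns "" on success, otherwise the strerror() text of the failure.
std::string try_add_fd(int fd) {
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return std::strerror(errno);

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    std::string result;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        result = std::strerror(errno);  // an errno-style assertion here would abort
    close(epfd);
    return result;
}
```

Called with -1 (what a failed socket creation typically leaves behind on POSIX), try_add_fd returns "Bad file descriptor"; called with a valid pipe end, it returns an empty string.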

I also suspect the old OS; I am on 1803 as well.

remoteBranch commented 3 years ago

I can also confirm that upgrading Windows 10 resolves this problem.

edwinvp commented 3 years ago

Same issue here. Recompiling ZeroMQ with ZMQ_HAVE_IPC undefined works for me, but that's an undesirable workaround at best.
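With ZMQ_HAVE_IPC undefined, the Windows build presumably builds the signaler's socket pair over TCP loopback (libzmq's make_fdpair_tcpip) instead of AF_UNIX. The general technique looks roughly like the sketch below: bind a listener to 127.0.0.1 with port 0 so the OS picks a free port, read the port back with getsockname(), connect, accept, and close the listener. make_tcp_loopback_pair is a hypothetical name, this is not libzmq's implementation, and it uses POSIX calls where the Windows version would use their Winsock equivalents.

```cpp
// Socket-pair emulation over TCP loopback (illustrative, not libzmq code):
// listen on 127.0.0.1 with an OS-assigned port, connect, accept, drop the listener.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// On success stores the two connected ends in fds[0]/fds[1] and returns 0;
// returns -1 on failure without leaking descriptors.
int make_tcp_loopback_pair(int fds[2]) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener == -1)
        return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the OS pick a free port
    socklen_t len = sizeof(addr);
    sockaddr *sa = reinterpret_cast<sockaddr *>(&addr);

    int client = -1, server = -1;
    if (bind(listener, sa, sizeof(addr)) == 0 && listen(listener, 1) == 0 &&
        getsockname(listener, sa, &len) == 0 &&  // learn which port we got
        (client = socket(AF_INET, SOCK_STREAM, 0)) != -1 &&
        connect(client, sa, sizeof(addr)) == 0) {
        server = accept(listener, nullptr, nullptr);
    }
    close(listener);

    if (server == -1) {
        if (client != -1)
            close(client);
        return -1;
    }
    fds[0] = server;
    fds[1] = client;
    return 0;
}
```

One design trade-off: while the pair is being set up, it depends on a listening TCP port on 127.0.0.1 that any local process could connect to, which is one reason an AF_UNIX path can be preferable when it works.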

hjyheb commented 9 months ago

I have encountered a similar issue on Windows Server 2019 with libzmq 4.3.5: test_xpub_welcome_msg aborts.

Through debugging, I found that the issue is caused by a failure to create a temporary file. Because the return value of create_ipc_wildcard_address is not checked, that failure goes unnoticed and results in a connect failure. (Screenshots: create_failed, connect_failed.)

If create_ipc_wildcard_address fails, can we consider using make_fdpair_tcpip instead?
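The proposed fallback could look roughly like the sketch below. It checks every step of the AF_UNIX path, including creation of the temporary address, and drops to a TCP loopback pair if any step fails, so the poller never receives an invalid descriptor. It is a POSIX sketch under stated assumptions, not a patch against libzmq. make_temp_ipc_path is a hypothetical stand-in for create_ipc_wildcard_address, and its mkdtemp()/$TMPDIR scheme is an assumption of this sketch. make_signaler_pair, make_unix_pair, make_tcp_pair and connect_accept are illustrative names.

```cpp
// Sketch of the suggested fallback (not libzmq code): build the pair over
// AF_UNIX with a checked temporary address; fall back to TCP loopback when any
// step fails instead of handing an invalid descriptor to the poller.
#include <arpa/inet.h>
#include <cstdlib>
#include <cstring>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// listen() on an already-bound listener, connect a new client to addr, accept
// the peer, and close the listener. Returns 0 and fills fds on success, else -1.
static int connect_accept(int listener, const sockaddr *addr, socklen_t len, int family, int fds[2]) {
    int client = -1, server = -1;
    if (listen(listener, 1) == 0 && (client = socket(family, SOCK_STREAM, 0)) != -1 &&
        connect(client, addr, len) == 0)
        server = accept(listener, nullptr, nullptr);
    close(listener);
    if (server == -1) {
        if (client != -1)
            close(client);
        return -1;
    }
    fds[0] = server;
    fds[1] = client;
    return 0;
}

// Hypothetical stand-in for create_ipc_wildcard_address: a fresh private
// directory under $TMPDIR (default /tmp). Reports failure instead of a bad path.
static bool make_temp_ipc_path(std::string &dir, std::string &path) {
    const char *base = std::getenv("TMPDIR");
    std::string tmpl = std::string(base && *base ? base : "/tmp") + "/fdpair-XXXXXX";
    if (mkdtemp(&tmpl[0]) == nullptr)
        return false;  // the temp-file failure described in the report, checked here
    dir = tmpl;  // mkdtemp() replaced the XXXXXX in place
    path = dir + "/s";
    return true;
}

static int make_unix_pair(int fds[2]) {
    std::string dir, path;
    if (!make_temp_ipc_path(dir, path))
        return -1;
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    int rc = -1, listener = -1;
    if (path.size() < sizeof(addr.sun_path) && (listener = socket(AF_UNIX, SOCK_STREAM, 0)) != -1) {
        std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);
        const sockaddr *sa = reinterpret_cast<const sockaddr *>(&addr);
        if (bind(listener, sa, sizeof(addr)) == 0)
            rc = connect_accept(listener, sa, sizeof(addr), AF_UNIX, fds);
        else
            close(listener);  // where the reported WSAEINVAL would surface
    }
    unlink(path.c_str());  // the connected pair no longer needs the filesystem name
    rmdir(dir.c_str());
    return rc;
}

static int make_tcp_pair(int fds[2]) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener == -1)
        return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // port 0: the OS picks one
    socklen_t len = sizeof(addr);
    sockaddr *sa = reinterpret_cast<sockaddr *>(&addr);
    if (bind(listener, sa, len) != 0 || getsockname(listener, sa, &len) != 0) {
        close(listener);
        return -1;
    }
    return connect_accept(listener, sa, len, AF_INET, fds);
}

// Prefer AF_UNIX; fall back to TCP loopback. *used_tcp reports which path won.
int make_signaler_pair(int fds[2], bool *used_tcp) {
    *used_tcp = false;
    if (make_unix_pair(fds) == 0)
        return 0;
    *used_tcp = true;
    return make_tcp_pair(fds);
}
```

Calling make_signaler_pair(fds, &used_tcp) yields a connected pair either way. Logging used_tcp seems worthwhile, since silently degrading to TCP on affected Windows machines could otherwise hide the underlying OS problem.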