zeromq / azmq

C++ language binding library integrating ZeroMQ with Boost Asio
Boost Software License 1.0

Crash (assertion failure) with Astrill VPN #171

Closed k-stachowiak closed 3 years ago

k-stachowiak commented 3 years ago

I found that the basic example crashes during initialization if the Astrill VPN client is running. I built the example from the main Readme.md and ran it in Visual Studio 2019.

The call stack at the crash is:

    KernelBase.dll!_RaiseException@16()    Unknown Non-user code. Symbols loaded.
>   ZmqTest.exe!zmq::zmq_abort(const char * errmsg_=0x00dea580) Line 84 C++ Symbols loaded.
    ZmqTest.exe!zmq::epoll_t::add_fd(unsigned int fd_=688, zmq::i_poll_events * events_=0x00ddd2f0) Line 100    C++ Symbols loaded.
    ZmqTest.exe!zmq::io_object_t::add_fd(unsigned int fd_=688) Line 66  C++ Symbols loaded.
    ZmqTest.exe!zmq::tcp_connecter_t::start_connecting() Line 149   C++ Symbols loaded.
    ZmqTest.exe!zmq::stream_connecter_base_t::process_plug() Line 82    C++ Symbols loaded.
    ZmqTest.exe!zmq::object_t::process_command(const zmq::command_t & cmd_={...}) Line 87   C++ Symbols loaded.
    ZmqTest.exe!zmq::io_thread_t::in_event() Line 92    C++ Symbols loaded.
    ZmqTest.exe!zmq::epoll_t::loop() Line 206   C++ Symbols loaded.
    ZmqTest.exe!zmq::worker_poller_base_t::worker_routine(void * arg_=0x00d8c0f0) Line 146  C++ Symbols loaded.
    ZmqTest.exe!thread_routine(void * arg_=0x00d8c11c) Line 55  C++ Symbols loaded.
    ZmqTest.exe!thread_start<unsigned int (__stdcall*)(void *),1>(void * const parameter=0x00d97b10) Line 97    C++ Symbols loaded.

The crash occurs during the subscriber object initialization, i.e. after trying to step over the line: azmq::sub_socket subscriber(ios);. And the actual assertion that fails is the second one in the function:

zmq::epoll_t::handle_t zmq::epoll_t::add_fd (fd_t fd_, i_poll_events *events_)
{
    check_thread ();
    poll_entry_t *pe = new (std::nothrow) poll_entry_t;
    alloc_assert (pe);

    //  The memset is not actually needed. It's here to prevent debugging
    //  tools to complain about using uninitialised memory.
    memset (pe, 0, sizeof (poll_entry_t));

    pe->fd = fd_;
    pe->ev.events = 0;
    pe->ev.data.ptr = pe;
    pe->events = events_;

    const int rc = epoll_ctl (_epoll_fd, EPOLL_CTL_ADD, fd_, &pe->ev);
    errno_assert (rc != -1);    //  <- CRASH HERE
    adjust_load (1);

    return pe;
}

The crash happens with both libzmq 4.3.3 and 4.3.4. The libraries and the example were built and run on Windows 10, in Visual Studio 2019.

The fact that an assertion is used there makes me believe this is not an expected situation. Is there a way of configuring ZMQ, either at build time or at run time, so that the library avoids this failing path? Does it look like a potential issue in ZMQ, or rather a bug in the VPN's driver? It reproduces in 100% of tries and goes away after uninstalling the VPN client.
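To illustrate why a failed call ends the whole process instead of surfacing as an error code, here is a minimal, self-contained sketch of an errno-checking assertion. It is not libzmq's actual `errno_assert` macro; `SKETCH_ERRNO_ASSERT`, `describe_errno_failure` and `fake_register_fd` are hypothetical names introduced only for this illustration.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not part of libzmq): builds the message an
// errno-style assertion would report for a failed expression.
std::string describe_errno_failure(const char *expr, int err)
{
    return std::string("Assertion failed: ") + expr + " ("
           + std::strerror(err) + ")";
}

// Simplified stand-in for an errno-checking assertion: if the condition
// is false, report the saved errno value and abort the process.
#define SKETCH_ERRNO_ASSERT(x)                                          \
    do {                                                                \
        if (!(x)) {                                                     \
            const int saved_errno = errno;                              \
            std::fprintf(stderr, "%s (%s:%d)\n",                        \
                         describe_errno_failure(#x, saved_errno).c_str(), \
                         __FILE__, __LINE__);                           \
            std::abort();                                               \
        }                                                               \
    } while (false)

// Hypothetical call that fails in the same style as epoll_ctl:
// it returns -1 and sets errno when given an invalid descriptor.
int fake_register_fd(int fd)
{
    if (fd < 0) {
        errno = EBADF;
        return -1;
    }
    return 0;
}
```

With these definitions, `SKETCH_ERRNO_ASSERT(fake_register_fd(-1) != -1);` would print the description of `EBADF` and abort. The caller never gets a chance to handle the error, which matches the behaviour seen in the backtrace.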

On another note: I was not even logged into any virtual network; just having the VPN client running caused the issue. Also, when I put a breakpoint on the failing assertion, I noticed that it doesn't crash on the first two hits, which seem to happen on the main thread. The crash occurs only once the worker thread hits it, as can be seen in the backtrace.

k-stachowiak commented 3 years ago

Hi,

I'd like to bump this one. Can you provide any insight regarding this issue? Or could you suggest any additional tests we could run to help nail it down?

aboseley commented 3 years ago

@k-stachowiak, have you tried running the tests on the core libzmq library (https://github.com/zeromq/libzmq)?

k-stachowiak commented 3 years ago

@aboseley Sorry for not jumping in immediately. I did finally give it a try, and ran the core ZMQ example found here: https://zeromq.org/get-started/?language=c&library=libzmq# (note the slight adjustment to make it work with MSVC):

//  Hello World server
#include <zmq.h>
#include <string.h>
#include <stdio.h>  
#include <assert.h>

#ifdef _WIN32
#include <Windows.h>
#else
#include <unistd.h>
#endif

int main(void)
{
    //  Socket to talk to clients
    void* context = zmq_ctx_new();
    void* responder = zmq_socket(context, ZMQ_REP);
    int rc = zmq_bind(responder, "tcp://*:5555");
    assert(rc == 0);

    while (1) {
        char buffer[10];
        zmq_recv(responder, buffer, 10, 0);
        printf("Received Hello\n");
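#ifdef _WIN32
        Sleep(1000);       //  Do some 'work' (Sleep takes milliseconds)
#else
        sleep(1);          //  Do some 'work' (sleep takes seconds)
#endif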
        zmq_send(responder, "World", 5, 0);
    }
    return 0;
}

This sample crashes when calling the zmq_recv() function at the line:

zmq_recv(responder, buffer, 10, 0);

And it crashes with the exact same backtrace, at the same point in the library as mentioned earlier.

I can see this module is only used when ZMQ_IOTHREAD_POLLER_USE_EPOLL is defined, and I was wondering whether it's easy to switch off, and whether that's even a reasonable thing to do.

k-stachowiak commented 3 years ago

BTW, this looks like a similar problem to issue 2103 in libzmq's repository, except that I think this time there is a simple scenario to reproduce it: libzmq + Astrill VPN. Since we've moved it a notch lower down the stack, I think I can raise the issue there.

aboseley commented 3 years ago

@k-stachowiak, you might try a different polling method by setting POLLER when compiling libzmq.

See this line here: https://github.com/zeromq/libzmq/blob/72b03aa28113d5bd10f6fe130a6ea0324afa8745/CMakeLists.txt#L303
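A minimal sketch of that suggestion, assuming libzmq is configured with CMake from an empty build directory inside the source tree, and assuming `select` is among the values the linked CMakeLists.txt accepts for `POLLER` (check the option's description there before relying on it):

```shell
# Run from an empty build directory inside the libzmq source tree (assumed layout).
# POLLER overrides the autodetected polling mechanism; "select" is an assumed value.
cmake -DPOLLER=select ..
cmake --build . --config Release
```

The application then has to be rebuilt and linked against the resulting library for the change to take effect.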

I'm going to close this issue, as it seems to be related to libzmq rather than the azmq binding.