zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0
9.64k stars 2.35k forks source link

buffer overflow when sending "large" messages via RADIO/DISH UDP #4296

Open ANaablevy opened 2 years ago

ANaablevy commented 2 years ago

Issue description

Sorry for all of my edits. It is actually on the send side. I've updated my ticket.

It appears that zeromq will segfault when receiving sending large messages across UDP using the RADIO/DISH protocol. valgrind reports a buffer overflow.

I have replicated the problem by sending 32768 byte messages, which isn't that large in my opinion. The code seems to be fine for smaller sizes (e.g. 8192). I'm not sure if that is related to MAX_UDP_MSG.

Is it documented that there is a message size cap or should zeromq be able to support (i.e. split) larger messages? 8kb or 32kb seems pretty small.

Environment

Minimal test code / Steps to reproduce the issue

#include <cstdio>
#include <csignal>
#include <string>
#include <chrono>
#include <thread>
#include <atomic>

#include <zmq.hpp>

static std::atomic_bool INTERRUPTED{false};

void sig_handler(int signum) {
    (void)signum;
    INTERRUPTED = true;
}

int main() {
    // stop processing on CTRL+C
    fprintf(stderr, "Press CTRL+C or send SIGINT signal to stop program.\n");
    std::signal(SIGINT, sig_handler);

    zmq::context_t ctx(1);
    zmq::socket_t radio(ctx, zmq::socket_type::radio);
    const std::string radio_addr = "udp://127.0.0.1:5555";
    radio.connect(radio_addr);

    ////////////////////////////////////////////////////////////////////////////
    // adjust size to cause overflow (>8192 ?)
    ////////////////////////////////////////////////////////////////////////////
    const size_t msg_size = 8192 * 4;

    while (!INTERRUPTED) {
        zmq::message_t msg(msg_size);
        radio.send(msg, zmq::send_flags::none);
        std::this_thread::sleep_for(std::chrono::microseconds(50));
    }

    // cleanup
    radio.disconnect(radio_addr);
    radio.close();

    return 0;
}
g++ -std=c++11 -Wall -Wextra -DZMQ_BUILD_DRAFT_API=1 -g zmqbug.cc -o zmqbug -L/usr/lib -lzmq -lpthread
valgrind ./zmqbug

What's the actual result? (include assertion message & call stack if applicable)

==46366== Thread 3 ZMQbg/IO/0:
==46366== Invalid write of size 1
==46366==    at 0x4842B63: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==46366==    by 0x48C2134: memcpy (string_fortified.h:34)
==46366==    by 0x48C2134: zmq::udp_engine_t::out_event() (udp_engine.cpp:477)
==46366==    by 0x4886707: zmq::epoll_t::loop() (epoll.cpp:202)
==46366==    by 0x48BCF9E: thread_routine (thread.cpp:256)
==46366==    by 0x490F608: start_thread (pthread_create.c:477)
==46366==    by 0x4C49292: clone (clone.S:95)
==46366==  Address 0x4e98cf8 is 0 bytes after a block of size 17,848 alloc'd
==46366==    at 0x483C0F3: operator new(unsigned long, std::nothrow_t const&) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==46366==    by 0x48A7FDF: zmq::session_base_t::start_connecting(bool) (session_base.cpp:681)
==46366==    by 0x489457A: zmq::object_t::process_command(zmq::command_t const&) (object.cpp:86)
==46366==    by 0x4887FEB: zmq::io_thread_t::in_event() (io_thread.cpp:91)
==46366==    by 0x488671D: zmq::epoll_t::loop() (epoll.cpp:206)
==46366==    by 0x48BCF9E: thread_routine (thread.cpp:256)
==46366==    by 0x490F608: start_thread (pthread_create.c:477)
==46366==    by 0x4C49292: clone (clone.S:95)

What's the expected result?

The code doesn't crash or zeromq throws an error indicating that the message is too large.

ANaablevy commented 2 years ago

https://github.com/zeromq/libzmq/blob/master/src/udp_engine.cpp#L474

Yeah, I guess someone should add that check but I don't know how that meshes with the zeromq documentation. I guess zeromq could also try to send the message and throw if it got errno EMSGSIZE as a result of a failed sendto call.

ANaablevy commented 2 years ago

Also related to issue #2009 . I think the cause is just memory corruption due to the unchecked message size. The commenter in issue #2009 also mentioned assert errors, which I have also seen but not from the above toy example.

In real code that exercises this overflow condition, I have seen the following assert:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff79bc859 in __GI_abort () at abort.c:79
#2  0x00007ffff7f11d9d in zmq::zmq_abort (errmsg_=<optimized out>) at src/err.cpp:88
#3  0x00007ffff7f26b82 in zmq::epoll_t::reset_pollout (this=<optimized out>, handle_=<optimized out>) at src/epoll.cpp:154
#4  0x00007ffff7f27aed in zmq::io_object_t::reset_pollout (this=this@entry=0x7ffefc005140, handle_=<optimized out>) at src/io_object.cpp:90
#5  0x00007ffff7f622a7 in zmq::udp_engine_t::out_event (this=0x7ffefc005140) at src/udp_engine.cpp:509
#6  0x00007ffff7f26708 in zmq::epoll_t::loop (this=0x55555561b160) at src/epoll.cpp:202
#7  0x00007ffff7f5cf9f in thread_routine (arg_=0x55555561b1b8) at src/thread.cpp:401
#8  0x00007ffff7b92609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9  0x00007ffff7ab9293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I have also seen an assert due to a failed msg.check() but I did not save the stack trace.

bluca commented 2 years ago

can you send a PR to fix it?

ANaablevy commented 2 years ago

@bluca I can push a PR with the size check and increased buffer size (max is 65,507 bytes when accounting for network overhead) but I don't know if that is the real solution.

I guess the main question is: should zeromq udp support arbitrarily large message sizes? I think the answer is "no" but I'm not sure.

To support arbitrarily large messages, zeromq would be forced to do some sort of internal message batching to account for the UDP size limitations and I'm not sure if that is possible with the thread safety guarantees of the radio/dish socket. There would also have to be additional overhead and error checking to make sure that the entire message was received so that the "all or nothing" behavior of other sockets also applies to radio/dish/udp or maybe that just won't apply to udp. I think the thread safety issue is still a problem but I am not familiar with the whole radio/dish/udp processing pipeline to make that judgement call.

To support arbitrarily large messages, zeromq would be forced to do some sort of internal message batching to account for the UDP size limitations and still maintain the "all or nothing" behavior of other sockets. Or maybe that behavior just won't apply to UDP.

If there is a message size limit, whether hardcoded or based on system limitations, I also don't know if the udp buffer size should be static. Should it be resizable in case UDP can support larger message sizes in the future? It would rarely be resized (i.e. only if the current message size exceeds the last largest message) during the life of the socket so I think that would be acceptable overhead. I think there is also a way to query the max size but I am honestly not familiar with that check. I also don't have a machine with windows or mac os so I might not be the best person to test this code...

alonbl commented 1 year ago

Hi any news?