scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.33k stars 1.54k forks source link

we get message too long error when write a tcp connection, and reach a coredump when close it. #361

Open xuguruogu opened 6 years ago

xuguruogu commented 6 years ago

I noticed that the defination of msg_iovlen in struct msghdr is size_t. As far as I know, the struct packet is designed for using iovec. How about use the same size type in packet.


struct msghdr
  {
    void *msg_name;     /* Address to send to/receive from.  */
    socklen_t msg_namelen;  /* Length of address data.  */

    struct iovec *msg_iov;  /* Vector of data to send/receive into.  */
    size_t msg_iovlen;      /* Number of elements in the vector.  */

    void *msg_control;      /* Ancillary data (eg BSD filedesc passing). */
    size_t msg_controllen;  /* Ancillary data buffer length.
                   !! The type should be socklen_t but the
                   definition of the kernel is incompatible
                   with this.  */

    int msg_flags;      /* Flags on received message.  */
  };
gleb-cloudius commented 6 years ago

I do not think it can overflow. We build iovec from a packet, so as long as _nr_frags <= msg_iovlen we are OK.

-- Gleb.

xuguruogu commented 6 years ago

OK, I change the title issue. Actually we found two issues.

  1. We went to an error(message too big) when sending packet, which has too many fragments. This may due to msg_iovlen is larger than UIO_MAXIOV.

https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.13/gtpc2/cpp_sendmsg.html

EMSGSIZE Either the message is too big to be sent as a single datagram or the iovector count is greater than or equal to UIO_MAXIOV as defined in sys/socket.h.

I believe the following function should make sure msg_iovlen is no greater than std::min(UIO_MAXIOV, p.nr_frags()).


inline
future<size_t> pollable_fd::write_some(net::packet& p) {
    return engine().writeable(*_s).then([this, &p] () mutable {
        static_assert(offsetof(iovec, iov_base) == offsetof(net::fragment, base) &&
            sizeof(iovec::iov_base) == sizeof(net::fragment::base) &&
            offsetof(iovec, iov_len) == offsetof(net::fragment, size) &&
            sizeof(iovec::iov_len) == sizeof(net::fragment::size) &&
            alignof(iovec) == alignof(net::fragment) &&
            sizeof(iovec) == sizeof(net::fragment)
            , "net::fragment and iovec should be equivalent");

        iovec* iov = reinterpret_cast<iovec*>(p.fragment_array());
        msghdr mh = {};
        mh.msg_iov = iov;
        mh.msg_iovlen = p.nr_frags();
        auto r = get_file_desc().sendmsg(&mh, MSG_NOSIGNAL);
        if (!r) {
            return write_some(p);
        }
        if (size_t(*r) == p.len()) {
            _s->speculate_epoll(EPOLLOUT);
        }
        return make_ready_future<size_t>(*r);
    });
}
  1. The stack overflowed when delete class deleter. I believe the reason is that we link too many fragments ( >= 2M elements) in a reply, and thus the linked deleter has too many elements. For the upper issue, we failed to write to a tcp connection, and then reach a coredump when close the tcp connection.

inline
deleter::~deleter() {
    if (is_raw_object()) {
        std::free(to_raw_object());
        return;
    }
    if (_impl && --_impl->refs == 0) {
        delete _impl;
    }
}
xuguruogu commented 6 years ago

IOV_MAX is a better choice than UIO_MAXIOV.

Traditionally, the maximum number of scatter/gather elements the system can process in one call were described by the symbolic value {UIO_MAXIOV}. In IEEE Std 1003.1-2001 this value is replaced by the constant {IOV_MAX} which can be found in .

http://pubs.opengroup.org/onlinepubs/007904875/basedefs/limits.h.html

{IOV_MAX} [XSI] [Option Start] Maximum number of iovec structures that one process has available for use with readv() or writev(). Minimum Acceptable Value: {_XOPEN_IOV_MAX} [Option End]

gleb-cloudius commented 6 years ago

On Wed, Nov 22, 2017 at 08:57:28AM +0000, kent wrote:

OK, I change the title issue. Actually we found two issues.

  1. We went to an error(message too big) when sending packet, which has too many fragments. This may due to msg_iovlen is larger than UIO_MAXIOV.

https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.13/gtpc2/cpp_sendmsg.html

EMSGSIZE Either the message is too big to be sent as a single datagram or the iovector count is greater than or equal to UIO_MAXIOV as defined in sys/socket.h.

I believe the following function should make sure msg_iovlen should less than std::min(UIO_MAXIOV, p.nr_frags()). Right.

  1. The stack overflowed when delete class deleter. I believe the reason is that we link too many fragments ( >= 2M elements) in a reply, and thus the linked deleter has too many elements. For the upper issue, we failed to write to a tcp connection, and then reach a coredump when close the tcp connection.

Since _nr_frags is uint16_t how can you link 2M fragments to one packet?

-- Gleb.

xuguruogu commented 6 years ago

@gleb-cloudius

_nr_frags can overflow. We can still append message when reach upper limit of uint16_t. This can be reproduced in the following code. Of cause, seastar has default stack size of 128k.

#include <stdio.h>
#include <iostream>
#include "thirdparty/seastar/net/packet.hh"

int main() {
    auto _pi = std::make_unique<net::packet>();
    for (auto i = 0; i < 1000000; i++) {
        auto b = malloc(10);
        _pi.reset(new net::packet(std::move(*_pi.release()), net::fragment{(char*)b, 10}, make_free_deleter(b)));
        std::cout << _pi->nr_frags() << std::endl;
    }
    _pi.reset();
}
gleb-cloudius commented 6 years ago

On Wed, Nov 22, 2017 at 10:57:58AM +0000, kent wrote:

_nr_frags can overflow. We can still append message when reach upper limit of uint16_t. This can be tested in the following code.

Yes, there is no check in the code for that. Changing it to 32 bit will not make it not overflow, just make it less likely to do so. It is possible to add overflow detection and throw an exception on overflow (and we should do it), but ultimately it is a responsibility of your code to make sure it does not add too much fragments. If you make sure no packet contains more than 64k fragments do you still see the crash?

-- Gleb.

xuguruogu commented 6 years ago

If you make sure no packet contains more than 64k fragments do you still see the crash?

I cannot give you critical number to reproduce the issue. Through gdb, I am sure lower address of frame 0 $sp is inaccessible, which means the code runs into the bottom of the stack.

Maybe we should destruct the deleter from the header to bottom of the link sequentially, instead of recursively.


inline
deleter::~deleter() {
    if (is_raw_object()) {
        std::free(to_raw_object());
        return;
    }
    if (_impl && --_impl->refs == 0) {
        auto ptr = _impl->next._impl;
        while (ptr) {
            if (is_raw_object(ptr)) {
                delete to_raw_object(ptr);
                break;
            }
            if (ptr->refs > 1) {
                delete ptr;
                break;
            }
            auto next = ptr->next._impl;
            ptr->next._impl = nullptr;
            delete ptr;
            ptr = next;
        }
    }
}
gleb-cloudius commented 6 years ago

On Thu, Nov 23, 2017 at 12:46:22AM +0000, kent wrote:

If you make sure no packet contains more than 64k fragments do you still see the crash?

I cannot give you critical number to reproduce the issue. Through gdb, I am sure lower address of frame 0 $sp is inaccessible, which meas the code run into the bottom of the stack.

If _nr_frags overflows the may be buffer overflow and then anything is possible.

-- Gleb.