scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.41k stars 1.56k forks source link

Posix remote_address() may throw, but it is marked noexcept #2379

Open absporl opened 4 months ago

absporl commented 4 months ago

I ran into the following error:

terminate called after throwing an instance of 'std::system_error'
  what():  getpeername: Transport endpoint is not connected
Aborting on shard 1.

Partial backtrace:

std::terminate() at ??:?
__clang_call_terminate at :?
seastar::net::posix_connected_socket_impl::remote_address() const at ??:?
seastar::tls::tls_connected_socket_impl::remote_address() const at ??:?
seastar::connected_socket::remote_address() const at ??:?

I believe the cause is posix_connected_socket_impl::remote_address() is marked noexcept, but it calls posix_connected_socket_operations::remote_address(), which is not noexcept, and in fact the latter calls file_desc::get_remote_address() which explicitly throws.

See:

https://github.com/scylladb/seastar/blob/master/src/net/posix-stack.cc#L288 https://github.com/scylladb/seastar/blob/master/src/net/posix-stack.cc#L110 https://github.com/scylladb/seastar/blob/master/include/seastar/core/posix.hh#L298

xemul commented 4 months ago

But the class name suggests that the socket is connected, while kernel thinks otherwise. Can you shed more light on this -- how did you end up having unconnected socket wrapped with connected_socket_impl object?

absporl commented 4 months ago

This happened when I used an external tcpkill process to rapidly terminate connections, for testing purposes. What is supposed to happen to a seastar::connected_socket object when it is no longer connected?

xemul commented 4 months ago

This happened when I used an external tcpkill process to rapidly terminate connections, for testing purposes.

Ah, I see. In that case this can be a nice unit test too -- a local listening posix socket, a local established connection, then server-side closes and client-side tries to call (local_|remote_)address()

What is supposed to happen to a seastar::connected_socket object when it is no longer connected?

It won't transform into some other class, that's for sure. No I understand where the issue could be.

xemul commented 4 months ago

Can you show me your full test? The ENOTCON is returned for SYN_SENT and CLOSE states. The latter doesn't happen until client closes its side, and if it does it's no longer valid to call remote_address() on connected_socket(). So I assume we're in SYN_SENT state, but don't see how it happens until connected_socket is instantiated.

niekbouman commented 3 months ago

hi Pavel, apologies for the delay (I took over the task of making a reproducer from my colleague Mark).

Here is an (ugly) reproducer (no attempts to cleanly shutdown services, just a fairly minimal reproducer to trigger the bug):

I first run, in another terminal, tcpkill, and leave it running:

sudo tcpkill -i lo port 8000

Then I run the following program: (I compiled my test in dev mode)

#include "stop_signal.hh"
#include <seastar/core/abort_source.hh>
#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/thread.hh>
#include <seastar/core/when_all.hh>
#include <seastar/util/log.hh>

static seastar::logger mylog("main");

using namespace seastar;
using namespace std::chrono_literals;

abort_source as;

future<> client()
{
    ipv4_addr addr(8000);

    connected_socket s = co_await connect(addr);
    auto out = s.output();
    auto in = s.input();

    std::string hello{"hello"};
    co_await out.write(hello);
    co_await out.flush();

    auto payload = co_await in.read();
}

future<> server()
{
    connected_socket srv_sock;
    listen_options opts;
    opts.reuse_address = true;
    opts.set_fixed_cpu(0);
    server_socket socket = listen(make_ipv4_address(8000), opts);

    accept_result ar = co_await socket.accept();

    srv_sock = std::move(ar.connection);

    auto in = srv_sock.input();
    auto out = srv_sock.output();

    while (!as.abort_requested()) {

        if (as.abort_requested()) {
            break;
        }

        uint16_t acc{};
        for (int i = 0; i < 10000; ++i) {
            auto ra = srv_sock.remote_address();
            acc += ra.port();
        }

        mylog.info("port sum: {}", acc);
    }
}

int main(int argc, char** argv)
{
    app_template app;

    return app.run(argc, argv, [] {
        return seastar::async([] {
            seastar_apps_lib::stop_signal s;
            auto server_fut = server();
            auto client_fut = client();
            s.wait().get();
        });
    });
}

When the program is run, tcpkill has output like this:

tcpkill: listening on lo [port 8000]
127.0.0.1:54880 > 127.0.0.1:8000: R 1069955445:1069955445(0) win 0
127.0.0.1:54880 > 127.0.0.1:8000: R 1069955705:1069955705(0) win 0
127.0.0.1:54880 > 127.0.0.1:8000: R 1069956225:1069956225(0) win 0
127.0.0.1:54880 > 127.0.0.1:8000: R 1069955445:1069955445(0) win 0
...

The seastar program then crashes with:

...
INFO  2024-08-16 21:59:17,036 [shard 0:main] main - port sum: 20864
INFO  2024-08-16 21:59:17,039 [shard 0:main] main - port sum: 20864
INFO  2024-08-16 21:59:17,041 [shard 0:main] main - port sum: 20864
INFO  2024-08-16 21:59:17,044 [shard 0:main] main - port sum: 20864
terminate called after throwing an instance of 'std::system_error'
  what():  getpeername: Transport endpoint is not connected
Aborting on shard 0.
Backtrace:
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x345cb4
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x30e904
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x339f36
  /lib64/libc.so.6+0x40cff
  /lib64/libc.so.6+0x994a3
  /lib64/libc.so.6+0x40c4d
  /lib64/libc.so.6+0x28901
  /lib64/libstdc++.so.6+0xa5da8
  /lib64/libstdc++.so.6+0xb7c4b
  /lib64/libstdc++.so.6+0xa5950
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x26314a
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x40d106
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x415c2f
  0x21061a
  0x2113ea
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x31d49f
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x31e627
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x31db08
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x291c69
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x291049
  /home/niek/fedora35/cranmera/dev_build/apps/../../thirdparty/seastar/build/dev/libseastar.so+0x291fb9
  0x20ec20
  /lib64/libc.so.6+0x2a087
  /lib64/libc.so.6+0x2a14a
  0x20d8e4
^CKilled
absporl commented 3 months ago

@xemul Is this helpful?

niekbouman commented 1 month ago

Can you show me your full test? The ENOTCON is returned for SYN_SENT and CLOSE states. The latter doesn't happen until client closes its side, and if it does it's no longer valid to call remote_address() on connected_socket(). So I assume we're in SYN_SENT state, but don't see how it happens until connected_socket is instantiated.

Polite "ping" @xemul . See reproducer above.