scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0

Impossible to close SCTP socket #1618

Open Genomorf opened 1 year ago

Genomorf commented 1 year ago

Hello. It seems that I have found a bug in the Seastar SCTP listener: it is impossible to close the listening socket when using the SCTP protocol. server_socket::abort_accept() does not work at all, and destroying the server_socket object does not help either. The socket stays open for as long as the program is alive; I can easily see it with netstat, and there is no way to close it.

  1. Create a server_socket object via seastar::listen(proto::SCTP)
  2. Call server_socket::accept()
  3. Call server_socket::abort_accept() and get a "shutdown: Operation not supported" error. More info: https://groups.google.com/g/seastar-dev/c/awC1p7JRoqE/m/U0JiOzHZAAAJ
  4. Destroy the server_socket object
  5. Run netstat -l and look for sctp ip6-localhost:9099
  6. Alternatively, call seastar::listen(proto::SCTP) again and get "Address already in use"

Code to reproduce the issue:

#include <seastar/core/app-template.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/sleep.hh>
#include <seastar/net/api.hh>
#include <fmt/core.h>
#include <chrono>
#include <system_error>

using namespace seastar;
using namespace std::chrono_literals;

future<> abort_socket(seastar::server_socket& s)
{
    co_await sleep(1s);
    try {
        s.abort_accept();
    } catch (const std::system_error& e) {
        // Here we got std::system_error (error system:95, shutdown: Operation not supported)
        fmt::print("abort_accept failed: {}\n", e.what());
    }
}

const auto SERVER_ADDR = seastar::ipv6_addr("::1", 9099);

future<> f(seastar::transport proto)
{
    seastar::listen_options const lo{
        .reuse_address = true,
        .proto = proto,
    };
    {
        auto sock = seastar::listen(SERVER_ADDR, lo);
        fmt::print("listening on {}\n", sock.local_address());
        (void)abort_socket(sock); // not working at all, still accepting
        (void)sock.accept();
        co_await sleep(2s);
        // end of scope = "server_socket sock" is destroyed
    }
    fmt::print("accept stopped, server_socket destroyed\n");
    // time to open netstat and look for sockets that are still listening
    co_await sleep(100s);
}

int main(int argc, char* argv[])
{
    app_template app;
    return app.run(argc, argv, [] { return f(seastar::transport::SCTP); });
}

avikivity commented 1 year ago

I recommend trying strace to see what's going on.

Genomorf commented 1 year ago

From the strace log, I cherry-picked the lines related to the socket (file descriptor 35) and got this:

openat(AT_FDCWD, "/proc/sys/net/core/somaxconn", O_RDONLY) = 35
read(35, "4096\n", 8191)                = 5
close(35)                               = 0

socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_SCTP) = 35
setsockopt(35, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(35, {sa_family=AF_INET6, sin6_port=htons(9099), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=4294967295}, 28) = 0
listen(35, 100)                         = 0
getsockname(35, {sa_family=AF_INET6, sin6_port=htons(9099), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 0
shutdown(35, SHUT_RD)                   = -1 EOPNOTSUPP (Operation not supported)
close(35) 

socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_SCTP) = 35
setsockopt(35, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(35, {sa_family=AF_INET6, sin6_port=htons(9099), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=4294967295}, 28) = -1 EADDRINUSE (Address already in use)
close(35) 

I also added a new line at the end of the code from my first comment. It tries to listen on the same address again, to demonstrate the "Address already in use" error:

auto sock = seastar::listen(SERVER_ADDR, lo);

That call corresponds to the last block of the strace log.

My knowledge of Linux sockets is limited, so sorry for my lack of understanding. It seems that socket 35 was actually closed, yet I could not bind to the same address again, even with SO_REUSEADDR. Could you tell me why?

Genomorf commented 1 year ago

UPDATE: Using SO_REUSEPORT instead of SO_REUSEADDR fixes the issue. However, I found that 8 years ago you temporarily disabled SO_REUSEPORT. @avikivity, could you please tell me whether that is still relevant?
https://github.com/scylladb/seastar/commit/86ffe72b5bcac0af8acc3ef66af31e82131dae67
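The syscall sequence from the strace log (listen with SO_REUSEADDR, shutdown(SHUT_RD), close, bind again) and the SO_REUSEPORT variant from the update can be replayed without Seastar, to see what a given kernel does. This is a minimal sketch assuming Linux and IPv6 loopback; `replay_listen_close_rebind`, `open_bound` and `rebind_result` are hypothetical names, not Seastar APIs, and the SCTP results are expected to depend on the kernel version.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Outcome of each step: 0 means success, otherwise the errno value.
struct rebind_result {
    int first_bind = 0;      // socket/setsockopt/bind/listen of the first listener
    int shutdown_rd = 0;     // shutdown(fd, SHUT_RD) on that listener
    int second_bind = 0;     // socket/setsockopt/bind on the same address after close()
    unsigned short port = 0; // port actually bound by the first listener
};

// Hypothetical helper: creates a socket with the given protocol, sets the given
// reuse option (SO_REUSEADDR or SO_REUSEPORT) to 1 and binds it to [::1]:port.
// Returns 0 on success or an errno value.
static int open_bound(int protocol, int reuse_opt, unsigned short port, int* fd_out) {
    int fd = ::socket(AF_INET6, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, protocol);
    if (fd < 0) return errno;
    int one = 1;
    sockaddr_in6 addr{};
    addr.sin6_family = AF_INET6;
    addr.sin6_port = htons(port);
    addr.sin6_addr = in6addr_loopback;
    if (::setsockopt(fd, SOL_SOCKET, reuse_opt, &one, sizeof(one)) < 0 ||
        ::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        int err = errno;  // save errno before close() can overwrite it
        ::close(fd);
        return err;
    }
    *fd_out = fd;
    return 0;
}

// Hypothetical helper: listen, shutdown(SHUT_RD), close, then bind again,
// in the same order as the syscalls in the strace log.
rebind_result replay_listen_close_rebind(int protocol, int reuse_opt, unsigned short port) {
    rebind_result r;
    r.port = port;
    int fd = -1;
    r.first_bind = open_bound(protocol, reuse_opt, port, &fd);
    if (r.first_bind != 0) return r;
    if (::listen(fd, 100) < 0) {
        r.first_bind = errno;
        ::close(fd);
        return r;
    }
    sockaddr_in6 bound{};
    socklen_t len = sizeof(bound);
    if (::getsockname(fd, reinterpret_cast<sockaddr*>(&bound), &len) == 0) {
        r.port = ntohs(bound.sin6_port);  // resolves port 0 to the kernel's choice
    }
    r.shutdown_rd = ::shutdown(fd, SHUT_RD) < 0 ? errno : 0;
    ::close(fd);
    int fd2 = -1;
    r.second_bind = open_bound(protocol, reuse_opt, r.port, &fd2);
    if (fd2 >= 0) ::close(fd2);
    return r;
}
```

For example, `replay_listen_close_rebind(IPPROTO_SCTP, SO_REUSEADDR, 9099)` mirrors the strace, passing `SO_REUSEPORT` instead mirrors the update, and the same calls with `IPPROTO_TCP` give a baseline to compare against. Passing port 0 lets the kernel pick a free port, which the helper then reuses for the second bind.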

avikivity commented 1 year ago

It's probably still relevant. But I'd like to understand the problem with shutting down SCTP sockets better. How is one supposed to shut down an SCTP socket? I'm also confused about how SO_REUSEPORT helps, since it only matters in the open (bind) phase.

If it turns out it's the right thing to do, we can enable it for SCTP, but let's not turn it on blindly.

Genomorf commented 1 year ago

The decision to switch from SO_REUSEADDR to SO_REUSEPORT was actually a gamble: I tried different approaches, and that one worked.

How is one supposed to shut down SCTP sockets?

I only know that the shutdown syscall fails with an error. Here is a discussion about it with @xemul: https://groups.google.com/g/seastar-dev/c/awC1p7JRoqE/m/U0JiOzHZAAAJ

And I'm confused about how SO_REUSEPORT helps

It doesn't help to close the socket; it lets the address be reused. I don't know why SO_REUSEADDR is not enough, though.

My main issue is that I can't connect to the same address (using seastar::connect with the SCTP protocol) after calling abort_accept() or destroying the server_socket, while seastar::connect with TCP works fine.

It's also an issue with the Seastar API. There is a seastar::server_socket::abort_accept() function, but it does not work with SCTP: calling it throws std::system_error ("Operation not supported"). Destroying the server_socket object doesn't help either, because the SCTP socket remains open. So there is currently no way in Seastar to shut down or close an SCTP listening socket and reuse the address.
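The abort_accept() failure can be checked without Seastar. The sketch below is a single-threaded simplification: it calls shutdown(fd, SHUT_RD) on an idle, non-blocking listener (the call the strace log shows failing with EOPNOTSUPP for SCTP) and then asks poll() whether a reactor waiting on that descriptor would now be woken up. It assumes Linux and IPv6 loopback; `probe_abort_accept` and `abort_probe` are hypothetical names, not part of Seastar, and the SCTP outcome is expected to vary with the kernel version.

```cpp
#include <cerrno>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// What happened when an idle listener was "aborted" with shutdown(SHUT_RD).
struct abort_probe {
    int setup_errno = 0;     // non-zero if socket(), bind() or listen() failed
    int shutdown_errno = 0;  // errno from shutdown(fd, SHUT_RD), 0 on success
    bool woke_up = false;    // poll() reported an event (e.g. POLLIN/POLLHUP) in time
    int accept_errno = 0;    // errno from accept4() after the wakeup, 0 if it succeeded
};

// Hypothetical helper (not a Seastar API). Shut the listener down first, then
// ask poll() whether a reactor waiting for an incoming connection on this
// descriptor would now be woken up.
abort_probe probe_abort_accept(int protocol, int timeout_ms) {
    abort_probe r;
    int fd = ::socket(AF_INET6, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, protocol);
    if (fd < 0) {
        r.setup_errno = errno;
        return r;
    }
    sockaddr_in6 addr{};
    addr.sin6_family = AF_INET6;
    addr.sin6_addr = in6addr_loopback;  // sin6_port stays 0: kernel picks a free port
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(fd, 100) < 0) {
        r.setup_errno = errno;
        ::close(fd);
        return r;
    }
    r.shutdown_errno = ::shutdown(fd, SHUT_RD) < 0 ? errno : 0;
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN;  // a listener becomes readable when a connection is pending
    r.woke_up = ::poll(&p, 1, timeout_ms) > 0;
    if (r.woke_up) {
        int c = ::accept4(fd, nullptr, nullptr, SOCK_CLOEXEC);
        if (c < 0) {
            r.accept_errno = errno;
        } else {
            ::close(c);
        }
    }
    ::close(fd);
    return r;
}
```

On Linux, a TCP listener is expected to accept the shutdown, become ready, and then fail the following accept(), which is the wakeup an accept loop can turn into an abort. Calling `probe_abort_accept(IPPROTO_SCTP, 100)` shows whether the running kernel does the same for SCTP; the timeout keeps the probe from hanging if it does not.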

Could you please invite someone else to continue the discussion? As I said, my knowledge of Linux sockets is limited, and unfortunately I don't fully understand what is happening.

kkHAIKE commented 1 year ago

Some questions about https://github.com/scylladb/seastar/commit/86ffe72b5bcac0af8acc3ef66af31e82131dae67

  1. What does "some of them, which will probably require the protocol's cooperation," mean?
  2. Shouldn't it only disable _reuseport in posix_network_stack/posix_ap_network_stack instead of posix_reuseport_available?

niekbouman commented 3 months ago

It looks as if someone recently (July 2024) committed a patch to the Linux kernel that is related to this problem:

https://lore.kernel.org/linux-sctp/171999662855.24990.17495213341507066699.git-patchwork-notify@kernel.org/T/#t

@Genomorf @avikivity Could it be that this Linux kernel patch fixes the issue discussed here?

avikivity commented 3 months ago

Looks likely.