Open Genomorf opened 1 year ago
I recommend trying strace to see what's going on.
From the strace log I cherry-picked the lines related to sockets and fd 35, and got this:
openat(AT_FDCWD, "/proc/sys/net/core/somaxconn", O_RDONLY) = 35
read(35, "4096\n", 8191) = 5
close(35) = 0
socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_SCTP) = 35
setsockopt(35, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(35, {sa_family=AF_INET6, sin6_port=htons(9099), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=4294967295}, 28) = 0
listen(35, 100) = 0
getsockname(35, {sa_family=AF_INET6, sin6_port=htons(9099), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 0
shutdown(35, SHUT_RD) = -1 EOPNOTSUPP (Operation not supported)
close(35)
socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_SCTP) = 35
setsockopt(35, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(35, {sa_family=AF_INET6, sin6_port=htons(9099), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=4294967295}, 28) = -1 EADDRINUSE (Address already in use)
close(35)
I also added a new line at the end of the code from my first comment. It listens on the same address again, to show the "Address already in use" error.
auto sock = seastar::listen(SERVER_ADDR, lo);
It's the last paragraph from the strace log.
My technical knowledge of Linux sockets is limited, so sorry for my lack of understanding. It seems that socket 35 was actually closed, yet I could not bind to the same address again, even with SO_REUSEADDR. Could you tell me why?
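For anyone who wants to replay the sequence from the strace log outside Seastar (bind, listen, shutdown(SHUT_RD), close, bind again), here is a minimal sketch with plain POSIX calls. The names `replay_result`, `open_listener` and `replay` are made up for illustration, and the sketch uses IPv4 loopback with a kernel-chosen port instead of `::1` port 9099. The protocol is a parameter, so the same code can be run with IPPROTO_TCP and, where the kernel supports it, IPPROTO_SCTP. It is not a claim about what Seastar does internally.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>

// Outcome of the replay: each field is 0 on success, otherwise the errno
// of the failing call.
struct replay_result {
    int socket_errno = 0;    // first socket()/bind()/listen() failure
    int shutdown_errno = 0;  // shutdown(fd, SHUT_RD) on the listening socket
    int rebind_errno = 0;    // second bind()/listen() on the same port
};

// Hypothetical helper: open a listening socket on 127.0.0.1:port with
// SO_REUSEADDR set. Returns the fd, or -1 with errno set.
static int open_listener(int protocol, uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, protocol);
    if (fd < 0) {
        return -1;
    }
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 100) < 0) {
        int saved = errno;  // close() may overwrite errno
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

// Replays: listen, shutdown(SHUT_RD), close, listen again on the same port.
replay_result replay(int protocol) {
    replay_result r;
    int fd = open_listener(protocol, 0);  // port 0: kernel picks a free port
    if (fd < 0) {
        r.socket_errno = errno;
        return r;
    }
    sockaddr_in bound{};
    socklen_t len = sizeof(bound);
    getsockname(fd, reinterpret_cast<sockaddr*>(&bound), &len);
    uint16_t port = ntohs(bound.sin_port);

    if (shutdown(fd, SHUT_RD) < 0) {
        r.shutdown_errno = errno;  // the log shows EOPNOTSUPP here for SCTP
    }
    close(fd);

    int fd2 = open_listener(protocol, port);
    if (fd2 < 0) {
        r.rebind_errno = errno;
    } else {
        close(fd2);
    }
    return r;
}
```

Calling `replay(IPPROTO_TCP)` and `replay(IPPROTO_SCTP)` and comparing the fields should show whether the rebind failure also occurs without Seastar. If it does not, the socket is probably being kept alive by something other than the fd itself.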
UPDATE:
Using SO_REUSEPORT instead of SO_REUSEADDR fixes the issue.
But I found that 8 years ago you temporarily disabled SO_REUSEPORT. @avikivity, could you please tell me whether that is still relevant?
https://github.com/scylladb/seastar/commit/86ffe72b5bcac0af8acc3ef66af31e82131dae67
It's probably still relevant. But I'd like to understand the issue with shutting down SCTP more. How is one supposed to shut down SCTP sockets? And I'm confused about how SO_REUSEPORT helps, it's something for the open phase.
If it turns out it's the right thing to do, we can enable it for SCTP, but let's not turn it on blindly.
The decision to change SO_REUSEADDR to SO_REUSEPORT was actually a gamble. I tried different approaches and that one worked.
How is one supposed to shut down SCTP sockets?
I only know that the shutdown syscall fails with an error. Here is a discussion about it with @xemul: https://groups.google.com/g/seastar-dev/c/awC1p7JRoqE/m/U0JiOzHZAAAJ
And I'm confused about how SO_REUSEPORT helps
It helps not to close the socket but to reuse it. I don't know why SO_REUSEADDR is not enough, though.
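On the SO_REUSEADDR versus SO_REUSEPORT question, here is a minimal sketch of how the two options differ when an earlier socket is still listening on the port. It uses TCP on IPv4 loopback as a stand-in, because SCTP may not be available on every machine. Whether SCTP follows exactly the same rules is an assumption. The names `reuse_mode` and `second_bind_errno` are made up for illustration. The sketch does not explain why the old SCTP socket would still be alive after close().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>

enum class reuse_mode { addr, port };

// Opens a listening socket, then tries to bind a second socket to the same
// address and port while the first is still open. Both sockets get the same
// option. Returns 0 if the second bind succeeds, otherwise the errno of the
// first failing call.
int second_bind_errno(reuse_mode mode) {
    const int opt = (mode == reuse_mode::port) ? SO_REUSEPORT : SO_REUSEADDR;
    const int one = 1;

    int first = socket(AF_INET, SOCK_STREAM, 0);
    if (first < 0) {
        return errno;
    }
    setsockopt(first, SOL_SOCKET, opt, &one, sizeof(one));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);  // port 0: kernel picks a free port
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(first, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(first, 100) < 0) {
        int saved = errno;
        close(first);
        return saved;
    }

    // Learn which port the kernel picked so the second socket can target it.
    socklen_t len = sizeof(addr);
    getsockname(first, reinterpret_cast<sockaddr*>(&addr), &len);

    int second = socket(AF_INET, SOCK_STREAM, 0);
    if (second < 0) {
        int saved = errno;
        close(first);
        return saved;
    }
    setsockopt(second, SOL_SOCKET, opt, &one, sizeof(one));

    int result = 0;
    if (bind(second, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        result = errno;  // capture before close() can change it
    }
    close(second);
    close(first);
    return result;
}
```

On Linux with TCP, SO_REUSEADDR alone does not allow a second bind while another socket is listening on the same address and port, whereas SO_REUSEPORT set on both sockets does. If the old SCTP socket stays alive after close(), that difference would be consistent with the observation that switching options makes the error go away, without showing that it is the right fix.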
My main issue is that I can't connect to the same address after abort_accept() or after destroying the server_socket when using seastar::connect with SCTP, while seastar::connect with TCP works fine.
It's also an issue with the Seastar API. We have the seastar::server_socket::abort_accept() function, but it does not work with an SCTP connection: if you try to use it, you get "std::system_error operation not supported". Destroying the server_socket object does not help either, because the SCTP socket remains open. So there is currently no way in Seastar to shut down or close an SCTP socket and reuse the address.
Could you please invite someone to continue the discussion? As I said, my technical knowledge of Linux sockets is limited, and unfortunately I don't fully understand what is happening.
Some questions are related to https://github.com/scylladb/seastar/commit/86ffe72b5bcac0af8acc3ef66af31e82131dae67
Looks as if someone recently (July 2024) committed a patch to the linux kernel that is related to this problem:
@Genomorf @avikivity Could it be that that linux kernel patch fixes the issue discussed here?
Looks likely.
Hello. It seems that I found a bug in the Seastar SCTP listener. It's impossible to close the socket when using the SCTP protocol. server_socket::abort_accept() does not work at all, and destroying the server_socket object does not help either. The socket stays open for as long as the program is alive. I could easily find it with the netstat tool, and there is no way to close it.
shutdown: Operation not supported
error. More info: https://groups.google.com/g/seastar-dev/c/awC1p7JRoqE/m/U0JiOzHZAAAJ
sctp ip6-localhost:9099
address already in use
Code to reproduce the issue: