Closed riedel closed 4 years ago
In the faulty case, strace gives me:
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(6, F_GETFD) = 0
fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(6, F_GETSIG) = 0
fcntl(6, F_GETOWN_EX, {type=F_OWNER_PID, pid=0}) = 0
bind(6, {sa_family=AF_UNIX, sun_path="/smartdata/iu5681/xxx.socket"}, 110) = 0
listen(6, 128) = 0
ioctl(6, FIONBIO, [1]) = 0
accept4(6, NULL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
And in the working case:
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
fcntl(3, F_GETFD) = 0
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_GETSIG) = 0
fcntl(3, F_GETOWN_EX, {type=F_OWNER_PID, pid=0}) = 0
bind(3, {sa_family=AF_UNIX, sun_path="/smartdata/iu5681/xxx.socket"}, 110) = 0
listen(3, 10) = 0
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
This seems to be a bug. I just confirmed that socket_wrapper works with the above command line:
LD_PRELOAD=$PWD/lib/libsocket_wrapper.so SOCKET_WRAPPER_DIR=$PWD/sockets SOCKET_WRAPPER_DEFAULT_IFACE=10 rsession --standalone=1 --program-mode=server --log-stderr=1 --www-address 127.0.0.1 --www-port 8080
socket_wrapper seems to strip SOCK_CLOEXEC and SOCK_NONBLOCK from the socket option. Could that be the cause?
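To make the question above concrete, here is a minimal sketch of what stripping and re-applying these two flags could look like. It is not taken from socket_wrapper or ip2unix; the helper names `split_type`, `reapply_flags` and `open_replacement` are hypothetical, and the sketch assumes Linux, where `SOCK_NONBLOCK` and `SOCK_CLOEXEC` can be OR'ed into the type argument of `socket()`.

```python
import fcntl
import os
import socket

# Flags that Linux allows to be OR'ed into the socket type argument.
FLAG_MASK = socket.SOCK_NONBLOCK | socket.SOCK_CLOEXEC


def split_type(sock_type: int) -> tuple:
    """Split a socket type argument into (base type, extra flags)."""
    return sock_type & ~FLAG_MASK, sock_type & FLAG_MASK


def reapply_flags(fd: int, flags: int) -> None:
    """Set the descriptor flags that correspond to the stripped type flags."""
    if flags & socket.SOCK_NONBLOCK:
        status = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, status | os.O_NONBLOCK)
    if flags & socket.SOCK_CLOEXEC:
        fd_flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, fd_flags | fcntl.FD_CLOEXEC)


def open_replacement(requested_type: int) -> socket.socket:
    """Hypothetical wrapper step: open a Unix socket, preserving the flags."""
    base, flags = split_type(requested_type)
    sock = socket.socket(socket.AF_UNIX, base)
    reapply_flags(sock.fileno(), flags)
    return sock
```

If a wrapper dropped the flags without the `reapply_flags` step, a caller that asked for `SOCK_NONBLOCK` would end up with a blocking descriptor, which is the kind of difference the question is about.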
@riedel: Thanks for the report. Could you also reproduce this with nc, or does it only happen with rsession?
The "working case" is nc. I have had a really hard time reproducing the behaviour. It seems to happen only for rsession in combination with ip2unix (rsession is really interesting because it provides no access control mechanisms, see https://github.com/jupyterhub/jupyter-rsession-proxy/issues/14#issuecomment-627515481). I am happy to help isolate the issue. I looked at cwrap, which works, but I now understand that it works completely differently, directly swapping the socket (so you cannot distinguish the target IP).
Okay, this has nothing to do with the EAGAIN return from accept. Here is the difference, first rsession without ip2unix:
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=1446161200, u64=93825006742320}}) = 0
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
bind(6, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
listen(6, 128) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [28->16]) = 0
ioctl(6, FIONBIO, [1]) = 0
accept(6, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
... and here with ip2unix:
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=1446163152, u64=93825006744272}}) = 0
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
socket(AF_UNIX, SOCK_STREAM, 0) = 7
fcntl(6, F_GETFD) = 0
fcntl(7, F_SETFD, 0) = 0
fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(7, F_SETFL, O_RDWR) = 0
fcntl(6, F_GETSIG) = 0
fcntl(7, F_SETSIG, 0) = 0
fcntl(6, F_GETOWN_EX, {type=F_OWNER_TID, pid=0}) = 0
fcntl(7, F_SETOWN_EX, {type=F_OWNER_TID, pid=0}) = 0
setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
dup2(7, 6) = 6
close(7) = 0
bind(6, {sa_family=AF_UNIX, sun_path="/build/test.socket"}, 110) = 0
listen(6, 128) = 0
ioctl(6, FIONBIO, [1]) = 0
accept4(6, NULL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
The interesting point here is that epoll_ctl is called on the original socket before it gets replaced by the new one via dup2, so the epoll registration is not carried over to the replacement socket. One way to fix this is to do something similar to how we replay setsockopt and friends, but for epoll_ctl.
This is also the reason why socket_wrapper doesn't have this problem, since it doesn't need to replace the socket.
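The effect described above can be reproduced outside of rsession. The following is a minimal sketch, assuming Linux and Python 3.7 or newer; it is not ip2unix code, and the function name `connection_seen` and the `replay_epoll` switch are made up for illustration. It registers a TCP socket with epoll, swaps a Unix socket into the same descriptor number with dup2, and then checks whether epoll still reports an incoming connection.

```python
import os
import select
import socket
import tempfile


def connection_seen(replay_epoll: bool) -> bool:
    """Return True if epoll reports a connection on the replaced socket."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.socket")
        ep = select.epoll()

        # Application side: create a TCP socket and register it right away,
        # before bind() is called (as in the trace above).
        inet = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        fd = inet.detach()  # work with the raw descriptor from here on
        ep.register(fd, select.EPOLLIN)

        # Wrapper side: create a Unix socket and move it onto the same
        # descriptor number. This closes the TCP socket, and the kernel
        # drops its epoll registration along with it.
        unix = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        os.dup2(unix.fileno(), fd)
        unix.close()

        if replay_epoll:
            # Sketch of the proposed fix: register the replacement socket
            # with the same epoll instance and event mask.
            ep.register(fd, select.EPOLLIN)

        srv = socket.socket(fileno=fd)  # family is detected from the fd
        srv.bind(path)
        srv.listen(8)

        cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        cli.connect(path)
        try:
            events = ep.poll(0.2)
            return any(f == fd and ev & select.EPOLLIN for f, ev in events)
        finally:
            cli.close()
            srv.close()
            ep.close()
```

Calling `connection_seen(False)` models the current behaviour, where the application never gets notified, while `connection_seen(True)` models replaying the registration. A real fix would also have to remember the original event mask and user data passed to epoll_ctl, which this sketch hard-codes.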
How to debug a socket not being created?
I do the following, but no socket is created (i.e. the file simply does not exist). BTW: the server works without ip2unix.
This is what it looks like in a case where a socket is created:
Thankful for any clues.