nixcloud / ip2unix

Turn IP sockets into Unix domain sockets
GNU Lesser General Public License v3.0

sockopts: Collect and replay epoll_ctl calls #13

Closed aszlig closed 4 years ago

aszlig commented 4 years ago

Some services, particularly the rsession command from RStudio and possibly a few others, use epoll_ctl() to add the socket's file descriptor before actually binding the socket.

We need the sockaddr information from the bind syscall to decide whether to transform the socket into a Unix domain socket. That means we have to handle epoll_ctl alongside the other socket operations: once we replace the socket, the file descriptor added via epoll_ctl is no longer valid.
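The effect described above can be reproduced outside ip2unix. The following is a minimal, Linux-only sketch using Python's `select.epoll`; the function name `replace_socket_and_replay` is made up for illustration and this is not ip2unix code. It registers a TCP socket with epoll before any bind, swaps a Unix domain socket onto the same descriptor number with `dup2`, and shows that the earlier registration is gone and has to be replayed.

```python
import errno
import os
import select
import socket


def replace_socket_and_replay():
    """Return True if the epoll registration was lost after replacing the socket."""
    ep = select.epoll()
    inet = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = inet.fileno()
    events = select.EPOLLIN

    # The application adds the socket to epoll before calling bind().
    ep.register(fd, events)  # EPOLL_CTL_ADD

    # Replace the TCP socket with a Unix domain socket on the same fd number.
    unix = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    os.dup2(unix.fileno(), fd)
    unix.close()

    # epoll tracked the old open file, so fd is no longer registered.
    try:
        ep.modify(fd, events)  # EPOLL_CTL_MOD
        lost = False
    except OSError as exc:
        lost = exc.errno == errno.ENOENT

    if lost:
        # Replay the recorded EPOLL_CTL_ADD against the new socket.
        ep.register(fd, events)
        ep.modify(fd, events)  # succeeds now that the fd is registered again

    inet.close()  # closes fd, which by now refers to the Unix socket
    ep.close()
    return lost
```

Calling `replace_socket_and_replay()` on Linux is expected to report that the registration was lost, which is why the recorded calls need to be replayed on the new socket.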

Right now the way we replay epoll_ctl() calls is somewhat dumb, so for example if the program is doing EPOLL_CTL_ADD, followed by EPOLL_CTL_MOD, we essentially repeat the same chain even though we could just shrink it down to one EPOLL_CTL_ADD.

Nevertheless, since epoll_ctl usually is only used once per socket, this shouldn't be an issue.
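As a rough illustration of the shrinking mentioned above, here is a small sketch that collapses a recorded chain of calls for a single (epoll fd, socket fd) pair into at most one EPOLL_CTL_ADD. The `collapse` helper and the tuple representation are assumptions made for this example, not the data structures ip2unix uses, and it assumes only successful calls were recorded.

```python
ADD = "EPOLL_CTL_ADD"
MOD = "EPOLL_CTL_MOD"
DEL = "EPOLL_CTL_DEL"


def collapse(calls):
    """Collapse recorded (op, events) calls into at most one ADD.

    `calls` is a list of (op, events) tuples in the order they were issued
    for one epoll instance and one socket; `events` is ignored for DEL.
    """
    current = None  # events currently registered, or None if not registered
    for op, events in calls:
        if op == ADD and current is None:
            current = events
        elif op == MOD and current is not None:
            current = events  # a later MOD simply overrides the event mask
        elif op == DEL and current is not None:
            current = None
        # Anything else would have failed in the kernel and is skipped.
    return [] if current is None else [(ADD, current)]
```

For example, `collapse([(ADD, 1), (MOD, 5)])` yields `[(ADD, 5)]`, and an ADD followed by a DEL yields an empty list, so nothing needs to be replayed.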

@riedel: Can you please check whether this fixes your issue with rsession?

riedel commented 4 years ago

I tried it (hopefully I recompiled it correctly), but it does not seem to work :( . I wrote a script to easily test rsession:

#!/bin/bash

I2U="ip2unix -r path=rsession.sock"
LDP=

#I2U=
#LDP=./libsocket_wrapper.so
#mkdir $PWD/rsession.sock

Rscript -e 'cat(paste(R.home("home"),R.home("share"),R.home("include"),R.home("doc"),getRversion(),sep="\n"))'|
{
        readarray -t R
        R_HOME=${R[0]} R_SHARE_DIR=${R[1]} R_INCLUDE_DIR=${R[2]}  R_DOC_DIR=${R[3]} RSTUDIO_DEFAULT_R_VERSION_HOME=${R[0]} RSTUDIO_DEFAULT_R_VERSION=${R[4]} \
        SOCKET_WRAPPER_DIR=$PWD/rsession.sock LD_PRELOAD=$LDP $I2U rsession --standalone=1 --program-mode=server --log-stderr=1  --www-address=127.0.0.1 --www-port=12345 &
        echo $! >rsession.pid
}

sleep 3

SOCKET_WRAPPER_DIR=$PWD/rsession.sock LD_PRELOAD=$LDP $I2U curl -I http://127.0.0.1:12345

curl -I http://127.0.0.1:12345

kill $(< rsession.pid) && rm rsession.pid

rm -r rsession.sock

relevant output:

+ ip2unix -r path=rsession.sock curl -I http://127.0.0.1:12345
curl: (7) Couldn't connect to server
+ curl -I http://127.0.0.1:12345
curl: (7) Failed to connect to 127.0.0.1 port 12345: Connection refused
+ kill 495701
+ rm rsession.pid
+ rm -r rsession.sock
rm: cannot remove ‘rsession.sock’: No such file or directory

aszlig commented 4 years ago

@riedel: Just added a test which specifically tests rsession:

https://github.com/nixcloud/ip2unix/blob/5105d3229927d8f8bf947b2fff617929d66e2b05/tests/programs/rsession.nix#L14-L20

Unfortunately, the test succeeds. Here's the output with the grep command removed:

building '/nix/store/yn17s3psa0c70c2vi0934f61m36ba167-test-rsession.drv'...
Listener bound to address 127.0.0.1 port 8080
<!DOCTYPE html>

<!--
#
# index.htm
#
# Copyright (C) 2009-19 by RStudio, Inc.
#
# This program is licensed to you under the terms of version 3 of the
# GNU Affero General Public License. This program is distributed WITHOUT
# ANY EXPRESS OR IMPLIED WARRANTY, INCLUDING THOSE OF NON-INFRINGEMENT,
# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Please refer to the
# AGPL (http://www.gnu.org/licenses/agpl-3.0.txt) for more details.
#
-->

<!-- standards mode -->
<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
    <meta name="gwt:property" content="compiler.stackMode=native"/>
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no" />
    <meta name="csrf-token" content="b83430b1-8f56-4d05-82d4-c6b6f98f5607" />
    <link rel="shortcut icon" href="images/favicon.ico" />
    <title>RStudio</title>
    <script>window.program_mode = "server";
</script>
    <link type="text/css" rel="stylesheet" href="css/icons.css" />
    <script type="text/javascript" language="javascript" src="rstudio/rstudio.nocache.js"></script>
  </head>

  <body>
  </body>

</html>

I also checked whether rsession uses epoll_ctl just to be sure and also compared the test against ip2unix from master (which fails as expected).

Is there something I'm doing wrong or something I'm forgetting (I'm no R expert)?

riedel commented 4 years ago

Love your test! Nix makes it really easy, impressive! I will test it on more systems. But this is strange! The system where I need it (and where I tested it, because I had the build environment installed) is a cluster node running Red Hat Enterprise Linux 7 with conda as the package manager and a GPFS file system. I now really have a case for switching to Nix :) Give me some time and I will get it to run somewhere.

aszlig commented 4 years ago

> Love your test! Nix makes it really easy, impressive! I will test it on more systems. But this is strange! The system where I need it (and tested it because I had the build env installed) is a cluster node with redhat enterprise linux 7 with conda as package manager and a GPFS filesystem. I really now have a case for switching to nix :) Give me some time and I will get it to run somewhere.

While Nix does make a lot of things easier in this regard (after all that's the reason I'm using it), I'd actually want to know why it doesn't work on other distros, since "just use Nix" is probably not the best idea to recommend to people who want to use your project :-D

Anyway, if you strace it, is the epoll_ctl call properly replayed (as in: it should be called twice, once for the original socket and once for the replacement socket)?

riedel commented 4 years ago

Everything looks perfectly fine! I am really puzzled, too, why a bind would not result in a filesystem entry.

socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3395039680, u64=94153373135296}}) = 0
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
socket(AF_UNIX, SOCK_STREAM, 0)         = 7
fcntl(6, F_GETFD)                       = 0
fcntl(7, F_SETFD, 0)                    = 0
fcntl(6, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(7, F_SETFL, O_RDWR)               = 0
fcntl(6, F_GETSIG)                      = 0
fcntl(7, F_SETSIG, 0)                   = 0
fcntl(6, F_GETOWN_EX, {type=F_OWNER_PID, pid=0}) = 0
fcntl(7, F_SETOWN_EX, {type=F_OWNER_PID, pid=0}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3395039680, u64=94153373135296}}) = 0
setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
dup2(7, 6)                              = 6
close(7)                                = 0
bind(6, {sa_family=AF_UNIX, sun_path="/smartdata/iu5681/rsession.sock"}, 110) = 0
listen(6, 128)                          = 0
ioctl(6, FIONBIO, [1])                  = 0
accept4(6, NULL, NULL, 0)               = -1 EAGAIN (Resource temporarily unavailable)

aszlig commented 4 years ago

> bind(6, {sa_family=AF_UNIX, sun_path="/smartdata/iu5681/rsession.sock"}, 110) = 0

This would suggest that the socket file should actually be there, but for some reason it gets closed and unlinked. Can you post what happens to file descriptor 6 afterwards?

riedel commented 4 years ago

That is the strange thing: nothing happens, and it's not there. I even switched to /var/tmp on an xfs filesystem because I am so confused about this. I grepped for '(6' and unlink: https://gist.github.com/riedel/043936e3c80da0f2607d48d8205d63e0#file-rsession-strace-L743. I even asked on Stack Overflow whether such a thing can even happen: https://stackoverflow.com/questions/61279813/successfull-call-to-bind2-with-af-unix-does-not-generate-a-socket-file . Don't ask me how confused I am about this.

aszlig commented 4 years ago

@riedel: Hm, first of all: Is /smartdata/iu5681 on a different file system and if yes, which file system type?

Also, can you repeat strace with the -f option, since just after binding to the socket, there is a clone syscall involved?

riedel commented 4 years ago

Doesn't get better: https://gist.github.com/riedel/043936e3c80da0f2607d48d8205d63e0#file-rsession-strace-L759

The filesystem is an IBM GPFS. I had that suspicion before as well, so for this run I switched to: /dev/mapper/vg01-lv_var on /var type xfs (rw,relatime,attr2,inode64,noquota)

aszlig commented 4 years ago

> Doesn't get better

Right, but at least we now know where the socket file gets unlinked:

https://gist.github.com/riedel/043936e3c80da0f2607d48d8205d63e0#file-rsession-strace-L3960

aszlig commented 4 years ago

Okay, so the interesting part here is that it happens prior to executing /bin/sh -c 'git "--version"', where all file descriptors are closed. However, since after the clone the file descriptor is still active in the parent, we should not unlink the socket file.

To fix this, we need to implement some kind of reference counting... oh geesh...
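To make the failure mode concrete, here is a small sketch of the situation: a listener is bound, the process forks, and the child closes its inherited copy of the descriptor. Note that this is not reference counting proper and not how ip2unix handles it; as a simpler stand-in it remembers the PID that bound the socket and only unlinks the socket file when that same process closes it. The class `OwnedUnixListener` and the function `demo` are hypothetical names used only for this example.

```python
import os
import socket
import tempfile


class OwnedUnixListener:
    """Hypothetical wrapper: only the process that bound the socket unlinks it."""

    def __init__(self, path):
        self.path = path
        self.owner_pid = os.getpid()
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.bind(path)
        self.sock.listen(1)

    def close(self):
        self.sock.close()
        # A forked child that closes its inherited copy must leave the
        # socket file alone, because the parent is still listening on it.
        if os.getpid() == self.owner_pid and os.path.exists(self.path):
            os.unlink(self.path)


def demo():
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")
    listener = OwnedUnixListener(path)

    pid = os.fork()
    if pid == 0:
        # Child: close everything inherited, as happens before an exec.
        listener.close()
        os._exit(0)
    os.waitpid(pid, 0)

    exists_after_child_close = os.path.exists(path)
    listener.close()  # the owner closes, so the file is removed now
    exists_after_parent_close = os.path.exists(path)
    os.rmdir(tmpdir)
    return exists_after_child_close, exists_after_parent_close
```

With this check in place, the socket file should survive the child's close and disappear only once the parent closes the listener. A PID check like this does not cover descriptors duplicated within the same process, which is where real reference counting would be needed.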

aszlig commented 4 years ago

@riedel: Which glibc version do you have? According to distrowatch, RHEL 7 should have 2.17, is this correct?

aszlig commented 4 years ago

@riedel: Let's continue in #16, since epoll_ctl should be properly handled here.