google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0
15.72k stars 1.29k forks source link

lgetxattr incorrectly fails with EBADF for symlinks #11049

Open gimaker opened 20 hours ago

gimaker commented 20 hours ago

Description

The title pretty much sums it up: there's a bug in gvisor's implementation of lgetxattr causing it to incorrectly fail with EBADF for symlinks.

So far we haven't encountered any real issues due to this, aside from causing error message spam when running ls on symlinks:

$ docker run --runtime=runsc -t debian:bookworm ls -l /var/run
ls: /var/run: Bad file descriptor
lrwxrwxrwx 1 root root 4 Sep  4 09:00 /var/run -> /run

Taking a quick glance at the code it appears that the issue is that gVisor translate lgetxattr into a fgetxattr call. And in the case of symlinks it seems to be running fgetxattr on a fd opened with O_PATH, IIUC. As expected that results in EBADF as per open(2) documentation:

       O_PATH (since Linux 2.6.39)
              [...] The file itself is not opened, and other file operations (e.g., read(2), write(2), fchmod(2), fchown(2),  fgetxattr(2),  ioctl(2),
              mmap(2)) fail with the error EBADF.

Steps to reproduce

Put this in a file lgetxattr.c:

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main() {
  const char *path = "/var/run"; // Path to any symlink
  ssize_t ret = lgetxattr(path, "security.selinux", 0, 0); // specific xattr name is irrelevant
  if (ret == -1) {
    printf("lgetxattr call FAILED: %s (%d)\n", strerror(errno), errno);
  } else {
    printf("lgetxattr call succeed, security.selinux value size=%ld\n", ret);
  }
}

Then run this in the same directory:

$ docker run --runtime=runsc --rm -t --mount type=bind,source="$(pwd)"/lgetxattr.c,destination=/lgetxattr.c gcc:bookworm sh -c "gcc -o /lgetxattr lgetxattr.c && /lgetxattr"
lgetxattr call FAILED: Bad file descriptor (9)

Compare this to running with runc, where we get ENODATA instead of EBADF:

$ docker run --rm -t --mount type=bind,source="$(pwd)"/lgetxattr.c,destination=/lgetxattr.c gcc:bookworm sh -c "gcc -o /lgetxattr lgetxattr.c && /lgetxattr"
lgetxattr call FAILED: No data available (61)

runsc version

runsc version release-20241007.0 spec: 1.1.0-rc.1

docker version (if using docker)

Client: Docker Engine - Community Version: 27.3.1 API version: 1.47 Go version: go1.22.7 Git commit: ce12230 Built: Fri Sep 20 11:41:00 2024 OS/Arch: linux/amd64 Context: default

Server: Docker Engine - Community Engine: Version: 27.3.1 API version: 1.47 (minimum version 1.24) Go version: go1.22.7 Git commit: 41ca978 Built: Fri Sep 20 11:41:00 2024 OS/Arch: linux/amd64 Experimental: false containerd: Version: 1.7.22 GitCommit: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c runc: Version: 1.1.14 GitCommit: v1.1.14-0-g2c9f560 docker-init: Version: 0.19.0 GitCommit: de40ad0

uname

Linux redacted 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

milantracy commented 12 hours ago

how do you plan to use the extended attribute?

even though we could fix this behavior, the extended attributed is not fully supported in gVisor (at least for "security" namespace)

ayushr2 commented 9 hours ago

Thanks for the bug report.

the issue is that gVisor translate lgetxattr into a fgetxattr call. And in the case of symlinks it seems to be running fgetxattr on a fd opened with O_PATH, IIUC. As expected that results in EBADF as per open(2) documentation

Good catch. This bug is broader than just symlinks. As per this, we would receive EBADF for getxattr(2) on regular files too. getxattr(2) is path based, so it should succeed.

even though we could fix this behavior, the extended attributed is not fully supported in gVisor (at least for "security" namespace)

@milantracy We do allow reading security namespace: https://github.com/google/gvisor/blob/b7334af658a3ca16fe28d74470bd6e3d862c2696/pkg/sentry/vfs/permissions.go#L326-L330

Notice how we return nil if ats.MayRead(). So we allow *getxattr(2) syscalls on it.

ayushr2 commented 9 hours ago

Huh, I had already done the investigation in https://github.com/google/gvisor/issues/10385#issuecomment-2094123795. Apart from getxattr(2), fchmod(2) is also broken for symlinks. I will try to come up w a fix.

As per this, we would receive EBADF for getxattr(2) on regular files too. getxattr(2) is path based, so it should succeed.

I was wrong here. We only open O_PATH FDs for symlinks and sockets.

milantracy commented 8 hours ago

Thanks for the bug report.

the issue is that gVisor translate lgetxattr into a fgetxattr call. And in the case of symlinks it seems to be running fgetxattr on a fd opened with O_PATH, IIUC. As expected that results in EBADF as per open(2) documentation

Good catch. This bug is broader than just symlinks. As per this, we would receive EBADF for getxattr(2) on regular files too. getxattr(2) is path based, so it should succeed.

even though we could fix this behavior, the extended attributed is not fully supported in gVisor (at least for "security" namespace)

@milantracy We do allow reading security namespace:

gvisor/pkg/sentry/vfs/permissions.go

Lines 326 to 330 in b7334af

case strings.HasPrefix(name, linux.XATTR_SECURITY_PREFIX): if ats.MayRead() { return nil } return linuxerr.EOPNOTSUPP Notice how we return nil if ats.MayRead(). So we allow *getxattr(2) syscalls on it.

to clarify the extended attributed is not fully supported in gVisor (at least for "security" namespace), I am wondering whether or not setting xattr is needed since we have not implemented anything yet.

There is an internal bug that keeps track of setxattr, I worked on it a bit.

ayushr2 commented 8 hours ago

I am wondering whether or not setting xattr is needed since we have not implemented anything yet.

@milantracy I think we are talking about getxattr support here (not setxattr). @gimaker reported the issue about lgetxattr (which is a variant of getxattr).