Closed pawalt closed 1 month ago
Hi Peyton, thanks for reporting this bug. The error you're seeing is likely happening while building the internal gVisor sysfs, not because /sys
doesn't exist on the host. When --tpuproxy
is enabled, the sandbox builds a mirror of the host PCI directories located in sysfs. The userspace TPU driver relies on the presence of these files to get information about the TPU hardware (version, topology, etc.) running on the host. Can you show me what you get when you run ls -l /sys/bus/pci/devices
in your VM?
iirc, you can't run tpuproxy via exec /usr/local/bin/runsc --tpuproxy "$@"
the way the similar command works for nvproxy, because nvidia-container-runtime
is directly compatible with the --gpus flag implemented by the Docker CLI.
Since that hasn't been implemented for tpuproxy, TPU devices are not accessible in your Docker container.
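For reference, Docker selects runsc through the runtimes entry in /etc/docker/daemon.json. Since there is no --gpus-style wiring for TPUs, one possible workaround (a sketch only, untested here; "runsc-tpu" is a hypothetical runtime name) is to pass --tpuproxy unconditionally through runtimeArgs so every container started with that runtime gets it:

```json
{
  "runtimes": {
    "runsc-tpu": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--tpuproxy"]
    }
  }
}
```

Containers would then be started with docker run --runtime=runsc-tpu, bypassing the missing --gpus-style plumbing entirely.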
@manninglucas sure thing here it is:
peyton@t1v-n-901fc2b8-w-0:~/tputesting$ ls -l /sys/bus/pci/devices
total 0
lrwxrwxrwx 1 root root 0 Aug 19 16:27 0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
lrwxrwxrwx 1 root root 0 Aug 19 16:27 0000:00:01.0 -> ../../../devices/pci0000:00/0000:00:01.0
lrwxrwxrwx 1 root root 0 Aug 19 16:27 0000:00:01.3 -> ../../../devices/pci0000:00/0000:00:01.3
lrwxrwxrwx 1 root root 0 Aug 19 16:27 0000:00:03.0 -> ../../../devices/pci0000:00/0000:00:03.0
lrwxrwxrwx 1 root root 0 Aug 19 16:27 0000:00:04.0 -> ../../../devices/pci0000:00/0000:00:04.0
lrwxrwxrwx 1 root root 0 Aug 19 16:27 0000:00:05.0 -> ../../../devices/pci0000:00/0000:00:05.0
lrwxrwxrwx 1 root root 0 Aug 19 16:27 0000:00:06.0 -> ../../../devices/pci0000:00/0000:00:06.0
lrwxrwxrwx 1 root root 0 Aug 19 16:27 0000:00:07.0 -> ../../../devices/pci0000:00/0000:00:07.0
@milantracy Does this mean we can't use --tpuproxy
at all with the Docker shim, or is there just some other way I need to invoke it? And if it's not possible, I assume it should work OK if I invoke runsc
raw?
afaik, --tpuproxy
doesn't work with the docker shim. cc: @manninglucas
I tried the raw runsc
in a TPU v5e VM, which worked fine for me; let me know how it goes for you.
@milantracy would you mind sharing the command you're using to start runsc
? I'm still having no luck using runsc do
#!/bin/bash
sudo runsc --tpuproxy --root=/home/peyton/tputesting/runroot do --root=/home/peyton/tputesting/jax-rootfs -- env -u LD_PRELOAD /bin/bash
peyton@t1v-n-901fc2b8-w-0:~/tputesting$ ./start.sh
starting container: starting root container: starting sandbox: failed to setupFS: mounting submounts: mount submount "/sys": failed to mount "/sys" (type: sysfs): no such file or directory, opts: &{{false false false false} false {true 0xc000620990} false}
EDIT: I'm also getting the same behavior with runsc run
:
peyton@t1v-n-901fc2b8-w-0:~/tputesting$ cat start.sh
#!/bin/bash
sudo runsc --root=/home/peyton/tputesting/runroot --tpuproxy run --bundle=/home/peyton/tputesting my-jax-container
peyton@t1v-n-901fc2b8-w-0:~/tputesting$ ./start.sh
running container: starting container: starting root container: starting sandbox: failed to setupFS: mounting submounts: mount submount "/sys": failed to mount "/sys" (type: sysfs): no such file or directory, opts: &{{true false false false} true {true 0xc0005a2420} false}
And my config.json
:
{
"ociVersion": "1.0.2",
"process": {
"terminal": true,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"/bin/sh"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"LANG=C.UTF-8",
"PYTHONUNBUFFERED=1"
],
"cwd": "/"
},
"root": {
"path": "jax-rootfs",
"readonly": false
},
"hostname": "jax-container",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
}
],
"linux": {
"namespaces": [
{
"type": "pid"
},
{
"type": "network"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
}
]
}
}
I've managed to track down the error to this function: https://github.com/google/gvisor/blob/e0643b8ed582cc549272e7788860a5dd4636c06d/pkg/sentry/fsimpl/sys/pci.go#L223
Specifically, this function fails when passed /sys/devices
. It returns ENOENT
. This is despite the file definitely existing:
peyton@t1v-n-901fc2b8-w-0:~/gvisor$ ls -alh /sys
total 4.0K
dr-xr-xr-x 13 root root 0 Aug 20 16:23 .
drwxr-xr-x 19 root root 4.0K Aug 20 16:24 ..
drwxr-xr-x 2 root root 0 Aug 20 16:23 block
drwxr-xr-x 40 root root 0 Aug 20 16:23 bus
drwxr-xr-x 68 root root 0 Aug 20 16:23 class
drwxr-xr-x 4 root root 0 Aug 20 16:23 dev
drwxr-xr-x 15 root root 0 Aug 20 16:23 devices
drwxr-xr-x 6 root root 0 Aug 20 16:23 firmware
drwxr-xr-x 9 root root 0 Aug 20 16:23 fs
drwxr-xr-x 2 root root 0 Aug 20 19:22 hypervisor
drwxr-xr-x 16 root root 0 Aug 20 16:23 kernel
drwxr-xr-x 152 root root 0 Aug 20 16:23 module
drwxr-xr-x 3 root root 0 Aug 20 19:22 power
I'll continue to investigate why this directory can't be opened.
It seems to me that there's something wrong with the mount. When I look inside the sandbox's namespace, /sys does not exist, but I expect it to:
peyton@t1v-n-901fc2b8-w-0:~/gvisor$ sudo ls /proc/75785/root/
etc proc
I'm not sure where to go from here - any pointers on what this should look like would be appreciated.
it has been a while since I did it last time, I will share the runsc command with you later.
also, can you share with me what the sys
directory looks like in the container?
@milantracy When I don't pass --tpuproxy
, this is what it looks like:
peyton@t1v-n-901fc2b8-w-0:~/tputesting$ ./start.sh
Child PID: 82314
Press Enter to continue...
# ls -alh /sys
total 0
drwxr-xr-x 12 root root 0 Aug 20 21:47 .
drwxrwxr-x 2 2004 2004 60 Aug 20 21:47 ..
drwxr-xr-x 2 root root 0 Aug 20 21:47 block
drwxr-xr-x 2 root root 0 Aug 20 21:47 bus
drwxr-xr-x 4 root root 0 Aug 20 21:47 class
drwxr-xr-x 2 root root 0 Aug 20 21:47 dev
drwxr-xr-x 4 root root 0 Aug 20 21:47 devices
drwxr-xr-x 2 root root 0 Aug 20 21:47 firmware
drwxr-xr-x 3 root root 0 Aug 20 21:47 fs
drwxr-xr-x 2 root root 0 Aug 20 21:47 kernel
drwxr-xr-x 2 root root 0 Aug 20 21:47 module
drwxr-xr-x 2 root root 0 Aug 20 21:47 power
When I do pass --tpuproxy
, then /sys
never gets mounted, so it doesn't exist.
When I spin up a cluster in GKE and run with tpuproxy, this is the sandbox spec that gets used. I would try to copy this spec, specifically the mounts and devices sections, and see how that works. I see you don't have a /sys
mount in your config.json. You may need to add a /sys
mount explicitly to the spec to get it working properly.
{
"ociVersion": "1.1.0",
"process": {
"user": {
"uid": 0,
"gid": 0,
"additionalGids": [
0
]
},
"args": [
"bash",
"-c",
"python -c 'import jax; print(\"TPU cores:\", jax.device_count())'"
],
"env": [
"PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"HOSTNAME=tpu-gvisor",
"LANG=C.UTF-8",
"GPG_KEY=A035C8C19219BA821ECEA86B64E628F8D684696D",
"PYTHON_VERSION=3.10.14",
"PYTHON_PIP_VERSION=23.0.1",
"PYTHON_SETUPTOOLS_VERSION=65.5.1",
"PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/e03e1607ad60522cf34a92e834138eb89f57667c/public/get-pip.py",
"PYTHON_GET_PIP_SHA256=ee09098395e42eb1f82ef4acb231a767a6ae85504a9cf9983223df0a7cbd35d7",
"TPU_SKIP_MDS_QUERY=true",
"TPU_TOPOLOGY=2x2x1",
"ALT=false",
"TPU_HOST_BOUNDS=1,1,1",
"HOST_BOUNDS=1,1,1",
"TPU_RUNTIME_METRICS_PORTS=8431,8432,8433,8434",
"CHIPS_PER_HOST_BOUNDS=2,2,1",
"TPU_CHIPS_PER_HOST_BOUNDS=2,2,1",
"TPU_WORKER_ID=0",
"TPU_WORKER_HOSTNAMES=localhost",
"TPU_ACCELERATOR_TYPE=v5p-8",
"WRAP=false,false,false",
"TPU_TOPOLOGY_WRAP=false,false,false",
"TPU_TOPOLOGY_ALT=false",
"KUBERNETES_PORT_443_TCP_ADDR=34.118.224.1",
"KUBERNETES_SERVICE_HOST=34.118.224.1",
"KUBERNETES_SERVICE_PORT=443",
"KUBERNETES_SERVICE_PORT_HTTPS=443",
"KUBERNETES_PORT=tcp://34.118.224.1:443",
"KUBERNETES_PORT_443_TCP=tcp://34.118.224.1:443",
"KUBERNETES_PORT_443_TCP_PROTO=tcp",
"KUBERNETES_PORT_443_TCP_PORT=443"
],
"cwd": "/",
"apparmorProfile": "cri-containerd.apparmor.d",
"oomScoreAdj": 1000
},
"root": {
"path": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/27ee155751166dcc9569871355a0f90babbd37f94b11a8879f83757597e7da00/rootfs"
},
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/27ee155751166dcc9569871355a0f90babbd37f94b11a8879f83757597e7da00/proc",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/27ee155751166dcc9569871355a0f90babbd37f94b11a8879f83757597e7da00/tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/27ee155751166dcc9569871355a0f90babbd37f94b11a8879f83757597e7da00/devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/27ee155751166dcc9569871355a0f90babbd37f94b11a8879f83757597e7da00/mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/27ee155751166dcc9569871355a0f90babbd37f94b11a8879f83757597e7da00/sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/sys/fs/cgroup",
"type": "cgroup",
"source": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/27ee155751166dcc9569871355a0f90babbd37f94b11a8879f83757597e7da00/cgroup",
"options": [
"nosuid",
"noexec",
"nodev",
"relatime",
"ro"
]
},
{
"destination": "/etc/hosts",
"type": "bind",
"source": "/var/lib/kubelet/pods/0c23b742-a930-45e1-80d3-2b358141671e/etc-hosts",
"options": [
"rbind",
"rprivate",
"rw"
]
},
{
"destination": "/etc/hostname",
"type": "bind",
"source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/d711f9789021bb54a955a5c4155b796a79f508db3d9376fafc814ee91a0ce560/hostname",
"options": [
"rbind",
"rprivate",
"rw"
]
},
{
"destination": "/etc/resolv.conf",
"type": "bind",
"source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/d711f9789021bb54a955a5c4155b796a79f508db3d9376fafc814ee91a0ce560/resolv.conf",
"options": [
"rbind",
"rprivate",
"rw"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "/run/containerd/io.containerd.grpc.v1.cri/sandboxes/d711f9789021bb54a955a5c4155b796a79f508db3d9376fafc814ee91a0ce560/shm",
"options": [
"rprivate",
"rw"
]
},
{
"destination": "/run/secrets/kubernetes.io/serviceaccount",
"type": "bind",
"source": "/var/lib/kubelet/pods/0c23b742-a930-45e1-80d3-2b358141671e/volumes/kubernetes.io~projected/kube-api-access-9fzvk",
"options": [
"rbind",
"rprivate",
"ro"
]
}
],
"annotations": {
"dev.gvisor.flag.debug": "true",
"dev.gvisor.flag.debug-log": "/tmp/runsc/",
"dev.gvisor.flag.panic-log": "/tmp/runsc/panic.log",
"dev.gvisor.flag.strace": "true",
"dev.gvisor.internal.tpuproxy": "true",
"io.kubernetes.cri.container-name": "tpu-gvisor",
"io.kubernetes.cri.container-type": "container",
"io.kubernetes.cri.image-name": "gcr.io/gvisor-presubmit/tpu/jax_x86_64:latest",
"io.kubernetes.cri.sandbox-id": "d711f9789021bb54a955a5c4155b796a79f508db3d9376fafc814ee91a0ce560",
"io.kubernetes.cri.sandbox-name": "tpu-gvisor",
"io.kubernetes.cri.sandbox-namespace": "default",
"io.kubernetes.cri.sandbox-uid": "0c23b742-a930-45e1-80d3-2b358141671e"
},
"linux": {
"uidMappings": [
{
"containerID": 0,
"hostID": 0,
"size": 4294967295
}
],
"gidMappings": [
{
"containerID": 0,
"hostID": 0,
"size": 4294967295
}
],
"resources": {
"memory": {},
"cpu": {
"shares": 2,
"period": 100000
},
"unified": {
"memory.oom.group": "1",
"memory.swap.max": "0"
}
},
"cgroupsPath": "kubepods-besteffort-pod0c23b742_a930_45e1_80d3_2b358141671e.slice:cri-containerd:27ee155751166dcc9569871355a0f90babbd37f94b11a8879f83757597e7da00",
"namespaces": [
{
"type": "pid"
},
{
"type": "ipc",
"path": "/proc/8016/ns/ipc"
},
{
"type": "uts",
"path": "/proc/8016/ns/uts"
},
{
"type": "mount"
},
{
"type": "network",
"path": "/proc/8016/ns/net"
},
{
"type": "cgroup"
},
{
"type": "user"
}
],
"devices": [
{
"path": "/dev/vfio/2",
"type": "c",
"major": 245,
"minor": 1,
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/vfio/3",
"type": "c",
"major": 245,
"minor": 0,
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/vfio/0",
"type": "c",
"major": 245,
"minor": 3,
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/vfio/1",
"type": "c",
"major": 245,
"minor": 2,
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/vfio/vfio",
"type": "c",
"major": 10,
"minor": 196,
"fileMode": 438,
"uid": 0,
"gid": 0
}
]
}
}
@manninglucas thanks for the config! I've tried this with a /sys
mount, and I'm still getting the same error:
{
"ociVersion": "1.0.2",
"process": {
"terminal": true,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"/bin/sh"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"LANG=C.UTF-8",
"PYTHONUNBUFFERED=1"
],
"cwd": "/"
},
"root": {
"path": "jax-rootfs",
"readonly": false
},
"hostname": "jax-container",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "/sys",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
}
],
"linux": {
"namespaces": [
{
"type": "pid"
},
{
"type": "network"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
}
]
}
}
and the run:
peyton@t1v-n-901fc2b8-w-0:~/tputesting$ ./start.sh
Child PID: 235453
Press Enter to continue...
running container: starting container: starting root container: starting sandbox: failed to setupFS: mounting submounts: mount submount "/sys": failed to mount "/sys" (type: sysfs): no such file or directory, opts: &{{true false false false} true {true 0xc00059e8a0} false}
For what it's worth, this error message is not coming from the inner container - it's coming from the runsc-sandbox
process which is in its own mount namespace that appears to only have /etc/
and /proc
. This leads me to believe that the sandbox process needs to have /sys
mirrored in, but I'm not sure how to do that.
From my testing, I'm actually not sure how the GKE sandbox example works. It looks to me like /sys
isn't mounted in the sandbox's namespace, so I'm surprised that the hostDirEntries
call doesn't fail. Are you avoiding putting the sandbox process in its own namespace or something?
I see the issue - the code is looking for TPU devices in specific paths to see if it should bind them down into the container. The issue is that in GCE VMs, those paths don't exist! I'm not sure how they work in the first place then, though :)
peyton@t1v-n-901fc2b8-w-0:~/tputesting$ sudo find /dev | grep vfio
/dev/vfio
/dev/vfio/vfio
peyton@t1v-n-901fc2b8-w-0:~/tputesting$ sudo find /dev | grep accel
That behavior is very strange to me. FWIW here's what I see in my VM when I find /dev/ | grep vfio
.
/dev/vfio
/dev/vfio/0
/dev/vfio/1
/dev/vfio/2
/dev/vfio/3
/dev/vfio/vfio
What do you get when you run an unsandboxed TPU workload? How did you create your TPU VM?
@manninglucas It turns out that at least part of this issue was the TPU VM image I was using. I was using tpu-vm-base
, which apparently is quite out of date:
https://github.com/google/jax/issues/13260
I've now switched to tpu-ubuntu2204-base
:
➜ ~ gcloud compute tpus tpu-vm create another-peyton-tpu \
--zone=us-central1-a \
--accelerator-type=v5litepod-1 \
--version=tpu-ubuntu2204-base \
--project=<redacted>
While this is getting further, it's now failing at a later step in the parsing:
W0822 17:59:26.377299 62469 util.go:64] FATAL ERROR: error setting up chroot: error configuring chroot for TPU devices: extracting TPU device minor: open /sys/class/vfio-dev/vfio0/device/vendor: no such file or directory
error setting up chroot: error configuring chroot for TPU devices: extracting TPU device minor: open /sys/class/vfio-dev/vfio0/device/vendor: no such file or directory
Do you know what image the GKE VMs are using?
I may need to use v2-alpha-tpuv5-lite
. I'll try that and get back to you. The fact that device mounting is different depending on the image used is really surprising to me. And it's even more surprising that you're allowed to mount incompatible images.
https://cloud.google.com/tpu/docs/runtimes#pytorch_and_jax
@manninglucas I've tried with the new image with no luck. It looks like the device layout still does not match what gVisor expects. If you know what VM image GKE uses, that would be helpful. Here is some output:
peyton@t1v-n-9becfdd7-w-0:~/tputesting$ python3 -c "import jax; print(jax.device_count()); print(repr(jax.numpy.add(1, 1)))"
1
Array(2, dtype=int32, weak_type=True)
peyton@t1v-n-9becfdd7-w-0:~/tputesting$ sudo find /sys/class/vfio/
/sys/class/vfio/
/sys/class/vfio/0
peyton@t1v-n-9becfdd7-w-0:~/tputesting$ sudo find /dev/vfio/
/dev/vfio/
/dev/vfio/0
/dev/vfio/vfio
peyton@t1v-n-9becfdd7-w-0:~/tputesting$ ./start.sh
running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF
And here are the relevant logs again:
I0822 20:44:05.045982 81014 main.go:201] **************** gVisor ****************
I0822 20:44:05.046877 81014 boot.go:264] Setting product_name: "Google Compute Engine"
I0822 20:44:05.046939 81014 boot.go:274] Setting host-shmem-huge: "never"
W0822 20:44:05.047571 81014 specutils.go:129] noNewPrivileges ignored. PR_SET_NO_NEW_PRIVS is assumed to always be set.
I0822 20:44:05.047595 81014 chroot.go:91] Setting up sandbox chroot in "/tmp"
I0822 20:44:05.047707 81014 chroot.go:36] Mounting "/proc" at "/tmp/proc"
W0822 20:44:05.047808 81014 util.go:64] FATAL ERROR: error setting up chroot: error configuring chroot for TPU devices: extracting TPU device minor: open /sys/class/vfio-dev/vfio0/device/device: no such file or directory
error setting up chroot: error configuring chroot for TPU devices: extracting TPU device minor: open /sys/class/vfio-dev/vfio0/device/device: no such file or directory
I believe the image is based on COS; it should be something like "tpu-vm-cos-109"
@manninglucas Nice those paths exist on that image:
peyton@t1v-n-00ca9571-w-0 ~ $ sudo find /dev/vfio/
/dev/vfio/
/dev/vfio/0
/dev/vfio/vfio
peyton@t1v-n-00ca9571-w-0 ~ $ sudo find /sys/class/vfio-dev/
/sys/class/vfio-dev/
/sys/class/vfio-dev/vfio0
This image is painful to work with because of the read-only filesystem, though. I may have to bite the bullet and figure out how to do the device mapping on v2-alpha-tpuv5-lite
.
I will have a patch up soon that will hopefully fix the issue for the ubuntu image you're using. Seems like /sys/class/vfio-dev/vfio0 just corresponds to /sys/class/vfio/0.
This image is painful to work with because of the read-only filesystem, though. I may have to bite the bullet and figure out how to do the device mapping on
v2-alpha-tpuv5-lite
.
You can remount the filesystem with mount -o remount,rw
as root.
btw, COS has a tool called cos-toolbox
which works around this issue and makes it easier to work with in general. It should be available by default.
Hey @pawalt were you able to get this working for your needs?
hey @manninglucas we deprioritized getting this working. I think Peyton was maybe going to restart the effort when the patch landed: https://github.com/google/gvisor/issues/10795#issuecomment-2305739322.
Gotcha. The patch has finally landed (290789b), let me know when you're able to test this out again!
@manninglucas thanks! The container is now starting up. I'm seeing a different issue when trying to use jax
in python. Not sure if you want to make that part of this issue or another one:
peyton@t1v-n-1f714773-w-0:~/tputesting$ ./start.sh
# python3
Python 3.11.9 (main, Aug 13 2024, 02:18:20) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jax
>>> jax.device_count()
Failed to get TPU metadata (tpu-env) from instance metadata for variable CHIPS_PER_HOST_BOUNDS: INTERNAL: Couldn't connect to server
=== Source Location Trace: ===
learning/45eac/tfrc/runtime/gcp_metadata_utils.cc:99
learning/45eac/tfrc/runtime/env_var_utils.cc:50
Failed to get TPU metadata (tpu-env) from instance metadata for variable HOST_BOUNDS: INTERNAL: Couldn't connect to server
=== Source Location Trace: ===
learning/45eac/tfrc/runtime/gcp_metadata_utils.cc:99
learning/45eac/tfrc/runtime/env_var_utils.cc:50
^C^C^C^CFailed to get TPU metadata (tpu-env) from instance metadata for variable ALT: INTERNAL: Couldn't connect to server
=== Source Location Trace: ===
learning/45eac/tfrc/runtime/gcp_metadata_utils.cc:99
learning/45eac/tfrc/runtime/env_var_utils.cc:50
The command hangs on the jax.device_count()
call, and it loops, spitting out a lot of these logs:
I0917 13:42:09.296208 1 strace.go:576] [ 2: 26] python3 E futex(0x7ef41c4bce54, FUTEX_WAIT_BITSET|FUTEX_PRIVATE_FLAG, 0x0, 0x7ef3cf9feb60 {sec=182 nsec=996258753}, 0x0, 0xffffffff)
I0917 13:42:09.301492 1 strace.go:614] [ 2: 26] python3 X futex(0x7ef41c4bce54, FUTEX_WAIT_BITSET|FUTEX_PRIVATE_FLAG, 0x0, 0x7ef3cf9feb60 {sec=182 nsec=996258753}, 0x0, 0xffffffff) = 0 (0x0) errno=110 (connection timed out) (5.274369ms)
I0917 13:42:09.301517 1 strace.go:576] [ 2: 26] python3 E futex(0x7ef41c4bce58, FUTEX_WAKE|FUTEX_PRIVATE_FLAG, 0x1, null, 0x3b66c325, 0x16e)
I0917 13:42:09.301526 1 strace.go:614] [ 2: 26] python3 X futex(0x7ef41c4bce58, FUTEX_WAKE|FUTEX_PRIVATE_FLAG, 0x1, null, 0x3b66c325, 0x16e) = 0 (0x0) (840ns)
I0917 13:42:09.301539 1 strace.go:576] [ 2: 26] python3 E futex(0x7ef41c4bce54, FUTEX_WAIT_BITSET|FUTEX_PRIVATE_FLAG, 0x0, 0x7ef3cf9feb60 {sec=183 nsec=1590373}, 0x0, 0xffffffff)
I0917 13:42:09.306891 1 strace.go:614] [ 2: 26] python3 X futex(0x7ef41c4bce54, FUTEX_WAIT_BITSET|FUTEX_PRIVATE_FLAG, 0x0, 0x7ef3cf9feb60 {sec=183 nsec=1590373}, 0x0, 0xffffffff) = 0 (0x0) errno=110 (connection timed out) (5.341239ms)
I0917 13:42:09.306924 1 strace.go:576] [ 2: 26] python3 E futex(0x7ef41c4bce58, FUTEX_WAKE|FUTEX_PRIVATE_FLAG, 0x1, null, 0x1e73de, 0x170)
I0917 13:42:09.306938 1 strace.go:614] [ 2: 26] python3 X futex(0x7ef41c4bce58, FUTEX_WAKE|FUTEX_PRIVATE_FLAG, 0x1, null, 0x1e73de, 0x170) = 0 (0x0) (1.32µs)
I0917 13:42:09.306953 1 strace.go:576] [ 2: 26] python3 E futex(0x7ef41c4bce54, FUTEX_WAIT_BITSET|FUTEX_PRIVATE_FLAG, 0x0, 0x7ef3cf9feb60 {sec=183 nsec=6995742}, 0x0, 0xffffffff)
My startup script:
sudo runsc --debug \
--debug-log=/home/peyton/tputesting/logs/ \
--strace \
--root=/home/peyton/tputesting/runroot \
--tpuproxy \
run \
--bundle=/home/peyton/tputesting \
my-jax-container
I'm using a jax image exported from the build below:
FROM python:3.11
RUN pip install jax[tpu] -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
@pawalt let's follow up with a new issue. Looks like libtpu is looking for some metadata that might be stored in an environment variable. Can you run env
on the host?
@manninglucas I've opened #10923
Description
I'm testing out TPU support with the runsc Docker shim. When I use
runsc
normally, everything works fine, but when used with --tpuproxy
, it fails to mount /sys. This is surprising to me because the mount is definitely there. cc @thundergolfer
Steps to reproduce
I've configured docker to use my custom runsc script:
This happens despite the mount existing:
I'm using a v5lite tpu:
runsc version
docker version (if using docker)
uname
runsc debug logs (if available)