Open tgross opened 6 months ago
As I noted in our sidebar discussion, the issue here is that the ephemeral user didn't have access to my homedir, which is where I had the plugin binary for testing. Nothing to do here except maybe update docs.
Reopening because there is a bug! The bug is that the driver does not capture any output from unshare and nsenter:
https://github.com/hashicorp/nomad-driver-exec2/blob/v0.1.0/pkg/shim/shim.go#L346-L347
The comment even says:
// nsenter and unshare do not log
But they do log on errors like this! If you run the command manually you'll see the unshare output to stderr:
$ sudo unshare --ipc --pid --mount-proc --fork --kill-child=SIGKILL --setuid=84118 --setgid=84118 -- /opt/nomad/plugins/nomad-driver-exec2 exec2-shim true /tmp/alloc_mounts/8a500d41-e2be-875e-d906-5117d0355a2b-exec2/alloc/logs/.exec2.stdout.fifo /tmp/alloc_mounts/8a500d41-e2be-875e-d906-5117d0355a2b-exec2/alloc/logs/.exec2.stderr.fifo rwxc:/tmp/alloc_mounts/8a500d41-e2be-875e-d906-5117d0355a2b-exec2/local rwxc:/tmp/alloc_mounts/8a500d41-e2be-875e-d906-5117d0355a2b-exec2/alloc rwxc:/tmp/alloc_mounts/8a500d41-e2be-875e-d906-5117d0355a2b-exec2/secrets rwxc:/tmp/alloc_mounts/8a500d41-e2be-875e-d906-5117d0355a2b-exec2/tmp -- /bin/bash -c "echo 'it worked'; sleep 10" || echo $?
unshare: failed to execute /opt/nomad/plugins/nomad-driver-exec2: Permission denied
126
This can be captured and sent to the task log fifo. PoC PR incoming.
I've been testing the plugin out and haven't been able to execute any commands with busybox. I'm sure there's a configuration value or environment setup I'm missing, but this will also illustrate how challenging it is to debug issues. From the perspective of the job author, all I'm getting is:
An example failing task config is:
The logs look like the following (allocID replaced by $alloc_id for legibility):
I used strace to follow the plugin process (e.g. sudo strace -o /tmp/strace.log -f -p $pid -e trace=execve -v -s 256) and see the following relevant execve calls. The last one is where it fails, when we try to re-exec as the shim:
I can run commands like sleep or env via this plugin without trouble, so the basic environment seems to be fine. As expected, my kernel environment looks good:
The plugin and busybox binaries are executable, and the dynamic libraries that busybox needs should fall into the default unveil paths:
My configuration is as follows:
Although I also tried explicitly granting access to each item needed: