M1chaelM closed this issue 3 years ago
@tfoote This is the same error I was getting before you created the build.bash
script so I think we've confirmed that this is not caused by incorrect rocker versions.
Digging a little further, I can bring up a shell in the image with:
docker run --rm -it -p 8080:8080 --gpus all --entrypoint /bin/bash <hash>
This opens to a prompt with user developer, as expected. Attempting to run sudo as developer reproduces the error:
developer@2be16481deef:~$ sudo ls
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
The root user exists in /etc/passwd and has an id of 0. Ownership and permissions on /usr/bin look appropriate to me:
developer@a3805fc9f8af:~$ ls -ld /usr/bin/
drwxr-xr-x 1 root root 4096 Apr 6 12:19 /usr/bin/
developer@a3805fc9f8af:~$ ls -ld /usr/bin/sudo
-rwsr-xr-x 1 root root 166056 Jan 19 06:21 /usr/bin/sudo
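For anyone who wants to script the same check, here is a minimal sketch. It assumes a Linux system with GNU coreutils `stat`, and the function names `is_setuid`, `owner_uid`, and `describe_suid` are made up for illustration. Note that a correct mode and owner are necessary but not sufficient: the error message above also points at the 'nosuid' mount option, which this check does not look at.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of rocker or sudo).

# Succeeds if the setuid bit is set on the given file.
is_setuid() {
  [ -u "$1" ]
}

# Prints the numeric uid of the file's owner (GNU stat syntax).
owner_uid() {
  stat -c '%u' "$1"
}

# Reports whether the file is both setuid and owned by root (uid 0).
describe_suid() {
  if is_setuid "$1" && [ "$(owner_uid "$1")" = "0" ]; then
    echo "$1: setuid bit set and owned by root"
  else
    echo "$1: NOT a setuid-root binary"
  fi
}
```

Usage would be something like `describe_suid /usr/bin/sudo` from the shell inside the container.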
However, one thing that is weird about this system is that my user id was set up to work with NPS's Active Directory instance, using (I think) SSSD. As a result, the entry carried in by rocker for developer looks like this:
developer:x:766438638:766400513:Mccarrin, Michael R:/home/developer:/bin/bash
So, this could be a red herring, but I'm wondering if these very large uid and gid values are causing a problem.
EDIT: This was indeed a red herring! Testing with other values gives the same error.
Also of note---developer is not a member of any groups besides its own:
developer@2be16481deef:~$ id
uid=766438638(developer) gid=766400513(developer) groups=766400513(developer)
Looking into this, I think it is as expected because sudo access is being given through the /etc/sudoers.d/rocker file.
I tried manually running through the process implied by the user snippet here. To do this, I ran rocker without the --user and --user-override-name flags:
rocker --cuda --nvidia --novnc --turbovnc test_novnc
This results in an image running novnc. I then added the developer user manually to see whether there were any errors:
docker exec -it <container_name> /bin/bash
# groupadd -g "766400513" "developer"
# useradd --no-log-init --no-create-home --uid "766438638" -s "/bin/bash" -c "Mccarrin, Michael R" --gid "766400513" -d "/home/developer" "developer"
# echo "developer ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker
# mkdir -p "$(dirname "/home/developer")"
# mkhomedir_helper developer
# su developer
developer:~$ sudo ls
Unfortunately, all of these steps appear to succeed, but the final sudo call still gives the same error:
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
Fixed. The source of this error is that the underlying file system in which the docker image cache is stored was mounted with the nosuid option. Remounting the file system without this option resolves the problem.
There was a cascade of problems that led to this unusual error:
[...] /var/lib/docker on the root filesystem.
[...] nosuid option by default to prevent clever privilege escalation attacks.
This bug report was very helpful for putting this together.
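In case it helps someone else hitting the same error, here is a rough sketch of how the mount options could be checked on the host. It assumes `findmnt` from util-linux is available; the function names `has_nosuid`, `mount_options_for`, and `report_nosuid` are made up for illustration and are not part of docker or rocker.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking whether a path lives on a nosuid mount.

# Succeeds if the comma-separated mount option string contains "nosuid".
has_nosuid() {
  case ",$1," in
    *,nosuid,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the mount options of the filesystem that contains the given path.
# Assumes util-linux findmnt is installed.
mount_options_for() {
  findmnt -n -o OPTIONS -T "$1"
}

# Prints a one-line report for the given path.
report_nosuid() {
  opts=$(mount_options_for "$1") || return 2
  if has_nosuid "$opts"; then
    echo "$1: mounted with nosuid ($opts)"
  else
    echo "$1: nosuid not set ($opts)"
  fi
}
```

On the host this would be run as `report_nosuid /var/lib/docker`, or against whatever directory `docker info --format '{{.DockerRootDir}}'` reports if the data directory has been moved. If nosuid shows up, remounting as root with something like `mount -o remount,suid <mountpoint>` should clear it until the next mount; making the change stick depends on whatever sets the option in the first place, which will differ from system to system.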
Wow, that's many layers deep and very opaque. I am surprised that it is exposed through the file-system abstraction. Nice job figuring that out.
Running the build.bash script gets all the way to the end of the rocker command and then exits with the following error:
The test_novnc image is built from the latest master, and the virtual environment is built from scratch. One minor deviation that I don't think is causing the problem is that when I pip install rocker.git@cuda I initially get an error related to bdist_wheel, which I resolved with pip install wheel.
Attempting to bring the image up manually with the command:
exits with the same error.