osrf / rocker

A tool to run docker containers with overlays and convenient options for things like GUIs etc.
Apache License 2.0

nvidia extension: OS detection is failing #161

Closed: adlarkin closed this issue 2 years ago

adlarkin commented 2 years ago

Today I cleared out some cached Docker images that had been lying around for a few weeks. Since then, running rocker with the --nvidia extension produces a detect_os failure that I wasn't seeing before.

For example, if I run the following command, I have no issues:

rocker ubuntu:bionic bash

When I add the --nvidia argument (rocker --nvidia ubuntu:bionic bash), I see the following error:

Step 12/12 : CMD [ "" ]
 ---> Running in 165a674db9fa
Removing intermediate container 165a674db9fa
 ---> fbf38e2b1399
Successfully built fbf38e2b1399
Successfully tagged rocker:os_detect_ubuntu_bionic
running,  docker run -it --rm fbf38e2b1399
output:  
/tmp/detect_os failed:
WARNING unable to detect os for base image 'ubuntu:bionic', maybe the base image does not exist

I am using rocker version 0.2.4.

It looks like the error is here: https://github.com/osrf/rocker/blob/main/src/rocker/nvidia_extension.py#L100

Perhaps something has changed in an upstream dependency?

tfoote commented 2 years ago

I can reproduce this. It worked for me at first; then I ran with --nocache and it failed. The static executable used to detect the image's OS is segfaulting. I have confirmed that the issue occurs with both bionic and focal images.

In testing, I found that the old cached detector works, but the rebuilt one segfaults.

# Detector stage from the cached image (tagged while it was still working). This works:
# FROM rocker_os_detect_ubuntu_bionic_cached_working as detector
# Detector stage from the freshly rebuilt image. This segfaults:
FROM rocker:os_detect_ubuntu_bionic as detector

FROM ubuntu:bionic

COPY --from=detector /tmp/detect_os /tmp/detect_os
ENTRYPOINT [ "/tmp/detect_os" ]
CMD [ "" ]

An even simpler test is:

$ docker run -ti --rm rocker_os_detect_ubuntu_bionic_cached_working
('Ubuntu', '18.04', 'bionic')
$ docker run -ti --rm rocker:os_detect_ubuntu_bionic
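
To make the difference between the two runs easier to see programmatically, here is a minimal Python sketch that interprets a detector run. It is not rocker's code: the helper name interpret_detector_run is hypothetical, and it assumes the detector prints a Python tuple literal like the one shown above and that a crash surfaces as a signal-style exit status (typically 139 for a segfault when run through docker run or a shell).

```python
import ast
import signal


def interpret_detector_run(returncode, output):
    """Return the parsed OS tuple, or raise with the reason the run failed.

    Hypothetical helper for illustration only; not part of rocker.
    """
    if returncode != 0:
        # Shells and `docker run` report death by signal as 128 + signal number;
        # Python's subprocess module reports it as a negative number instead.
        signum = returncode - 128 if returncode > 128 else -returncode
        try:
            reason = "killed by " + signal.Signals(signum).name
        except ValueError:
            # Not a signal-style status, so report the plain exit code.
            reason = "exit status %d" % returncode
        raise RuntimeError("detector failed: " + reason)

    text = output.strip()
    if not text:
        raise RuntimeError("detector produced no output")

    # Assumption: the output is a tuple literal such as
    # ('Ubuntu', '18.04', 'bionic'); literal_eval parses it without executing code.
    return ast.literal_eval(text)
```

Feeding it the return code and captured output of each docker run command would separate "the detector crashed" from "the base image does not exist", which the warning in the original report lumps together.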



I checked and the base image used for the detector hasn't changed recently (2 years old): https://hub.docker.com/layers/python/library/python/3-slim-stretch/images/sha256-2ff1b4865dc53c88c8506c1fac460645eb527304648b20d15558633c5daecdb1?context=explore

I'm exploring what else has changed. 

Dpkg does not appear to be different: https://gist.github.com/tfoote/d7681b904d3a5cbb1015a8409f492c42

A new version of staticx has been released: 0.13.0, up from 0.12.3.
https://pypi.org/project/staticx/#history
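
One way to check whether a given environment has picked up the newer release is sketched below. This is only an illustration: the function names are hypothetical, and treating 0.12.3 as "known good" is an assumption (the thread has not confirmed that the cached working detector was built with it, or that staticx is the cause).

```python
from importlib import metadata


def parse_version(text):
    """Split a plain dotted version such as '0.13.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_newer_than_known_good(installed, known_good="0.12.3"):
    """Return True if `installed` is a later release than `known_good`.

    Assumption: 0.12.3 is the last release that produced a working detector.
    """
    return parse_version(installed) > parse_version(known_good)


def installed_staticx_version():
    """Look up the staticx version installed in the current environment.

    Raises importlib.metadata.PackageNotFoundError if staticx is not installed.
    """
    return metadata.version("staticx")
```

If the check returns True inside the detector build environment, rebuilding with the older release pinned (for example, pip install staticx==0.12.3) would be a quick way to test whether the new release is responsible.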