klueska-mesosphere / mesos-gpu-docker


Mesos containerizer fails because CentOS 7 doesn't like hard links #1

Open grisaitis opened 7 years ago

grisaitis commented 7 years ago

Hi @klueska! Thanks for providing this demo.

Unfortunately I can't get it to work, apparently because CentOS 7 doesn't like hard links. How did you get it working? When I run deploy_tasks, the tasks fail as soon as the Mesos containerizer tries to extract a tarball that contains hard links:

E1216 01:24:34.440013    60 slave.cpp:4423] Container '8ddf693d-64f0-451d-88e3-214693637a27' for executor 'gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7' of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000 failed to start: Collect failed: Subprocess 'tar, tar, -x, -f, /tmp/mesos/store/docker/staging/MM4QkP/sha256:e931b117db38a05b9d0bbd28ca99a0abe5236a0026d88b3db804f520e59977ec, -C, /tmp/mesos/store/docker/staging/MM4QkP/bca38844f77536fdda92adca64fd4547215fe82177c4559cb94a040115b9c4b7/rootfs' failed: tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease: Cannot open: Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_main_binary-amd64_Packages: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_main_i18n_Translation-en: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_restricted_binary-amd64_Packages: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_restricted_i18n_Translation-en: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial_InRelease: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial_main_binary-amd64_Packages: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial_main_i18n_Translation-en: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial_restricted_binary-amd64_Packages: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial_restricted_i18n_Translation-en: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.lock: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.partial: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.security.ubuntu.com_ubuntu_dists_xenial-security_InRelease: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.security.ubuntu.com_ubuntu_dists_xenial-security_main_binary-amd64_Packages: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.security.ubuntu.com_ubuntu_dists_xenial-security_main_i18n_Translation-en: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.security.ubuntu.com_ubuntu_dists_xenial-security_restricted_binary-amd64_Packages: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: var/lib/apt/lists/.wh.security.ubuntu.com_ubuntu_dists_xenial-security_restricted_i18n_Translation-en: Cannot hard link to `var/lib/apt/lists/.wh.archive.ubuntu.com_ubuntu_dists_xenial-updates_InRelease': Operation not permitted
tar: Exiting with failure status due to previous errors
I1216 01:24:34.440502    22 containerizer.cpp:1950] Destroying container 8ddf693d-64f0-451d-88e3-214693637a27 in PROVISIONING state
I1216 01:24:34.443083    23 slave.cpp:4535] Executor 'gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7' of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000 has terminated with unknown status
I1216 01:24:34.450331    23 slave.cpp:3634] Handling status update TASK_FAILED (UUID: 7e5947e9-f596-4675-bd2b-0b8d84a828d9) for task gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7 of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000 from @0.0.0.0:0
W1216 01:24:34.453387    52 containerizer.cpp:1760] Ignoring update for unknown container 8ddf693d-64f0-451d-88e3-214693637a27
I1216 01:24:34.454707    44 status_update_manager.cpp:323] Received status update TASK_FAILED (UUID: 7e5947e9-f596-4675-bd2b-0b8d84a828d9) for task gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7 of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000
I1216 01:24:34.457034    44 status_update_manager.cpp:832] Checkpointing UPDATE for status update TASK_FAILED (UUID: 7e5947e9-f596-4675-bd2b-0b8d84a828d9) for task gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7 of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000
I1216 01:24:34.457798    50 slave.cpp:4051] Forwarding the update TASK_FAILED (UUID: 7e5947e9-f596-4675-bd2b-0b8d84a828d9) for task gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7 of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000 to master@10.102.40.185:5050
W1216 01:24:34.463029    34 store.cpp:241] Failed to remove staging directory: Directory not empty
I1216 01:24:34.485446    12 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 7e5947e9-f596-4675-bd2b-0b8d84a828d9) for task gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7 of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000
I1216 01:24:34.485836    12 status_update_manager.cpp:832] Checkpointing ACK for status update TASK_FAILED (UUID: 7e5947e9-f596-4675-bd2b-0b8d84a828d9) for task gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7 of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000
I1216 01:24:34.487417    59 slave.cpp:4646] Cleaning up executor 'gpu-test-docker.58039366-c32e-11e6-a556-024266d12af7' of framework 743d6ba7-132c-4eb1-af42-66a39d4ebccb-0000

Did the base CentOS 7 image maybe change? The latest one is from 6 weeks ago, which is after this code was last committed.

grisaitis commented 7 years ago

The same issue happens if I use @mesosphere's Mesos images, which are based on Ubuntu 14.04.

E.g. if I modify Dockerfile.mesos-update like:

-FROM mesos-build
+FROM mesosphere/mesos:1.1.0-rc3
 MAINTAINER Kevin Klues <klueska@mesosphere.com>

-RUN git reset --hard HEAD && \
-    git checkout master && \
-    git pull

-RUN cd build && \
-    make -j install
+RUN apt-get install -y curl

Then I get the same error! Bizarre. Maybe my host OS has a weird setting about hard links?
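
One thing that stands out in the log: the first tar error is not a hard link error at all, it is "Cannot open: Operation not permitted" on a file whose name starts with .wh., and every later "Cannot hard link" message points at that same file as the link target. So the hard link failures may just be a knock-on effect of that first file never being created. Some union filesystems (AUFS, as far as I know) reserve the .wh. prefix for whiteouts and refuse to create such files, so that is one possibility, though I have not confirmed it here. Below is a minimal sketch for telling the two cases apart. The function name probe_directory and the probe file names are made up for illustration; only standard library calls are used.

```python
import os
import tempfile


def probe_directory(directory):
    """Try three operations inside a scratch subdirectory of `directory`.

    Returns a dict mapping each probe name to None on success, or to the
    OSError message on failure. Probe order matters: the hard link probe
    links to the file created by the first probe.
    """
    results = {}
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        target = os.path.join(scratch, "target")
        link = os.path.join(scratch, "link")
        whiteout = os.path.join(scratch, ".wh.probe")  # hypothetical name

        probes = {
            # Can a plain file be created at all?
            "regular_file": lambda: open(target, "w").close(),
            # Can a hard link to that file be created?
            "hard_link": lambda: os.link(target, link),
            # Can a file with the '.wh.' prefix be created?
            "whiteout_name": lambda: open(whiteout, "w").close(),
        }
        for name, probe in probes.items():
            try:
                probe()
                results[name] = None
            except OSError as exc:
                results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Assumption: swap in the directory you want to test, e.g. the
    # staging directory shown in the log (/tmp/mesos/store/docker/staging).
    directory = tempfile.gettempdir()
    for name, error in probe_directory(directory).items():
        print(f"{name}: {'ok' if error is None else error}")
```

If hard_link comes back ok but whiteout_name fails, then the problem is probably the .wh. file names on whatever filesystem backs that directory, not hard links as such. It needs to be run inside the same container as the agent, since that is where tar runs.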