knaw-huc / loghi


Problem with libnvidia-ml.so.1 on Debian/WSL2 #3

Closed by coret 1 year ago

coret commented 1 year ago

When running na-pipeline.sh on Debian Linux under WSL2 on Windows with an NVIDIA GPU, the following line:

https://github.com/knaw-huc/loghi/blob/e1646b9867301f0d8d568cc1516d9d5867c4b964/na-pipeline.sh#L149

gives the error:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/4f7924ec6b328915cd622e25d220121ae3018c9da36cb4a7ef2109ae719a1f19/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.

Based on a suggestion in https://github.com/NVIDIA/nvidia-container-toolkit/issues/289, I created the following Dockerfile:

FROM loghi/docker.htr
RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1

I built this image with docker build --no-cache . -t docker.htr.wsl and changed line https://github.com/knaw-huc/loghi/blob/e1646b9867301f0d8d568cc1516d9d5867c4b964/na-pipeline.sh#L41 to use the newly built image: DOCKERLOGHIHTR=docker.htr.wsl

With this change, the na-pipeline.sh script runs fine.
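For anyone who hits the same error with a different library, the workaround can be sketched as a small shell helper that prints an equivalent Dockerfile for any list of library names. This is only a sketch: gen_dockerfile is a hypothetical name, not part of loghi, and it assumes the conflicting files all live under /usr/lib/x86_64-linux-gnu as in the error message.

```shell
#!/bin/sh
# Hypothetical helper (not part of loghi): print a Dockerfile that removes
# the given libraries from a base image, mirroring the workaround in this issue.
# Assumption: all libraries live under /usr/lib/x86_64-linux-gnu.
gen_dockerfile() {
    base=$1
    shift
    printf 'FROM %s\n' "$base"
    printf 'RUN rm -rf'
    for lib in "$@"; do
        printf ' /usr/lib/x86_64-linux-gnu/%s' "$lib"
    done
    printf '\n'
}

# Reproduces the Dockerfile from this comment on stdout.
gen_dockerfile loghi/docker.htr libnvidia-ml.so.1 libcuda.so.1

# To use it, redirect the output to a file named Dockerfile, then build the
# image and set DOCKERLOGHIHTR as described in this comment:
#   docker build --no-cache . -t docker.htr.wsl
```

Passing further library names as extra arguments extends the same RUN line rather than adding image layers.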

Are the deleted files necessary at all in loghi/docker.htr?

Simon-Dirks commented 1 year ago

@rvankoert I ran into the same issue, but with libcudadebugger.so.1. I tested @coret's fix and it worked for me. Will send a pull request!

nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/a98734b14408a783962c3153a7b8b22aabf7f41de7cda3ff39ada08aaa7a070e/merged/usr/lib/x86_64-linux-gnu/libcudadebugger.so.1: file exists: unknown.
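Since the conflicting file differs between setups, it may help to pull the in-container path straight out of the error message. The snippet below is a rough sketch: conflicting_path is a hypothetical helper, and it assumes the message keeps the "file creation failed: .../merged/<path>: file exists" shape seen in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of loghi or nvidia-container-cli): read an
# nvidia-container-cli mount error on stdin and print the in-container path
# of the file that already exists in the image.
conflicting_path() {
    # Drop everything up to the overlay's "merged" directory and
    # everything from ": file exists" onward.
    sed -n 's|.*file creation failed: .*/merged\(/.*\): file exists.*|\1|p'
}

err='nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/a98734b14408a783962c3153a7b8b22aabf7f41de7cda3ff39ada08aaa7a070e/merged/usr/lib/x86_64-linux-gnu/libcudadebugger.so.1: file exists: unknown.'

# Print the path that would need to be removed from the image.
printf '%s\n' "$err" | conflicting_path
```

The printed path is the one to add to the rm line in the Dockerfile workaround.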