Open johnbensnyder opened 4 years ago
Thanks for the suggestion. We will add the Docker setup instructions to the profiler guide as suggested.
-ck
On Wed, Jun 10, 2020 at 12:35 PM Ben Snyder notifications@github.com wrote:
GPU profiling in Docker requires including the docker run option '--privileged=true'.
The topic is discussed in this issue:
tensorflow/tensorflow#35860 https://github.com/tensorflow/tensorflow/issues/35860
Can Docker setup instructions be included on the profiler setup page?
https://www.tensorflow.org/guide/profiler
Good to hear, @ckluk, thank you! If you're open to taking requests, I'd be very interested in a Docker setup in which:
a) GPU profiling works
b) the container is run as a normal user (so that all newly created files, e.g. logs and saved models, are owned by the user, not root)
but I can't get both to work at the same time. I have the following in (the host machine's) /etc/modprobe.d/nvidia-kernel-common.conf:
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
and I ran update-initramfs -u after adding it (and rebooted afterwards).
The Docker container is created by
docker run -it --gpus=all --rm --user "$(id -u):$(id -g)" dom/tensorflow:2.2.0-gpu
(plus some volume binds etc). Unfortunately, this setup leads to CUPTI_ERROR_INSUFFICIENT_PRIVILEGES.
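One way to check whether the module option actually took effect after the reboot is to read the driver's parameters from procfs. This is a minimal sketch, assuming the driver exposes /proc/driver/nvidia/params with an RmProfilingAdminOnly entry (as described in NVIDIA's ERR_NVGPUCTRPERM notes); the function name profiling_admin_only is made up for illustration.

```shell
# Sketch: report the driver's RmProfilingAdminOnly setting.
# Assumption: the params file has lines of the form "Name: value".
profiling_admin_only() {
    # $1: params file (defaults to the NVIDIA driver's procfs entry)
    params_file="${1:-/proc/driver/nvidia/params}"
    # No readable file (e.g. no driver loaded): report "unknown"
    [ -r "$params_file" ] || { echo "unknown"; return 0; }
    value=$(awk -F': *' '$1 == "RmProfilingAdminOnly" {print $2}' "$params_file")
    echo "${value:-unknown}"
}

# 0 should mean non-admin users may profile, 1 that profiling is admin-only
echo "RmProfilingAdminOnly: $(profiling_admin_only)"
```

If this still prints 1 on the host after the reboot, the modprobe option was probably not picked up, and the container flags are unlikely to be the problem.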
Hi Dom,
I don't think we can do much on the Profiler end, as the privilege requirement comes from CUPTI. In the future (probably around the time of the TF 2.4 release), TF will use CUDA 11. My understanding is that this CUPTI privilege requirement should no longer apply with CUDA 11.
Thanks, -ck
Thanks @ckluk. I was hoping it was a matter of bad setup, but it's good to hear it'll at least eventually get resolved.
@d-miketa Instead of running the container with --privileged=true, try --cap-add=CAP_SYS_ADMIN
More info: https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti
I ended up doing the following, some subset of which seems to have done the trick:
- updating the host machine to Ubuntu 20.04
- adding options nvidia "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf and running update-initramfs -u
- adding export CUDA_VERSION="10.1", export LD_LIBRARY_PATH="/usr/local/cuda-${CUDA_VERSION}/lib64:/usr/local/cuda-${CUDA_VERSION}/extras/CUPTI/lib64" and export LD_INCLUDE_PATH="/usr/local/cuda-${CUDA_VERSION}/include:/usr/local/cuda-${CUDA_VERSION}/extras/CUPTI/include" to the host machine's .zshrc
- adding ENV LD_INCLUDE_PATH="/usr/local/cuda/include:/usr/local/cuda/extras/CUPTI/include:$LD_INCLUDE_PATH" to the Dockerfile
- running the Docker container with --privileged
It's possible that --cap-add=CAP_SYS_ADMIN would work as well as --privileged, but I haven't tried.
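The host-side modprobe step can be sketched as a small idempotent helper, so the option is not appended twice if the step is repeated. The helper name ensure_line is hypothetical, and the privileged commands are left as comments rather than executed.

```shell
# Sketch: add the NVIDIA module option to the modprobe config exactly once.
MODPROBE_CONF="/etc/modprobe.d/nvidia-kernel-common.conf"
NV_OPTION='options nvidia "NVreg_RestrictProfilingToAdminUsers=0"'

ensure_line() {
    # $1: file, $2: exact line; append the line only if it is not already there
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# On the host, as root (not executed here):
#   ensure_line "$MODPROBE_CONF" "$NV_OPTION"
#   update-initramfs -u
#   reboot
```

Whether this step alone is enough is unclear from the thread; it is only one item of the list above.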
Hi! How do I pass those parameters to the Docker container?
I did the following but got an error:
nvidia-docker run -d -it --name retina_net -v /home/readib/Experiments/:/ -p 8000:8888 -v /tmp/.X11-unix/:/tmp/.X11-unix -e DISPLAY=$DISPLAY retina_net:latest --cap-add=CAP_SYS_ADMIN /bin/bash
Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "--cap-add=CAP_SYS_ADMIN": executable file not found in $PATH: unknown
Thank you.
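Options for docker run must come before the image name; anything after the image is treated as the command to run inside the container, which is why --cap-add=CAP_SYS_ADMIN was reported as an executable that could not be found. Run the container like this:
nvidia-docker run --privileged=true -d -it --name retina_net -v /home/readib/Experiments/:/home -p 8000:8888 -v /tmp/.X11-unix/:/tmp/.X11-unix -e DISPLAY=$DISPLAY retina_net:latest /bin/bash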
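To make the ordering explicit, here is a small sketch that only prints the command instead of running it, with the privilege option placed before the image. The helper name docker_run_cmd is hypothetical; the image and container names are the ones from the question, and the volume and port options are left out for brevity.

```shell
# Sketch: build (and print, not run) a docker command with options before the image.
IMAGE="retina_net:latest"

docker_run_cmd() {
    # $1: privilege option, e.g. --privileged or --cap-add=CAP_SYS_ADMIN
    # remaining arguments: the command to run inside the container
    priv_opt="$1"
    shift
    printf 'nvidia-docker run %s -d -it --name retina_net %s %s\n' \
        "$priv_opt" "$IMAGE" "$*"
}

# Everything before the image is a docker option; /bin/bash is the container command
docker_run_cmd --cap-add=CAP_SYS_ADMIN /bin/bash
```

Once the printed line looks right, it can be pasted into the terminal with the -v, -p and -e options added back before the image name.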