jupyterhub / systemdspawner

Spawn JupyterHub single-user notebook servers with systemd
BSD 3-Clause "New" or "Revised" License

JupyterHub singleuser not able to use tensorflow gpu support using systemdspawner #67

Closed ccauet closed 4 years ago

ccauet commented 4 years ago

(this is a cross-post to SO, the jupyterhub issue tracker, and the jupyterhub/systemdspawner issue tracker)

I have a private JupyterHub setup using SystemdSpawner in which I'm trying to run TensorFlow with GPU support.

I followed the TensorFlow GPU installation instructions and, alternatively, tried an already provisioned AWS AMI (Deep Learning Base AMI (Ubuntu 18.04) Version 21.0) with the NVIDIA drivers preinstalled, both on AWS EC2 g4 instances.

On both setups I'm able to use TensorFlow with GPU support in an (I)Python 3.6 shell:

>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
2020-02-12 10:57:13.670937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-12 10:57:13.698230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-12 10:57:13.699066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.73GiB deviceMemoryBandwidth: 298.08GiB/s
2020-02-12 10:57:13.699286: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-12 10:57:13.700918: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-12 10:57:13.702512: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-12 10:57:13.702814: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-12 10:57:13.704561: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-12 10:57:13.705586: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-12 10:57:13.709171: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-12 10:57:13.709278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-12 10:57:13.710120: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-12 10:57:13.710893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

(some warnings about the NUMA node, but the GPU is found)

Running nvidia-smi and deviceQuery also shows the GPU:

$ nvidia-smi
Wed Feb 12 10:39:44 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   33C    P8     9W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ /usr/local/cuda/extras/demo_suite/deviceQuery
/usr/local/cuda/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla T4"
  CUDA Driver Version / Runtime Version          10.1 / 10.0
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 15080 MBytes (15812263936 bytes)
  (40) Multiprocessors, ( 64) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1590 MHz (1.59 GHz)
  Memory Clock rate:                             5001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 30
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.0, NumDevs = 1, Device0 = Tesla T4
Result = PASS

Now when I start JupyterHub, log in, and open a terminal, I get:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

and

$ /usr/local/cuda/extras/demo_suite/deviceQuery
cuda/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
-> no CUDA-capable device is detected
Result = FAIL

and also

(screenshot attached: Screenshot 2020-02-12 12 08 46)

I suspect some kind of "sandbox", missing ENV vars, or similar, because of which the GPU drivers are not found inside the single-user environment and, consequently, TensorFlow GPU support does not work.
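
One way to check whether the spawned single-user unit really is sandboxed is to query its systemd properties from the host while the server is running (the unit name below is hypothetical and depends on SystemdSpawner's unit_name_template); if device sandboxing is active, I'd expect something like:

$ systemctl show jupyter-someuser -p PrivateDevices
PrivateDevices=yes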

Any ideas on this? It is probably either a small configuration tweak or, due to the architecture, not solvable at all ;)

ccauet commented 4 years ago

Problem solved. I was setting

c.SystemdSpawner.isolate_devices = True

This seemed clever to me in the past, but it got in my way now that I'm trying to use GPUs.
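
For anyone running into the same thing, the relevant part of my jupyterhub_config.py now looks roughly like this (the spawner_class line is shown only for context):

# jupyterhub_config.py
c.JupyterHub.spawner_class = 'systemdspawner.SystemdSpawner'
# isolate_devices maps to systemd's PrivateDevices=yes, which hides
# /dev/nvidia* from the unit; leave it off (False is the default) when
# the single-user servers need the GPU.
c.SystemdSpawner.isolate_devices = False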

kaczmarj commented 4 years ago

Would it be possible to use c.SystemdSpawner.isolate_devices = True and also access GPUs? Would it be possible to explicitly mount the NVIDIA device files? If so, how would that be done?

ccauet commented 4 years ago

@kaczmarj yes, it might be; see the answer on SO:

I am not sure... but I can provide pointers to documentation. Setting c.SystemdSpawner.isolate_devices = True sets PrivateDevices=yes in the eventual systemd-run call (source). Refer to the systemd documentation for more information on the PrivateDevices option.

You might be able to keep isolate_devices = True and then explicitly mount the nvidia devices. Though I don't know how to do this...
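
If someone wants to experiment, here is an untested sketch of that idea using SystemdSpawner's unit_extra_properties option (extra systemd unit properties passed through to systemd-run); the device node paths are the usual NVIDIA ones and are assumed to exist on the host, and whether BindPaths= and DeviceAllow= actually expose the GPU under PrivateDevices=yes is exactly the part I haven't verified:

# jupyterhub_config.py -- untested sketch: keep device isolation but try to
# expose the NVIDIA device nodes to the single-user unit
c.SystemdSpawner.isolate_devices = True
c.SystemdSpawner.unit_extra_properties = {
    # bind-mount the host's NVIDIA device nodes into the unit's private /dev
    'BindPaths': '/dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools',
    # PrivateDevices=yes also implies DevicePolicy=closed, so the device
    # cgroup filter probably needs relaxing as well (one DeviceAllow= entry
    # per device node would be needed)
    'DeviceAllow': '/dev/nvidiactl rw',
}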

kaczmarj commented 4 years ago

I wrote that answer :) I was wondering if anyone had more information on the problem, because I don't have experience with systemd.

ccauet commented 4 years ago

I wrote that answer :)

😂 Never mind, I wasn't able to match the usernames...