consideRatio opened this issue 5 years ago
@jkovalski My singleuser config entry looked like:
```yaml
singleuser:
  image:
    # default image
    name: {account_name}/deep-learning-img
    tag: {tag_id}
  profileList:
    - display_name: "Default GPU environment"
      description: "Environment with GPU dependencies installed"
      default: true
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: "1"
```
The line that might be relevant is the nvidia.com/gpu: "1" entry.
I had some other options under singleuser for mounting drives and configuring memory limits per user, but I doubt those are relevant.
@jkovalski and @jeffliu-LL, I think there are probably various ways to get it done, and the exact settings likely change with EKS and CUDA versions etc. We have this working on GKE and EKS currently and described the setup in a blog post: https://medium.com/pangeo/deep-learning-with-gpus-on-pangeo-9466e25bfd74. It links to our images and config settings, which are all open source. One other kubespawner setting we needed was 'environment': {'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility'}.
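For reference, in z2jh config that kubespawner setting can sit in a profile's kubespawner_override; a minimal sketch (the profile name and the rest of the profile are just illustrative):

```yaml
singleuser:
  profileList:
    - display_name: "GPU environment"
      kubespawner_override:
        # tell the NVIDIA container runtime which driver capabilities to expose
        # (compute = CUDA, utility = nvidia-smi)
        environment:
          NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
        extra_resource_limits:
          nvidia.com/gpu: "1"
```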
@jeffliu-LL @scottyhq Thanks guys. Unfortunately, I still cannot get this working. I suspect it might have to do with the versions of NVIDIA/CUDA, etc. I am using the official Amazon Linux 2 AMI (amazon-eks-gpu-node-1.16-*), but I'm having trouble finding anything about the NVIDIA drivers that are installed on it. I'm guessing I need to make sure whatever is on that AMI is compatible with what's on the Docker images I'm trying to use.
I'm having trouble getting user-placeholder pods to tolerate the nvidia/gpu taint that GKE applies to cluster pool nodes with a GPU. How do I affect the tolerations section of the user-placeholder pods?
UPDATE:

jupyterhub.userTolerations, defined here: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/templates/scheduling/_scheduling-helpers.tpl#L5:L17, includes singleuser.extraTolerations, which one can set in the z2jh helm config yaml, and many people do when enabling GPUs.

People having trouble with user-placeholder, like me, probably have at least two pools (with GPU, without GPU) and don't enable the GPU tolerations in singleuser.extraTolerations; instead they enable it on a per-profile basis using kubespawner_override, and the extra_resource_limits automatically adds the required toleration:
```yaml
profileList:
  - display_name: "Improc analyst profile"
    description: "Setup for improc analysts, now with magic GPU dust"
    default: true
    kubespawner_override:
      extra_resource_limits:
        nvidia.com/gpu: "1"
```
But unless I'm mistaken, this DOES NOT set the toleration on the user-placeholder pods; it would have to be added to singleuser.extraTolerations, but then EVERY user pod would tolerate the taint, and non-GPU profiles could potentially start getting allocated to GPU nodes?
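For comparison, the chart-wide alternative would look roughly like this (a sketch; the taint key matches the one GKE applies to GPU nodes), and as noted above it applies to every user pod, which is exactly the problem:

```yaml
singleuser:
  extraTolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
```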
The problem is that, as far as I can tell, there's no way to accept the GPU taint on user-placeholder without accidentally accepting it on ALL user pods. @consideRatio does that seem correct to you?
> The problem is that, as far as I can tell, there's no way to accept the GPU taint on user-placeholder without accidentally accepting it on ALL user pods. @consideRatio does that seem correct to you?
Correct, you are required to have two separate statefulsets with user-placeholders for this. Below I provide some code to add that to a Helm chart that has the JupyterHub Helm chart as a dependency, without needing to copy-paste much code.
@snickell, the toleration for that taint is typically provided automatically as part of requesting a GPU alongside CPU/memory. I'm copy-pasting a solution.
Assuming you have a local chart that in turn depends on the JupyterHub Helm chart, you can add the following parts to it.
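A minimal sketch of what such a local (parent) chart's dependency declaration could look like - the chart name and versions here are placeholders, while the repository URL is the published JupyterHub Helm chart repo. The values snippet below then goes into the parent chart's own values.yaml, and the two template snippets into its templates/ directory:

```yaml
# Chart.yaml of the local parent chart -- a sketch, name and versions are placeholders
apiVersion: v2
name: my-hub
version: 0.1.0
dependencies:
  - name: jupyterhub
    version: "0.9.0"  # pin to whichever chart version you actually deploy
    repository: https://jupyterhub.github.io/helm-chart/
```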
```yaml
userPlaceholderGPU:
  enabled: true
  replicas: 0
```
```yaml
{{/*
  NOTE: This utility template is needed until https://git.io/JvuGN is resolved.

  Call a template from the context of a subchart.

  Usage:
    {{ include "call-nested" (list . "<subchart_name>" "<subchart_template_name>") }}
*/}}
{{- define "call-nested" }}
{{- $dot := index . 0 }}
{{- $subchart := index . 1 | splitList "." }}
{{- $template := index . 2 }}
{{- $values := $dot.Values }}
{{- range $subchart }}
{{- $values = index $values . }}
{{- end }}
{{- include $template (dict "Chart" (dict "Name" (last $subchart)) "Values" $values "Release" $dot.Release "Capabilities" $dot.Capabilities) }}
{{- end }}
```
```yaml
{{- if .Values.userPlaceholderGPU.enabled }}
# Purpose:
# --------
# To ensure there are always X slots available for users that quickly
# need a GPU pod.
#
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: user-placeholder-gpu-p100
spec:
  podManagementPolicy: Parallel
  replicas: {{ .Values.userPlaceholderGPU.replicas }}
  selector:
    matchLabels:
      component: user-placeholder-gpu-p100
  serviceName: user-placeholder-gpu-p100
  template:
    metadata:
      labels:
        component: user-placeholder-gpu-p100
    spec:
      nodeSelector:
        gpu: p100
      {{- if .Values.jupyterhub.scheduling.podPriority.enabled }}
      priorityClassName: {{ .Release.Name }}-user-placeholder-priority
      {{- end }}
      {{- if .Values.jupyterhub.scheduling.userScheduler.enabled }}
      schedulerName: {{ .Release.Name }}-user-scheduler
      {{- end }}
      tolerations:
        {{- include "call-nested" (list . "jupyterhub" "jupyterhub.userTolerations") | nindent 8 }}
      {{- if include "call-nested" (list . "jupyterhub" "jupyterhub.userAffinity") }}
      affinity:
        {{- include "call-nested" (list . "jupyterhub" "jupyterhub.userAffinity") | nindent 8 }}
      {{- end }}
      terminationGracePeriodSeconds: 0
      automountServiceAccountToken: false
      containers:
        - image: gcr.io/google_containers/pause:3.1
          name: pause
          resources:
            limits:
              nvidia.com/gpu: 1
            requests:
              nvidia.com/gpu: 1
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
{{- end }}
```
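With that in place, scaling the warm GPU placeholders up or down is just a values change; for example (release name, chart path and namespace are placeholders):

```bash
# sketch: keep two warm GPU placeholder pods around
helm upgrade my-hub ./my-hub-chart \
  --namespace jhub \
  --set userPlaceholderGPU.replicas=2
```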
@consideRatio oh cool, processing now, sorry I replied to myself in an update to my above comment, confusing, my bad
Wow, thank you for collecting that information so clearly @consideRatio, I appreciate it immensely. Your workaround makes very good sense to me and directly addresses my issue; I'll be trying it in a few minutes.
We're having the same issue with LD_LIBRARY_PATH as @astrajohn, and for the exact reasons @consideRatio stated: LD_LIBRARY_PATH is ignored when running start.sh as the root user because of sudo, as noted in https://www.sudo.ws/man/1.7.4p6/sudo.man.html:

> Note that the dynamic linker on most operating systems will remove variables that can control dynamic linking from the environment of setuid executables, including sudo. Depending on the operating system this may include RLD, DYLD, LD, LDR, LIBPATH, SHLIB_PATH, and others. These type of variables are removed from the environment before sudo even begins execution and, as such, it is not possible for sudo to preserve them.
We're using KubeSpawner to run notebook servers on GKE, and our images are based on docker-stacks images. The GPU daemonset is created in our cluster as stated in the issue description, but I noticed the daemonset installs CUDA in /home/kubernetes/bin/nvidia (see the GKE docs) and mounts it into the containers that need a GPU at /usr/local/nvidia, which makes the containers rely on the LD_LIBRARY_PATH environment variable to find CUDA. That's why notebook servers can't detect the GPU if they are running start.sh as the root user.
Even in @consideRatio's pull request on docker-stacks, sudo is still used here, so I don't think that'll fix the issue.
Also, I think @jkovalski is having the same issue.
How are you guys dealing with that?
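A quick way to see the stripping in action from a terminal inside the container (a sketch):

```bash
# LD_LIBRARY_PATH is set in the parent environment...
export LD_LIBRARY_PATH=/usr/local/nvidia/lib64
echo "$LD_LIBRARY_PATH"            # prints /usr/local/nvidia/lib64

# ...but sudo strips LD_* before the child process even starts,
# so the escalated environment no longer has it
sudo env | grep LD_LIBRARY_PATH    # prints nothing by default
```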
@mohammedi-haroune I've used the closed PRs changes, replacing the start scripts for my image. It would be great to land similar changes to jupyter/docker-stacks.
> @mohammedi-haroune I've used the closed PRs changes, replacing the start scripts for my image. It would be great to land similar changes to jupyter/docker-stacks.
You're using the root user to run start.sh, right?
So, this is the line responsible for keeping LD_LIBRARY_PATH?
https://github.com/jupyter/docker-stacks/pull/1052/files#diff-41f90d7afcdae13f8516195e078d0a203972b5c5105851eaecfc0a98e9739107R172
echo 'Defaults env_delete -= "PATH LD_* PYTHON*"' >> /etc/sudoers.d/added-by-start-script
When I use the root user and switch to another user, for example to first enable sudo for that user, retaining the LD_* vars or the PATH var is a challenge.
It is in that transition that environment variables can be stripped, and I think that change is what ensures those aren't stripped.
Adding this to my Dockerfile solved the issue; notebook servers running on GKE are now able to detect the GPU. Thank you @consideRatio.
RUN echo 'Defaults env_delete -= "LD_*"' >> /etc/sudoers.d/added-by-dockerfile
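For context, a minimal sketch of where that line could sit in an image build (the base image tag is a placeholder; pick the docker-stacks image you actually use):

```dockerfile
FROM jupyter/tensorflow-notebook:latest

USER root
# Keep LD_* variables when start.sh escalates via sudo, so libraries mounted
# by GKE under /usr/local/nvidia can still be found at runtime
RUN echo 'Defaults env_delete -= "LD_*"' >> /etc/sudoers.d/added-by-dockerfile

USER ${NB_UID}
```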
@mohammedi-haroune thanks for posting this, this has been giving us massive grief too
@consideRatio Hey mate, thanks for all of the amazing work on this. Is your Docker image meant to have the libcuda.so CUDA libraries installed? I get this error when I try to import tensorflow, which points to the fact that there is no libcuda.so symlink.
Additionally, it doesn't look like there are CUDA drivers installed in the same directory.
```
$ ls | grep cuda
libicudata.a
libicudata.so
libicudata.so.60
libicudata.so.60.2
$ python
Python 3.6.3 |Anaconda, Inc.| (default, Nov 9 2017, 00:19:18)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/opt/conda/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/opt/conda/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.6/site-packages/tensorflow/__init__.py", line 22, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/opt/conda/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/opt/conda/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
```
If anyone wants to use GPU-enabled JupyterHub on GKE Autopilot, I have described the details here: GPU powered machine learning on GKE.
To enable GPUs on GKE, this is what I've done. Note that this post is a Work In Progress and will be edited from time to time. To see when the last edit was made, see the header of this post.
Prerequisite knowledge
Kubernetes nodes, pods and daemonsets
A node represents actual hardware on the cloud, a pod represents something running on a node, and a daemonset will ensure one pod running something is created for each node. If you lack knowledge about Kubernetes, I'd recommend learning more at their concepts page.
Bonus knowledge:
This video provides background allowing you to understand why additional steps are required for this to work: https://www.youtube.com/watch?v=KplFFvj3XRk
NOTE: Regarding taints: GPU nodes will get them on GKE, and pods requesting GPUs will get matching tolerations, without any additional setup.
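For reference, the toleration that ends up on GPU-requesting pods looks roughly like this (a sketch; on GKE it is injected automatically, so you normally don't write it yourself):

```yaml
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
```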
1. GKE Kubernetes cluster on a GPU enabled zone
Google has various zones (datacenters), some of which do not have GPUs. First you must have a GKE cluster coupled with a zone that has GPU access. To find out which zones have GPUs and what kind of GPUs they have, see this page. In overall performance and cost, K80 < P100 < V100. Note that there are also TPUs and that their availability is likewise zone dependent. This documentation will not address utilizing TPUs though.
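If you have the gcloud CLI configured, one way to check GPU availability per zone from the command line is:

```bash
# List which accelerator types (GPUs) are available in which zones
gcloud compute accelerator-types list
```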
Note that GKE Kubernetes clusters come pre-installed with some of the parts needed for GPUs to be utilized:

- nvidia-gpu-device-plugin. I don't know fully what this does yet, but it is part of what allows pods to request nvidia.com/gpu: 1 properly.

2. JupyterHub installation
This documentation assumes you have deployed a JupyterHub already by following the https://z2jh.jupyter.org guide on your Kubernetes cluster.
3. Docker image for the JupyterHub users
I built an image for a basic Hello World with GPU-enabled TensorFlow. If you are fine with utilizing this, you don't need to do anything further. My image is available as consideratio/singleuser-gpu:v0.3.0.

About the Dockerfile
.About the Dockerfile
I build on top of a jupyter/docker-stacks image to allow JupyterHub to integrate well with. I also pinned
cudatoolkit=9.0
, it is a dependency oftensorflow-gpu
but would install with a even newer version that is unsupported by the GPUs I'm aiming to use, namely Tesla K80 or Tesla P100. To learn more about these compatibility issues see: https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/Dockerfile reference
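The Dockerfile itself is not reproduced here; a minimal sketch in its spirit (the base tag and exact package versions are assumptions based on the pins mentioned above, not the original file):

```dockerfile
FROM jupyter/minimal-notebook:latest

# Pin cudatoolkit to a version supported by the target GPUs (K80 / P100),
# see https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/
RUN conda install --yes \
        tensorflow-gpu=1.12 \
        cudatoolkit=9.0 && \
    conda clean --all -f -y
```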
NOTE: To make this run without a GPU available, you must still install an NVIDIA driver. This can be done using apt-get install nvidia-384, but if you do, it must not conflict with the nvidia-driver-installer daemonset later, which sadly still needs to run afaik. This is a rabbit hole and hard to maintain, I think.

3B. Create an image using repo2docker (WIP)
https://github.com/jupyterhub/team-compass/issues/96#issuecomment-447033166
4. Create a GPU node pool
Create a new node pool for your Kubernetes cluster. I chose an n1-highmem-2 node with a Tesla K80 GPU. These instructions are written and tested for K80 and P100.

Note that there is an issue with autoscaling from 0 nodes, and that scaling up a GPU node is a slow process, as it needs to start, install drivers, and download the image file - each step takes quite a while. I'm expecting 5-10 minutes of startup for this. I recommend you start out with a single fixed node while setting this up initially.
For details on how to setup a node pool with attached GPUs on the nodes, see: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#create
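A hedged sketch of what that could look like with the gcloud CLI (cluster name, zone and pool name are placeholders; the linked GKE docs are authoritative):

```bash
# Create a one-node pool with a Tesla K80 attached to each node
gcloud container node-pools create user-k80 \
  --cluster my-cluster \
  --zone us-central1-a \
  --machine-type n1-highmem-2 \
  --accelerator type=nvidia-tesla-k80,count=1 \
  --num-nodes 1
```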
5. Daemonset: nvidia-driver-installer
You need to make sure the GPU nodes get appropriate drivers installed. This is what the nvidia-driver-installer daemonset will do for you! It will install drivers and utilities in /usr/local/nvidia, which is required for the conda package tensorflow-gpu, for example, to function properly.
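The daemonset is typically installed with a single kubectl apply of the manifest from Google's container-engine-accelerators repo (the path below is the one referenced in the GKE docs; double-check it before use):

```bash
# Install the NVIDIA driver installer daemonset on GKE nodes running COS
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
```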
NOTE: TensorFlow has a pinned dependency on cudatoolkit, and a given cudatoolkit requires a minimum NVIDIA driver version. For example, tensorflow=1.11 and tensorflow=1.12 require cudatoolkit=9.0 and tensorflow=1.13 will require cudatoolkit=10.0; cudatoolkit=9.0 requires an NVIDIA driver of at least version 384.81 and cudatoolkit=10.0 requires an NVIDIA driver of at least version 410.48.

Set a driver version for the nvidia-driver-installer daemonset to install
The default driver for the daemonset above is, as of writing, 396.26. I struggled with installing that without this daemonset, so I ended up using 384.145 instead.

Option 1: Use a one liner
Option 2: manually edit the daemonset manifest...
Reference: https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu
6. Configure some spawn options
Perhaps the user does not always need a GPU, so it is good to allow the user to choose instead. This can be done with the following configuration.
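The original configuration block is not reproduced here; a minimal sketch of such a choice using the profileList mechanism shown earlier in this thread (image name and node label are just examples):

```yaml
singleuser:
  profileList:
    - display_name: "Default environment (CPU only)"
      description: "No GPU attached"
      default: true
    - display_name: "GPU environment"
      description: "Schedules onto a GPU node and requests one GPU"
      kubespawner_override:
        image: consideratio/singleuser-gpu:v0.3.0
        node_selector:
          gpu: k80
        extra_resource_limits:
          nvidia.com/gpu: "1"
```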
Result
Note that this displays a screenshot of the configuration I've utilized, which differs slightly from the example configuration and setup documented in this post.
7. Verify GPU functionality
After you have got a Jupyter GPU pod launched and running, you could verify your GPU works as intended by opening TensorFlow-Examples/notebooks/convolutional_network.ipynb and running all cells.
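A quicker smoke test, assuming a TensorFlow 1.x environment like the one in the image above, is to ask TensorFlow directly whether it sees a GPU:

```python
import tensorflow as tf

# Prints something like "/device:GPU:0" if TensorFlow can see a GPU,
# or an empty string if it cannot
print(tf.test.gpu_device_name())
```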
Previous issues

Autoscaling - no longer an issue?
UPDATE: I'm not sure why this happened, but it doesn't happen any more for me.
I've had massive trouble autoscaling. I managed to autoscale from 1 to 2 nodes, but it took 37 minutes... Scaling down worked as it should, with the unused GPU node being removed after 10 minutes.
To handle the long scale up time, you can configure a long timeout for kubespawner's spawning procedure like this:
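The original snippet is not reproduced here; one way to do it through the z2jh config is sketched below (the timeout value is an arbitrary example):

```yaml
singleuser:
  # Give the spawn up to an hour for the GPU node to come up,
  # install drivers and pull the image before timing out
  startTimeout: 3600
```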
Latest update (2018-11-15)
I got autoscaling to work, but it is still slow: it takes about 9 minutes plus the time for your image to be pulled to the new node. Some lessons learned:
- The cluster autoscaler runs simulations using a hardcoded copy of the kube-scheduler's default configuration logic, so utilizing a custom kube-scheduler configuration with different predicates could cause issues. See https://github.com/kubernetes/autoscaler/issues/1406 for more info.
- I stopped using a dynamically applied label as a label selector (cloud.google.com/gke-accelerator=nvidia-tesla-k80). I don't remember if this worked at all with the cluster autoscaler, and whether it worked to scale both from 0->1 nodes and from 1->2 nodes. If you want to select a specific GPU from multiple node pools, I'd recommend adding your own pre-defined labels like gpu: k80 and using a nodeSelector on them.
- I started using the default-scheduler instead of the jupyterhub user-scheduler, as I figured it would be safer not to risk a difference in the predicates they use, even though they may have the exact same predicates configured. NOTE: a predicate is, in this case, a function that takes information about a node and returns true or false depending on whether the node is a candidate to be scheduled on.
To debug the autoscaler:

- Run kubectl describe pod -n jhub jupyter-erik-2esundell and look for the node pool in the output; mine was named user-k80.
- Inspect the status of your node pool regarding cloudProviderTarget, registered and ready. You want all to become ready.
- You can also inspect the node events with kubectl describe node the-name-of-the-node.
Potentially related:

- I'm using Kubernetes 1.11.2-gke.9, but my GPU nodes apparently have 1.11.2-gke.15.
- Autoscaling from 0 nodes: https://github.com/kubernetes/autoscaler/issues/903

User placeholders for GPU nodes
Currently the user placeholders can only go to one kind of node pool, and it would make sense to allow the admin to configure how many placeholders to run for a normal pool and how many for a GPU pool. They are needed to autoscale ahead of arriving users, so users are not forced to wait for a new node, and this could be extra relevant for GPU nodes, since without placeholders a GPU node may need to be created on the fly every time a real user arrives.
We could perhaps instantiate multiple placeholder deployments/statefulsets based on a template and some extra specifications.
Pre pulling images specifically for GPU nodes
Currently we can only specify one kind of image puller, pulling all kinds of images to a single type of node. It is pointless to pull unneeded images, and especially to wait for them to be pulled, so it would be nice to optimize this somehow.
This is tracked in #992 (thanks @jzf2101!)
The future - Shared GPUs
Users cannot share GPUs like they can share CPU; this is an issue. But in the future, perhaps? From what I've heard, this is something that is progressing right now.