jacobtomlinson opened 6 years ago
I've created a new autoscaling group which uses the p3.2xlarge GPU instance type, currently the smallest GPU instance available in London (eu-west-2). I've also added a taint to avoid non-GPU work being scheduled on these nodes.
```yaml
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-11-14T14:02:09Z
  labels:
    kops.k8s.io/cluster: cluster.k8s.informaticslab.co.uk
  name: nodes-GPU-eu-west-2a-p3-2xlarge
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    kubernetes.io/cluster/cluster.k8s.informaticslab.co.uk: owned
  hooks:
  - execContainer:
      image: kopeio/nvidia-bootstrap:1.6
  - manifest: |
      Type=oneshot
      ExecStart=/usr/bin/docker run --net host quay.io/sergioballesteros/check-aws-tags
      ExecStartPost=/bin/systemctl restart kubelet.service
    name: ensure-aws-tags.service
    requires:
    - docker.service
    roles:
    - Node
  image: kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-02-08
  kubelet:
    featureGates:
      Accelerators: "true"
  machineType: p3.2xlarge
  maxSize: 1
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-GPU-eu-west-2a-p3-2xlarge
  role: Node
  rootVolumeSize: 120
  rootVolumeType: gp2
  subnets:
  - eu-west-2a
  taints:
  - informaticslab.co.uk/dedicated=gpu:NoSchedule
```
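For reference, the taint above only repels pods that lack a matching toleration. A minimal sketch of the matching semantics (the function name is my own, and it covers only the `Equal`/`Exists` operators; the real kubelet logic also treats empty keys and effects as wildcards):

```python
def toleration_matches(taint, toleration):
    """Simplified check of whether a toleration tolerates a taint.

    Covers only the Equal/Exists operators and exact effect matching.
    """
    if toleration.get('key') != taint['key']:
        return False
    if toleration.get('effect') and toleration['effect'] != taint['effect']:
        return False
    if toleration.get('operator', 'Equal') == 'Exists':
        return True
    return toleration.get('value') == taint['value']


# The instance group taint above, split into its parts:
taint = {'key': 'informaticslab.co.uk/dedicated', 'value': 'gpu',
         'effect': 'NoSchedule'}
toleration = {'key': 'informaticslab.co.uk/dedicated', 'operator': 'Equal',
              'value': 'gpu', 'effect': 'NoSchedule'}
print(toleration_matches(taint, toleration))  # → True
```

This is why the notebook profile below carries an explicit toleration: without it, user pods can never be scheduled onto the GPU nodes.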
We can add the following option to the profile list to add a GPU notebook:

```python
{
    'display_name': 'Informatics Lab - ML Pangeo Notebook v0.5.10 (expensive)',
    'kubespawner_override': {
        'image': '536099501702.dkr.ecr.eu-west-2.amazonaws.com/pangeo-notebook:0.5.10',
        'cpu_limit': 8,
        'mem_limit': '54G',
        'extra_resource_guarantees': {"nvidia.com/gpu": "1"},
        'tolerations': [
            {
                'key': 'informaticslab.co.uk/dedicated',
                'operator': 'Equal',
                'value': 'gpu',
                'effect': 'NoSchedule',
            },
        ],
    },
}
```
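For context, KubeSpawner reads these entries from its `profile_list` option in `jupyterhub_config.py`. A minimal sketch of the wiring (the `gpu_profile` variable name and the `spawner_class` line are assumptions for illustration, not our actual config):

```python
# jupyterhub_config.py -- minimal sketch; assumes KubeSpawner is in use.
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'

# gpu_profile is the dictionary shown above.
c.KubeSpawner.profile_list = [
    # ...any existing CPU-only profiles...
    gpu_profile,
]
```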
This image specifies a GPU and will exactly fill a p3.2xlarge instance. However there are a few things which stop this from working:

- the p3.2xlarge node doesn't seem to consider itself to have `"nvidia.com/gpu": "1"` available.
We should add the ability to use GPUs.
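One way to confirm whether a node actually advertises the GPU resource is to inspect `.status.allocatable` on the node object (e.g. the output of `kubectl get node <name> -o json`). A small helper for that check; the sample node data below is illustrative only:

```python
import json


def allocatable_gpus(node):
    """Return the number of 'nvidia.com/gpu' a node reports as allocatable."""
    allocatable = node.get('status', {}).get('allocatable', {})
    return int(allocatable.get('nvidia.com/gpu', '0'))


# Illustrative node object, shaped like `kubectl get node <name> -o json` output.
node = json.loads('''{
  "status": {"allocatable": {"cpu": "8", "memory": "62873Mi", "nvidia.com/gpu": "1"}}
}''')
print(allocatable_gpus(node))  # → 1
```

If this returns 0 for the p3.2xlarge even though the nvidia-bootstrap hook ran, the kubelet is not registering the accelerator, which matches the symptom above.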