Closed snobu closed 6 years ago
If you deploy on AKS and have GPUs on the node, according to the docs:

> The appropriate CUDA libraries and debug tools will be available on the node at `/usr/local/nvidia` and must be mounted into the pod using the appropriate volume specification. Note that you should specify both the volume mount and the GPU label to your deployment.
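For reference, here is a sketch of what I understand the docs to be asking for (pod name, image, and registry are illustrative, not from the docs): a hostPath mount of `/usr/local/nvidia` plus a GPU resource limit.

```yaml
# Hypothetical pod spec, assuming the node exposes nvidia.com/gpu
# and ships driver libraries under /usr/local/nvidia.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-app
spec:
  containers:
  - name: cuda-app
    image: myregistry/cuda-app:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1               # schedule onto a GPU node
    volumeMounts:
    - name: nvidia                      # driver libs from the host
      mountPath: /usr/local/nvidia
  volumes:
  - name: nvidia
    hostPath:
      path: /usr/local/nvidia
```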
This is confusing as f###. Whoever is in charge of GPU support, we need to sit down. The CUDA toolkit takes a dependency on the NVIDIA driver, and because the driver isn't installed via apt inside the container, the toolkit completely ignores the mounted volume. My YoloLens is f#$&$. I'll try a different route: strace and ldd the dependencies and roll them into one big sushi roll right in the Dockerfile.
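For the ldd half of that route, a minimal sketch of enumerating a binary's shared-library dependencies so they can be copied into the image (`/bin/ls` stands in for the actual app binary, which isn't named here):

```shell
# List the resolved shared-library paths a binary links against.
# Lines with "=>" carry the resolved path in field 3; the vDSO and the
# dynamic loader don't, so they are filtered out.
ldd /bin/ls | awk '$2 == "=>" && $3 != "" { print $3 }' | sort -u
```

Running the same command against the real application binary (inside a container with the toolkit installed) gives the list of `.so` files to bake into the final image.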
The fact that the toolkit is 1.2 GB in size and shipped as a .deb doesn't help. Whatever NVIDIA is smoking is quality stuff. Like, if you were to rely on NVIDIA to intercept an Earth-vaporizing asteroid, we're all f####d, because the driver upload won't finish in time and you'll have to intercept in software mode.
Go ask the dinosaurs how that went.
Turns out the NVIDIA drivers on AKS nodes only support CUDA 8.0, and anything higher leads to a panic. I've downgraded the FROM image to CUDA 8.0 and all seems good now.
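Concretely, the fix amounts to pinning the base image to a CUDA 8.0 tag so the toolkit in the container matches the driver on the node (the exact tag and app path below are illustrative, not the ones from my Dockerfile):

```dockerfile
# Match the container's CUDA toolkit to the node's CUDA 8.0 driver.
# Pick whichever 8.0 variant (runtime/devel, cuDNN or not) the app needs.
FROM nvidia/cuda:8.0-cudnn6-runtime-ubuntu16.04

COPY ./my_app /app/my_app   # hypothetical application binary
CMD ["/app/my_app"]
```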
Also, this is deprecated in favor of https://github.com/snobu/yololens
https://docs.microsoft.com/en-us/azure/aks/gpu-cluster
@radu-matei, do GPU nodes come up naked, or do they have the bare minimum, i.e. NVIDIA driver + CUDA toolkit? Or is the container supposed to bring its own drivers? Is that even a thing? I'm confused.
This chart is adding new depths to my confusion:
Yes, I know I'm at DockerCon. The question stands.