snobu / yolovision

A Python API wrapper on top of Darknet Yolo v3
http://yolovision-aci.westeurope.azurecontainer.io:8000/

We should run this on AKS with GPU nodes #4

Closed snobu closed 6 years ago

snobu commented 6 years ago

https://docs.microsoft.com/en-us/azure/aks/gpu-cluster

@radu-matei, do GPU nodes come up naked or do they have the bare minimum, CUDA drivers + toolkit? Or is the container supposed to bring in the drivers? Is that even a thing? I'm confused.

This chart is adding new depths to my confusion: [chart image]

Yes, I know I'm at DockerCon. The question stands.

radu-matei commented 6 years ago

If you deploy on AKS and have GPUs on the node, according to the docs:

"The appropriate CUDA libraries and debug tools will be available on the node at /usr/local/nvidia and must be mounted into the pod using the appropriate volume specification."

Note that you should specify both the volume mount and the GPU label in your deployment.
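
A rough sketch of what that could look like, written as a Python dict so it can be dumped to JSON and fed to kubectl. The /usr/local/nvidia path comes from the docs quote above. Everything else is an assumption: the helper name gpu_pod_manifest, the image name, and especially the resource key. Here the "GPU label" is modeled as a resource limit using the alpha key from that era of Kubernetes; newer clusters with the NVIDIA device plugin use nvidia.com/gpu instead, so check what your cluster expects.

```python
import json


def gpu_pod_manifest(name, image,
                     gpu_resource_key="alpha.kubernetes.io/nvidia-gpu",
                     gpu_count=1,
                     driver_path="/usr/local/nvidia"):
    """Build a pod manifest that requests a GPU and mounts the node's NVIDIA dir.

    gpu_resource_key is an assumption; it differs between Kubernetes versions.
    driver_path is the node path quoted from the AKS docs.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Ask the scheduler for a node that actually has a GPU.
                "resources": {"limits": {gpu_resource_key: gpu_count}},
                # Expose the node's CUDA libraries inside the container.
                "volumeMounts": [{"name": "nvidia", "mountPath": driver_path}],
            }],
            # hostPath volume pointing at the libraries already on the node.
            "volumes": [{"name": "nvidia", "hostPath": {"path": driver_path}}],
        },
    }


if __name__ == "__main__":
    # "example/yolovision:latest" is a placeholder image name.
    print(json.dumps(gpu_pod_manifest("yolovision", "example/yolovision:latest"), indent=2))
```

Save the output to a file and apply it with kubectl apply -f, which accepts JSON as well as YAML.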

snobu commented 6 years ago

This is confusing as f###. Who's in charge of GPU support? We need to sit down. The CUDA toolkit takes a dependency on the NVIDIA driver, and because the driver wasn't installed by apt, it completely ignores the mounted volume. My YoloLens is f#$&$. I'll try a different route: strace and ldd the deps and roll it all into one big sushi roll right in the Dockerfile.

The fact that the toolkit is 1.2 GB in size and shipped as a .deb doesn't help. Whatever NVIDIA is smoking is quality stuff. If you were to rely on NVIDIA to intercept an Earth-vaporizing asteroid, we'd all be f####d, 'cause the driver upload wouldn't finish in time and you'd have to intercept in software mode.

Go ask the dinosaurs how that went.
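
For the ldd route mentioned above, here is a minimal sketch of how the dependency hunt could be scripted. It only covers the ldd half (not strace), and the function names parse_ldd_output and missing_libs are made up for illustration. It assumes the usual "name => path (address)" layout of ldd output on Linux.

```python
import subprocess


def parse_ldd_output(text):
    """Map each shared library name in ldd output to its resolved path.

    Libraries that ldd reports as "not found" map to None.
    """
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # vdso and dynamic loader lines have no arrow
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("("):
            continue  # older ldd prints "name => (0x...)" for the vdso
        if target.startswith("not found"):
            deps[name] = None
        else:
            # Drop the trailing load address, e.g. " (0x00007f...)".
            deps[name] = target.split(" (")[0].strip()
    return deps


def missing_libs(binary_path):
    """Run ldd on a binary and return the names of unresolved libraries."""
    out = subprocess.run(["ldd", binary_path],
                         capture_output=True, text=True, check=True).stdout
    return sorted(n for n, p in parse_ldd_output(out).items() if p is None)
```

Calling missing_libs on the darknet binary inside the container (whatever its path is in your image) lists the libraries that still need to be copied in or mounted.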

vykhand commented 6 years ago

https://blogs.technet.microsoft.com/machinelearning/2018/04/19/deploying-deep-learning-models-on-kubernetes-with-gpus/

snobu commented 6 years ago

Turns out the NVIDIA drivers on AKS nodes support CUDA 8.0 at most, and anything higher leads to panic. I've downgraded the FROM image to CUDA 8.0 and all seems good now.
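
A small sanity check along these lines could catch the mismatch before deploying. This is only a sketch: it assumes the Dockerfile builds FROM an nvidia/cuda image whose tag starts with the CUDA version (the thread doesn't say which base image was used), and the function names are hypothetical. The (8, 0) ceiling is the one reported above for AKS nodes at the time.

```python
import re


def base_image_cuda_version(dockerfile_text):
    """Return (major, minor) from the first FROM nvidia/cuda:<major>.<minor> line.

    Returns None if no such line exists.
    """
    for line in dockerfile_text.splitlines():
        m = re.match(r"\s*FROM\s+nvidia/cuda:(\d+)\.(\d+)", line, re.IGNORECASE)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


def exceeds_node_cuda(dockerfile_text, node_max=(8, 0)):
    """True if the base image's CUDA version is newer than the node supports."""
    version = base_image_cuda_version(dockerfile_text)
    # Tuples compare element by element, so (9, 1) > (8, 0).
    return version is not None and version > node_max
```

Reading the Dockerfile and passing its text to exceeds_node_cuda gives a quick yes/no on whether the FROM line needs downgrading.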

Also, this is deprecated in favor of https://github.com/snobu/yololens