microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Dockerfile.gpu improvement #3286

Closed rsdmse closed 3 years ago

rsdmse commented 4 years ago

Summary

A Dockerfile for the CLI is prepared and tested on GPU. Improvements are made to reduce the final image size compared to the current Dockerfile.gpu.

Motivation

The final image size can be drastically reduced from 2.6 GB (Dockerfile.gpu but without python) to 111 MB. (Here I just compare CLI-only images. Adding conda and the necessary python packages would increase the final image size in both cases by roughly 1 GB; i.e. 3.6 GB vs 1.1 GB.)

Description

Problems with the current Dockerfile.gpu:

  1. The cleanup commands in Dockerfile.gpu do not reduce the image size. They must be in the same RUN statement as the apt-get/conda installation commands.

  2. The use of cuda*-cudnn*-devel as the production base image is unnecessary.
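Point 1 can be sketched roughly as follows. This is a minimal illustration, not the contents of the current Dockerfile.gpu: the base image and the installed package are placeholders chosen for the example. Each RUN creates its own layer, so files deleted in a later RUN still occupy space in the earlier layer.

```dockerfile
# Placeholder base image, chosen only for illustration.
FROM ubuntu:18.04

# Anti-pattern: the cleanup runs in a separate layer, so the package lists
# written by the first RUN are still stored in that earlier layer.
#   RUN apt-get update && apt-get install -y --no-install-recommends build-essential
#   RUN rm -rf /var/lib/apt/lists/*

# Install and clean up in one RUN so the package lists are never
# committed to any layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*
```

Running `docker history <image>` on the result lists the size contributed by each layer, which makes it easy to see whether a cleanup step had any effect.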

I have prepared a CLI-only Dockerfile here. LightGBM version 2.3.1 is built from source with CUDA 10.2 and cuDNN 8 in a multistage build. The CLI binary and necessary libraries are copied to the final stage.

Note that the cuda10.2-cudnn8-devel image is almost 2.5 GB by itself. If this were used as the production image, I estimate the image size to be around 2.6 GB. Using cuda10.2-base as the production image, I obtained an image size of only 111 MB.
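A multistage build along these lines might look like the sketch below. It is untested and is not the Dockerfile linked above. The exact image tags, the Ubuntu 18.04 package names, and installing the runtime libraries with apt (rather than copying them from the build stage, as described above) are all assumptions made for illustration.

```dockerfile
# ---- Build stage: full toolchain, discarded after the build ----
# Tag is an assumption; the text above only says "cuda10.2-cudnn8-devel".
FROM nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04 AS build

# Build dependencies (package names assumed for Ubuntu 18.04).
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential ca-certificates cmake git \
        libboost-dev libboost-filesystem-dev libboost-system-dev \
        ocl-icd-opencl-dev opencl-headers && \
    rm -rf /var/lib/apt/lists/*

# Build the LightGBM 2.3.1 CLI with GPU (OpenCL) support.
RUN git clone --recursive --branch v2.3.1 --depth 1 \
        https://github.com/microsoft/LightGBM /LightGBM && \
    mkdir /LightGBM/build && cd /LightGBM/build && \
    cmake -DUSE_GPU=1 .. && \
    make -j"$(nproc)"

# ---- Final stage: slim base plus only what the CLI needs at runtime ----
# Tag is an assumption; the text above only says "cuda10.2-base".
FROM nvidia/cuda:10.2-base-ubuntu18.04

# Runtime libraries (assumed package names) and the NVIDIA OpenCL ICD entry,
# installed and cleaned up in a single layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libboost-filesystem1.65.1 libboost-system1.65.1 \
        libgomp1 ocl-icd-libopencl1 && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

# Only the CLI binary is carried over from the build stage.
COPY --from=build /LightGBM/lightgbm /usr/local/bin/lightgbm

ENV NVIDIA_VISIBLE_DEVICES=all \
    NVIDIA_DRIVER_CAPABILITIES=compute,utility

ENTRYPOINT ["lightgbm"]
```

The compilers, headers, and source tree stay in the build stage and never reach the final image, which is where most of the size reduction comes from.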

The container was benchmarked on our HPC platform (Rivanna cluster, University of Virginia) using this tutorial. It took 33 seconds for 50 iterations on a GPU, compared to 92 seconds without GPU.

It is not clear to me whether CUDA math libraries/NCCL/cuDNN are needed. They were not needed at runtime for the above tutorial. If they are, we could use the runtime image as the production base. Its estimated image size of 1.2 GB would still be less than half of the current image size of 2.6 GB.

References

Please feel free to test our container: https://hub.docker.com/r/uvarc/lightgbm

The tag must be provided explicitly since we do not use the latest tag:

docker pull uvarc/lightgbm:2.3.1

StrikerRUS commented 4 years ago

Hi @rsdmse !

Thanks a lot for your investigation! Please feel free to create a PR with the described improvements.

Speaking about runtime version of CUDA images, please check the following thread: https://github.com/microsoft/LightGBM/issues/3040#issuecomment-669288968.

rsdmse commented 4 years ago

Thank you! I'll create a dockerfile-cli.gpu (CLI-only) to distinguish it from dockerfile.gpu (CLI + Python).

mirekphd commented 4 years ago

It is not clear to me whether CUDA math libraries/NCCL/cuDNN are needed.

cuDNN is not needed and will not be. The rest are not needed now, but will be once the PR adding CUDA support (#3160) is merged. At least, that is my guess, judging from the fact that they were needed for compiling xgboost with GPU support.

Note also that a single CUDA version is not enough for all users, as compatibility requirements with the driver on the host may require another version (I verified this in both directions: I could not run GPU model training in any of the 3 GBDT algos I tested using my devel-based CUDA 10.2 image on either our CUDA 10.1 prod server or my CUDA 11.0 dev workstation).

rsdmse commented 4 years ago

Thanks for the clarification. In that case I think it would make more sense to use nvidia/opencl rather than nvidia/cuda, at least for now. I just built an image with the former and tested it on our machine. As expected, it has exactly the same performance. The image size is reduced by another 7 MB to 104 MB.

mirekphd commented 4 years ago

One peculiarity I noted is that popular packages which are either device-agnostic or have a pre-compiled GPU variant (would that be the purpose of this Dockerfile too?), such as xgboost and tensorflow-gpu, are still compiled against CUDA 10.1, which NVIDIA effectively abandoned 8 months ago (see nvidia/cuda), resulting in literally a hundred (!) unnecessary vulnerabilities for the unsuspecting users of their public images...

rsdmse commented 4 years ago

We tend to containerize GPU applications for users on our HPC cluster. A user requested LightGBM recently and I came across the dockerfile.gpu in this repo. After some testing I think the 0.1 GB version works just as well as the 2.6 GB version.

We also have a customized dockerfile for tensorflow, but that's mainly because we wanted to use python 3.7 instead of their default 3.6. It's largely based on the official dockerfile and the final image size is very similar. The base image is cuda:base which is the leanest among all 3 (base, runtime, devel).

The base image and proper cleanup can make a huge difference to the final image size.

rsdmse commented 4 years ago

I just tested that distroless works too. Again, same performance on GPU. The image size is further reduced to merely 14 MB. Please feel free to test this out:

docker pull uvarc/lightgbm:2.3.1-distroless

mirekphd commented 4 years ago

Excellent @rsdmse, small is beautiful. Did you start from Google's python distroless images? If so, then please pass on the non-GA warning as well ("are considered experimental and not recommended for production usage").

I wonder how I would know whether the algo actually uses the GPU on new data, where there is no expected benchmark for execution time. We should not assume the ability to observe GPU utilization from another container, because one can limit GPU visibility so that each container has access to a different GPU or subset of GPUs.

For me it would be hard to run stuff in production without the ability to monitor resources utilization by the app inside the container (remote shell is the only universal way to do it, because not all Kubernetes/Openshift installations have cluster-level monitoring).

rsdmse commented 4 years ago

This is actually based on the cc-debian10 distroless image, since it's CLI-only. (The tutorial I linked above doesn't involve python.) I'll start playing with their python soon.

You can use the nvidia-smi command to monitor NVIDIA GPU utilization. There should be equivalent commands for non-NVIDIA GPUs, but I'm not familiar with them.

mirekphd commented 4 years ago

I'm waiting for their python image to become GA, because it's 19x smaller (only 50 MB instead of 880 MB), and the current state of the art in terms of container size, the Alpine base images, are notoriously unsafe / extra time-consuming for python apps (while not much smaller or safer).

mirekphd commented 4 years ago

You can use the nvidia-smi command to monitor NVIDIA GPU utilization.

I'm having problems with reproducing this in your container. Isn't it a single-entrypoint / single-app container... without any shells?:) So how does one run shell programs like nvidia-smi in it if the only entrypoint is already used by lightgbm?

Let's assume also that the algo does not put device info in its log (even LightGBM does it only at the start of model training). And we can also safely assume that no cluster-level observability tool would allow us to monitor GPU usage from the web UI or CLI.

rsdmse commented 4 years ago

Correct. You would have to fire up the command outside the container as a separate process. (I just tested that on our cluster to verify.)

mirekphd commented 4 years ago

Thank you for the explanation. Please be aware that container users (e.g. data scientists training the models) would not normally have access to the host in Kubernetes/Openshift clusters. They can only use the shell exposed by apps such as Jupyter Notebook's terminal, which would now fail, given that sh and bash have been removed from the container. If those Google containers were rootless, I'd check in our clusters whether it's possible to bypass that restriction with a remote shell to the container using the CLI. Using docker it was possible to connect to them: you just run python, import os and deliver the payload :) So the only hardening in the case of python comes from having fewer vulnerabilities.

rsdmse commented 4 years ago

Oh I see, I wasn't aware of that since I've never used those platforms before. Thanks! I suppose we could have 2 versions, one with opencl and one with distroless.

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.