triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Docker images have repeated layers #7068

Open TheCodeWrangler opened 3 months ago

TheCodeWrangler commented 3 months ago

Problem: GKE image streaming will not work with these images due to repeated layers.

I would like to use GKE image streaming with triton-inference-server images.

This feature only works if the image has no duplicated layers: https://cloud.google.com/kubernetes-engine/docs/how-to/image-streaming#downloads_the_image_without_streaming_the_data

I am wondering whether the Docker build could be restructured so that duplicate layers do not exist in the Triton Inference Server images.

docker pull nvcr.io/nvidia/tritonserver:24.01-py3

docker inspect nvcr.io/nvidia/tritonserver:24.01-py3

Results in 4 layers with the same sha256 hash (5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef)

ClaytonJY commented 3 months ago

Can confirm this exists in the latest image as well:

❯ docker inspect nvcr.io/nvidia/tritonserver:24.03-py3 | jq '.[0].RootFS.Layers' | sort | uniq -c
      1   "sha256:0c5f76392da432595f52b598d72f9f0c7bec12cc99eeea430203fc3b6e0a551c",
      1   "sha256:2757a78913f29b2b87649e92ee6dd73460f496af008c5c8d772c2cbaf128450f",
      1   "sha256:2c8fb4462dd1c3b0257685c36d8e0235bd0fa6cdc86e2934cce01ad67b560599",
      1   "sha256:2e983e03121d01381b8931ac29c8514b83d409019851fbe319a19dc973e37acc",
      1   "sha256:3aea6fab2dbf23d161e561e6afe0bf060d6c99dbdff31ff6d6dedf4e8e60949d",
      1   "sha256:4d799d4505540b9d5743a694d19b18e89dcfeb6266a7133863410d38e0a9f680"
      1   "sha256:5498e8c22f6996f25ef193ee58617d5b37e2a96decf22e72de13c3b34e147591",
      1   "sha256:54e647f81b1a7e902055e4791115a9fb602e72639b978a72a06aafe4eb4c8246",
      4   "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
      1   "sha256:6ac20142f853cd947ce4b982bee38eb26e03f0925bf58903199ab26d5f101937",
      1   "sha256:7a05d6e72510152995b5887fe51c8ac36779140abc9af34f09d706b2cb67e69a",
      1   "sha256:84b1719c52bd5b83ce59ffb55774b691a3fb565073398c7ac6ecc228e620bdb5",
      1   "sha256:89539473e1d5b49b8b537d6725feec8ddc903b4cbcfa235766e8f6825e2d6f4a",
      1   "sha256:9e1f1090a7f07923f33452bc24eea16c2d8cc08138f3db4e1d8e93a9e430dac7",
      1   "sha256:a0c97f620cd64a3cbb897fb87f9b1df454291253a29032fd8200e422c643e7fe",
      1   "sha256:a67506fbd03042aad7e8107256fd06d568343a9eebefe608f967f1ee95da27c5",
      1   "sha256:a6fd7d221e23ca963955a8d2a7b87b40cfe8f62b37bc3c79d2230aba780556ba",
      1   "sha256:b4bd02b17ce352d7dde2362003d87ddba47b587eebdbea7b7ce803800d37ed95",
      1   "sha256:babb0ac901f2a703fd049c1f409256a3acf9dcb47709676f2d56b13847ea6806",
      1   "sha256:d0a7470596635f4e06a524f182b51f6743b17555f44049c45132b9a5ce65c51f",
      1   "sha256:ea31cf21ba0208a998ee3bef804a79155864989972e3e489a6a657ec65dff316",
      1   "sha256:f80386fcd8ceddd5b8dc0823325847d348c62253303d47507e4c32ebe3e29cb2",
      1   "sha256:f8a1d3a7e2ee27b131917b3d5dc101f38f4d29e8aa79c5ab34287772e353ea5a",
      1   "sha256:f97056fec7f9e222a346f108ca493cef3d60e2f3624722a9d746708180a8e8cf",
      1 [
      1 ]

Since it's just the one layer, duplicated 4 times, I was hoping it would be easy to identify by poking around docker history, but nothing has jumped out at me yet.
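To list only the repeated digests instead of scanning the full count table, the output can be filtered for counts above one. The sketch below is an illustration, not something from the thread: `duplicate_layers` is a hypothetical helper name, and it assumes the layer list arrives on stdin as text containing `sha256:<64 hex characters>` digests, for example from `docker inspect --format '{{json .RootFS.Layers}}' IMAGE`.

```shell
#!/bin/sh
# duplicate_layers: hypothetical helper (not part of Docker or Triton).
# Reads text containing layer digests on stdin and prints "<count> <digest>"
# for every digest that appears more than once.
duplicate_layers() {
    grep -oE 'sha256:[0-9a-f]{64}' \
        | sort \
        | uniq -c \
        | awk '$1 > 1 { print $1, $2 }'
}

# Example usage (requires Docker and a locally pulled image):
#   docker inspect --format '{{json .RootFS.Layers}}' \
#       nvcr.io/nvidia/tritonserver:24.03-py3 | duplicate_layers
```

For the 24.03 image, going by the table above, this should print a single line for the `5f70bf18...` digest with a count of 4.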

ClaytonJY commented 3 months ago

Digging a little further here, this layer maps to the file 54ef38418033d800a45a536f7c4f8d037549aa2c005f589e390961c0c5947149/layer.tar when saving and extracting this image. Extracting that file shows it's empty, so we must have 4 Dockerfile commands that don't alter the filesystem and thus result in empty layers.

Not sure what's creating these empty layers, but I hope this helps!
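The lookup described here can be scripted. The following is a rough sketch under assumptions: it expects an image exported with `docker save` and unpacked into a directory in the older layout where each layer is stored as `<id>/layer.tar` (newer Docker versions may write a different layout), and `layers_matching_digest` is a made-up helper name. It prints every `layer.tar` whose SHA-256 equals the given digest.

```shell
#!/bin/sh
# layers_matching_digest: hypothetical helper (not part of Docker).
# Usage: layers_matching_digest DIR DIGEST
# Prints the path of each layer.tar under DIR whose SHA-256 equals DIGEST.
# Assumes sha256sum (GNU coreutils) is available.
layers_matching_digest() {
    dir=$1
    want=${2#sha256:}   # accept the digest with or without the "sha256:" prefix
    find "$dir" -type f -name layer.tar | while IFS= read -r f; do
        got=$(sha256sum "$f" | cut -d' ' -f1)
        if [ "$got" = "$want" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example usage (requires Docker and a locally pulled image):
#   docker save -o triton.tar nvcr.io/nvidia/tritonserver:24.03-py3
#   mkdir triton && tar -xf triton.tar -C triton
#   layers_matching_digest triton \
#       sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
```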

ClaytonJY commented 3 months ago

Alright I pulled out dive and found the four offenders:

RUN |3 CUDA_VERSION=12.4.0.041 CUDA_DRIVER_VERSION=550.54.14 JETPACK_HOST_MOUNTS= /bin/sh -c if [ -n "${JETPACK_HOST_MOUNTS}" ]; then echo "/usr/lib/aarch64-linux-gnu/tegra" > /etc/ld.so.conf.d/nvidia-tegra.conf && echo "/usr/lib/aarch64-linux-gnu/tegra-egl" >> /etc/ld.so.conf.d/nvidia-tegra.conf;     fi # buildkit
RUN |2 TRITON_VERSION=2.44.0 TRITON_CONTAINER_VERSION=24.03 /bin/sh -c rm -fr /opt/tritonserver/* # buildkit
WORKDIR /opt
WORKDIR /opt/tritonserver

I'm going to assume the WORKDIR changes are false positives, but it looks to me like the first RUN is a no-op because of the if statement, and the second is a no-op because /opt/tritonserver/ is already empty.

Not sure the best way to solve this. Multi-stage dockerfile?

Per some GKE docs, duplicate layers are not an issue with more recent GKE versions, but it sure would be great to fix this for those of us still on earlier versions! Especially since GKE uses tritonserver as an example of when to use image streaming.

ClaytonJY commented 3 months ago

Yet another point of confusion is that the empty layers we have here (5f70bf18...) don't match the empty layer hash in the GKE docs (a3ed95ca...). But if these are truly empty layers, then being on a newer GKE/k8s version won't help, because empty layers will still prevent image streaming AFAICT.
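One way to check whether `5f70bf18...` really denotes an empty layer is to hash an empty tar archive directly. A minimal empty tar archive is two 512-byte blocks of zeros, so the sketch below hashes 1024 zero bytes and compares the result with the digest from this thread. `empty_tar_sha256` is a hypothetical helper name, and `sha256sum` is assumed to be available (GNU coreutils). A plausible explanation for the mismatch with the GKE docs, offered here as an assumption and not confirmed anywhere in this thread, is that `a3ed95ca...` identifies the compressed form of the same empty layer, while `docker inspect` reports digests of uncompressed layers.

```shell
#!/bin/sh
# empty_tar_sha256: hypothetical helper.
# Prints the SHA-256 of 1024 zero bytes, i.e. a minimal empty tar archive.
empty_tar_sha256() {
    head -c 1024 /dev/zero | sha256sum | cut -d' ' -f1
}

# Digest reported four times by docker inspect in this thread.
thread_digest=5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef

if [ "$(empty_tar_sha256)" = "$thread_digest" ]; then
    echo "thread digest matches an empty tar archive"
else
    echo "thread digest does not match an empty tar archive"
fi
```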

nnshah1 commented 3 months ago

As a possible workaround, have you tried a tool like docker-squash?

https://github.com/goldmann/docker-squash

I did a quick sanity test locally and it produces an image with a single layer (though, looking at the docs, the number of layers is somewhat configurable).

I don't have experience with the streaming feature on GKE, so an image with a single layer may not be desirable for other reasons, but I thought it worth a try while we take a look at the request.

ClaytonJY commented 3 months ago

@nnshah1 I have not heard of docker-squash, looks interesting, thanks for the link and the reply!

nnshah1 commented 3 months ago

@ClaytonJY - if you can verify whether the workaround works, we can recommend it for versions of GKE that don't support empty layers. Since newer versions do support it, we'd probably consider this lower priority. Fair?

majidakbaridh commented 2 weeks ago

@nnshah1 Even newer versions of GKE can't support duplicated layers. Their documentation mentions this support, but the feature is disabled, at least until GKE version 1.30. Another issue is that docker-squash requires the image layer IDs, but they are missing in Docker versions after 1.10.