Reduce total downloads and image sizes to a fraction by using intermediate image(s)

For example, the "ChatQnA" service is built from 9 microservices, several of which use images that each carry their own copy of the same large Python libraries (such as PyTorch).

By building and using an intermediate image that contains the shared dependencies:

Large dependencies are downloaded/built only once during image builds
The rest of the images stay small, because those dependencies live in a shared layer instead of bloating the size of every image
Smaller images make the first run of a given container on a given host much faster
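The saving comes from how container runtimes store image layers: layers are content-addressed by digest, so a layer that several images share is downloaded and stored only once per host. Installing the same library in separate image builds usually produces layers with different digests, so nothing is shared. The minimal Python sketch below models that arithmetic; the layer names and MB sizes are hypothetical, chosen only for illustration, and do not reflect the real OPEA images.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    digest: str   # content-addressed ID: identical content -> identical digest
    size_mb: int  # hypothetical size, for illustration only


def pull_size_mb(images: dict) -> int:
    """Total MB a fresh host downloads (and stores) to pull all images.

    Layers are keyed by digest, so a layer shared by several images
    is counted only once, like a container runtime's layer store.
    """
    unique = {
        layer.digest: layer.size_mb
        for layers in images.values()
        for layer in layers
    }
    return sum(unique.values())


os_layer = Layer("os", 80)

# Each service installs PyTorch in its own build -> four distinct layers.
duplicated = {
    f"service{i}": [os_layer, Layer(f"torch-{i}", 2000), Layer(f"app-{i}", 50)]
    for i in range(4)
}

# Every service builds FROM one intermediate image that holds PyTorch.
shared_torch = Layer("torch-shared", 2000)
shared = {
    f"service{i}": [os_layer, shared_torch, Layer(f"app-{i}", 50)]
    for i in range(4)
}

print(pull_size_mb(duplicated))  # 80 + 4*2000 + 4*50 = 8280
print(pull_size_mb(shared))      # 80 + 2000 + 4*50 = 2280
```

With these made-up numbers, four services drop from 8280 MB to 2280 MB of downloads and disk usage, and the gap grows with every additional service that reuses the intermediate image.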

In other words, this takes significantly less bandwidth, disk space, and time, all of which reduce costs on rented clusters.

As an example, here's documentation on how Kubeflow images are layered: https://www.kubeflow.org/docs/components/notebooks/container-images/

(Once this ticket has been fixed, a similar graph for the OPEA image layers would be nice.)
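Such a graph could be generated from a simple image-to-parent mapping. The Python sketch below renders one as an indented text tree; the image names and parent relations are hypothetical placeholders, not the actual OPEA image hierarchy.

```python
from typing import Dict, List, Optional

# Hypothetical image -> parent mapping (None marks a root image).
PARENTS: Dict[str, Optional[str]] = {
    "python-base": None,
    "torch-base": "python-base",
    "embedding": "torch-base",
    "reranking": "torch-base",
    "llm-textgen": "torch-base",
    "dataprep": "python-base",
}


def children_of(parents: Dict[str, Optional[str]]) -> Dict[Optional[str], List[str]]:
    """Invert the child -> parent mapping into parent -> children."""
    tree: Dict[Optional[str], List[str]] = {}
    for image, parent in parents.items():
        tree.setdefault(parent, []).append(image)
    return tree


def render(tree: Dict[Optional[str], List[str]],
           root: Optional[str] = None, depth: int = 0) -> List[str]:
    """Depth-first walk: each image is indented under its parent."""
    lines: List[str] = []
    for image in sorted(tree.get(root, [])):
        lines.append("  " * depth + image)
        lines.extend(render(tree, image, depth + 1))
    return lines


print("\n".join(render(children_of(PARENTS))))
```

For the mapping above this prints python-base at the top, dataprep and torch-base under it, and embedding, llm-textgen and reranking under torch-base.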

opea-project / GenAIComps #265