terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0
Use a Multi stage Dockerfile to eliminate the need for our own base image #3518

Closed (godber closed 8 months ago)

godber commented 10 months ago

I think we should at least try to rejigger our Teraslice Dockerfile to eliminate the need to maintain our own base image. I think the approach would be to use a multi-stage Dockerfile (https://docs.docker.com/build/building/multi-stage/) whose first stage creates a temporary "dev image" (not pushed anywhere) that has all the node-gyp C dependencies and builds Teraslice. The second stage then builds a smaller final image from the output of the dev image.

Roughly speaking:

I think we can still parameterize the Node version the same way we do now. Our main goal here is to isolate the changes to the Dockerfile as much as possible and to make sure that what we build passes the e2e tests.
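
As an illustration only, the two-stage idea described above might be sketched like this. This is not the actual Teraslice Dockerfile: the base image tags, the `/app/source` path, the `yarn build` step, the `service.js` entry point, and the default Node version are all assumptions made for the example.

```dockerfile
# Hypothetical sketch of a multi-stage build; names and paths are assumptions.

# The Node version stays a build argument, declared before the first FROM
# so that it can be used in both FROM lines.
ARG NODE_VERSION=18

# Stage 1: temporary "dev image" that is never pushed anywhere.
# It carries the compiler toolchain that node-gyp needs for native modules.
FROM node:${NODE_VERSION}-bookworm AS builder
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 make g++ \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app/source
COPY . .
# Assumed build commands; substitute whatever the repo really uses.
RUN yarn install --frozen-lockfile && yarn build

# Stage 2: smaller final image built from the output of the dev image.
# The slim variant has no compiler toolchain, which keeps the image small.
FROM node:${NODE_VERSION}-bookworm-slim
WORKDIR /app/source
COPY --from=builder /app/source /app/source
# Assumed entry point.
CMD ["node", "service.js"]
```

With a file like this, a different Node version could be selected at build time with something like `docker build --build-arg NODE_VERSION=20 -t teraslice:local .` (the tag name is made up here). Both stages need to share the same Debian release and Node version so that native modules compiled in the first stage still load in the second.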

Ref:

When considering naming schemes for our Docker images in the issue below, I thought it's probably worth trying out:

https://github.com/terascope/debian-base/issues/1#issuecomment-1883904854

godber commented 9 months ago

We attempted to roll out this change, via the PR above, in this release:

https://github.com/terascope/teraslice/releases/tag/v0.91.0

But the resulting image showed significantly higher memory usage in large-scale Kafka jobs, so we rolled it back in this release:

https://github.com/terascope/teraslice/releases/tag/v0.92.0

For now this effort is on hold.