Improve cacheability of generated Dockerfiles

ufoym / deepo

Setup and customize deep learning environment in seconds.

http://ufoym.com/deepo

MIT License


Improve cacheability of generated Dockerfiles #55

Closed kklemon closed 2 years ago

kklemon commented 6 years ago

Is there a reason why all the modules and other statements that are generated by the generator.py script are concatenated and listed as a single RUN statement in the final Dockerfile?

Especially for large images with long, error-prone build processes, this means caching can't be used at all. I have already had many cases where I had to try different combinations of packages/libraries or implement new ones myself, and it always cost a hell of a lot of time to rebuild the image from scratch whenever a slight change near the end of the Dockerfile had to be made.

What about changing the generator so that each module does not produce the final string for the Dockerfile directly, but instead returns a list of strings, each of which is emitted as its own RUN statement in the final Dockerfile?

ufoym commented 6 years ago

We understand your frustration and apologize for any inconvenience. Multiple RUNs, as you said, would make the Dockerfile much more readable and easier to debug.

However, we chose to merge all commands into a single RUN, because each RUN line adds a layer to the image. If you delete a file that was created in a different layer, the union filesystem only registers the change in a new layer; the file still exists in the previous layer, is shipped over the network, and is stored on disk. So if you download the source code, extract it, compile it into a binary, and then delete the tgz and source files at the end, you really want all of this done in a single layer to reduce image size.
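To make the size argument concrete, here is a toy Python model of layered storage. It is only a sketch, not how Docker is implemented: the helper names `image_size` and `visible_files` are hypothetical, and the file sizes (in MB) are made up for illustration.

```python
# Toy model: a layer is a pair (files it adds: name -> size in MB, names it deletes).
# All sizes below are invented for illustration only.

def image_size(layers):
    """Size that is stored and shipped: files added in any layer count,
    even if a later layer deletes them."""
    return sum(sum(added.values()) for added, _deleted in layers)


def visible_files(layers):
    """Files visible in the final filesystem after applying all layers in order."""
    files = {}
    for added, deleted in layers:
        files.update(added)
        for name in deleted:
            files.pop(name, None)
    return files


# Three RUN statements: download, build, clean up.
multi_run = [
    ({"src.tgz": 100}, set()),
    ({"src/": 300, "binary": 20}, set()),
    ({}, {"src.tgz", "src/"}),  # the deletion only hides the files
]

# One RUN statement: only what exists when the command finishes lands in the layer.
single_run = [({"binary": 20}, set())]

print(image_size(multi_run))   # 420
print(image_size(single_run))  # 20
print(visible_files(multi_run) == visible_files(single_run))  # True
```

Both variants end up with the same visible filesystem, but in this model the multi-RUN image still carries the archive and the source tree in its earlier layers.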

kklemon commented 6 years ago

Thank you for the clear explanation. Now I understand why it is handled this way, and I even think it is legitimate.

But I still wonder whether this feature could be added as a command-line option. It should be relatively easy to change the current implementation to make this work. As described before, each module could just return a list of individual commands, and the composer would then decide whether those strings are concatenated and executed in a single RUN or each emitted as a separate RUN.
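A minimal sketch of that idea in Python, assuming each module hands back a plain list of shell commands. The function `render`, the flag `--split-run`, and the sample commands are hypothetical and are not taken from deepo's actual generator.py.

```python
import argparse


def render(commands, split=False):
    """Turn a list of shell commands into Dockerfile RUN statement(s).

    split=False: one RUN with the commands chained by '&&' (a single layer).
    split=True:  one RUN per command (more layers, but each step is cacheable).
    """
    if not commands:
        return ""
    if split:
        return "\n".join("RUN " + command for command in commands)
    return "RUN " + " && \\\n    ".join(commands)


# Hypothetical output of one module.
commands = [
    "apt-get update",
    "apt-get install -y --no-install-recommends git",
    "rm -rf /var/lib/apt/lists/*",
]

# Hypothetical command-line switch choosing between the two modes.
parser = argparse.ArgumentParser()
parser.add_argument("--split-run", action="store_true",
                    help="emit one RUN statement per command")

print(render(commands, split=parser.parse_args([]).split_run))
print(render(commands, split=parser.parse_args(["--split-run"]).split_run))
```

One caveat with the split mode: every RUN starts a fresh shell, so commands that depend on a preceding `cd` or on a shell variable set earlier would have to stay grouped in the same list entry.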

May I provide a PR for this so you could tell me if you think this is acceptable or not?

ufoym commented 6 years ago

> May I provide a PR for this so you could tell me if you think this is acceptable or not?

Sure! We appreciate your contribution.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 180 days with no activity.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 30 days since being marked as stale.