green-code-initiative / ecoCode-challenge

Embark on the hackathon series to improve ecoCode

[Hackathon 2024][Gadolinium][Docker] Don’t include unnecessary files in image #110

Open Neobody-0 opened 4 months ago

Neobody-0 commented 4 months ago

Rule title

Don’t include unnecessary files in image

Language and platform

Docker

Rule description

Eliminating unnecessary files will naturally decrease the image size. A smaller image shortens build time and lowers storage costs, contributing to energy savings and environmentally friendly coding practices.

One way to avoid unnecessary files is to use a .dockerignore file.

When executing a build command, the build client searches for a .dockerignore file in the context's root directory. Should this file be present, it excludes files and directories matching the patterns specified within from the build context prior to sending it to the builder. https://docs.docker.com/build/building/context/#dockerignore-files

Rule short description

Exclude files not relevant to the build, without restructuring your source repository. https://docs.docker.com/develop/develop-images/guidelines/

Rule justification

The article "Ten simple rules for writing Dockerfiles for reproducible data science" (PMC, nih.gov) gives an example: a build can easily pull in content that the final image does not need (data, temporary files, dependencies, etc.). As the article puts it: “Storing data files outside of the container allows handling of very large or sensitive datasets, e.g., proprietary data or private information. Do not include such data in an image! To avoid publishing sensitive data by accident, you can add the data directory to the .dockerignore file, which excludes files and directories from the build context, i.e., the set of files considered by docker build. Ignoring data files also speeds up the build in cases where there are very large files or many small files.”

Why it matters:

Severity / Remediation Cost

Severity: Major. Some files can be huge; for example, node_modules can reach 400 MB (roughly 20x the size of an Alpine image).

Remediation cost: Easy. Users only need to add or complete the .dockerignore file.

Implementation principle

The feasibility of the implementation hinges on the ability to scan the .dockerignore file with SonarQube. If this is achievable, we can verify its presence and possibly employ a template (similar to .gitignore) to enumerate all the files that should be omitted.
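To make the idea concrete, here is a minimal Python sketch of such a check, written outside SonarQube. It only tests that a .dockerignore exists in the build context and that it literally lists each entry of a template. The template contents (`RECOMMENDED_PATTERNS`) and the function names are assumptions made for illustration, not part of ecoCode, and the comparison is textual: it does not reproduce Docker's actual pattern-matching semantics.

```python
from pathlib import Path

# Hypothetical template, in the spirit of a .gitignore template.
# The entries are illustrative assumptions, not an official list.
RECOMMENDED_PATTERNS = [".git", "node_modules", "*.log"]


def read_patterns(text: str) -> list[str]:
    """Return the active lines of a .dockerignore, skipping blanks and comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Drop a trailing slash so "node_modules/" compares equal to "node_modules".
            patterns.append(line.rstrip("/"))
    return patterns


def check_dockerignore(context_dir: str, recommended=RECOMMENDED_PATTERNS) -> list[str]:
    """Return a list of issues; an empty list means the check passed."""
    ignore_file = Path(context_dir) / ".dockerignore"
    if not ignore_file.is_file():
        return [f"missing .dockerignore in {context_dir}"]
    present = set(read_patterns(ignore_file.read_text(encoding="utf-8")))
    # Literal comparison only: a broader pattern that already covers a
    # recommended entry is not recognised by this sketch.
    return [f"pattern not listed: {p}" for p in recommended if p not in present]
```

Calling `check_dockerignore(".")` from the root of a build context returns the list of issues to report. A real rule would probably need to understand wildcards and `!` exceptions before flagging a pattern as missing.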

An enhancement to this rule, though potentially challenging to implement, would be to examine the base image in the Dockerfile to identify the technology and apply a corresponding template for the .dockerignore file.
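One possible shape for this enhancement, as a rough Python sketch: read the `FROM` instructions of the Dockerfile, reduce each image reference to its bare name, and look that name up in a table of templates. The `TEMPLATES` mapping and all function names are hypothetical, and the parsing is deliberately simplified (it does not resolve `ARG` substitutions in `FROM` lines, and it cannot tell a real image from a reference to an earlier build stage).

```python
import re

# Hypothetical mapping from base image name to suggested .dockerignore entries.
# Both the keys and the entries are illustrative assumptions.
TEMPLATES = {
    "node": ["node_modules", "npm-debug.log"],
    "python": ["__pycache__", "*.pyc", ".venv"],
}

# Matches "FROM [--platform=...] <image> [AS <stage>]" at the start of a line.
FROM_RE = re.compile(
    r"^[ \t]*FROM[ \t]+(?:--platform=\S+[ \t]+)?(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def base_images(dockerfile_text: str) -> list[str]:
    """Return the image reference of every FROM instruction, in order."""
    return FROM_RE.findall(dockerfile_text)


def image_name(reference: str) -> str:
    """Strip digest, registry/namespace and tag: 'docker.io/library/node:20' -> 'node'."""
    without_digest = reference.split("@", 1)[0]
    last_segment = without_digest.rsplit("/", 1)[-1]
    return last_segment.split(":", 1)[0].lower()


def suggest_patterns(dockerfile_text: str) -> list[str]:
    """Collect template entries for every recognised base image, without duplicates."""
    suggestions = []
    for reference in base_images(dockerfile_text):
        for pattern in TEMPLATES.get(image_name(reference), []):
            if pattern not in suggestions:
                suggestions.append(pattern)
    return suggestions
```

For a Dockerfile starting with `FROM node:20-alpine AS build`, `suggest_patterns` would return the hypothetical Node entries. Base images that do not name their technology (for example a generic distribution image with the runtime installed later) would need a different detection strategy, which is part of why this enhancement is harder to implement.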