svalinn / DAGMC

Direct Accelerated Geometry Monte Carlo Toolkit
https://svalinn.github.io/DAGMC

Streamline dependencies of docker CI images #951

Closed: ahnaf-tahmid-chowdhury closed this 4 months ago

ahnaf-tahmid-chowdhury commented 5 months ago

Description

This PR aims to optimize the dependencies of the Docker CI images by:

  1. Removing unused dependencies.
  2. Updating Geant4 to version 11.2.1.
  3. Changing how HDF5 and Geant4 are downloaded, so that both are cloned from their source repositories with git clone.
  4. Switching solely to git clone to download dependencies, removing the dependency on wget.
  5. Creating a new stage that stores only binary files, thereby reducing the Docker image size.
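Items 3 and 4 replace wget downloads with git clone. As a rough illustration, the snippet below builds the kind of shallow, tag-pinned clone command such a step might run. It is written in Python only for illustration. The Geant4 repository URL, the `v11.2.1` tag name, and the helper names are assumptions, not taken from this PR's Dockerfile.

```python
import subprocess
from typing import List


def shallow_clone_command(url: str, ref: str, dest: str) -> List[str]:
    """Build a `git clone` argv that fetches only the commit at `ref`.

    `--branch` accepts a branch or tag name, and `--depth 1` skips the
    project history, which keeps the clone small inside a build step.
    """
    return ["git", "clone", "--depth", "1", "--branch", ref, url, dest]


def shallow_clone(url: str, ref: str, dest: str) -> None:
    """Run the clone; raises subprocess.CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, ref, dest), check=True)


if __name__ == "__main__":
    # Assumed repository URL and tag name (hypothetical); verify before use.
    cmd = shallow_clone_command(
        "https://github.com/Geant4/geant4.git", "v11.2.1", "geant4-src"
    )
    print(" ".join(cmd))
```

Pinning to a tag rather than a moving branch keeps each image build reproducible, and the shallow clone avoids downloading history that the build never uses.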

Impact

These changes are expected to streamline the dependencies of the Docker CI images, reducing the image size and potentially improving CI performance.

Related Issues

Closes #750.

ahnaf-tahmid-chowdhury commented 4 months ago

Is it necessary to create separate stages for HDF5 and MOAB, given that these are also external dependencies? I think we can remove them and instead update our workflow to create a binary/final stage that stores only the binaries, keeping the image as small as possible, and we could tag it as dagmc.

gonuke commented 4 months ago

I think these stages exist mostly for historical reasons, to limit how often images must be rebuilt. The philosophy is that the steps earlier in the Dockerfile are the ones that change least often and are least under our control. Installing them early, as separate stages, therefore means those sections of the file need to be rebuilt less often. Geant4 is the most resource-intensive dependency to build, so avoiding rebuilds of it is one of the main goals.

ahnaf-tahmid-chowdhury commented 4 months ago

I understand. We included these stages to avoid rebuilding HDF5 and MOAB whenever we change the Dockerfile. For instance, if we modify the MOAB part, Docker reuses the cached HDF5 stage and rebuilds only from MOAB onward. Currently, the GitHub workflow keeps a full cache, so the steps we change in the Dockerfile are detected and rebuilt accordingly.
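The reuse described above can be sketched with a simplified model of layer caching. Each step's cache key hashes its parent's key together with its own instruction, so editing one step changes its key and the keys of everything built on top of it, while earlier steps stay cached. This is an assumption-laden illustration, not Docker's actual algorithm. Real BuildKit caching also checksums files used by `COPY`/`ADD`, and independent stages need not form one linear chain. The stage list is hypothetical and not DAGMC's real Dockerfile.

```python
import hashlib
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (stage name, instruction text)


def cache_keys(steps: List[Step]) -> Dict[str, str]:
    """Compute a simplified cache key for each step in a linear chain.

    Each key hashes the parent's key plus the step's own instruction,
    so a change anywhere propagates to every later step.
    """
    keys: Dict[str, str] = {}
    parent = ""
    for stage, instruction in steps:
        digest = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys[stage] = digest
        parent = digest
    return keys


def rebuilt_stages(old: List[Step], new: List[Step]) -> List[str]:
    """Return the stages whose cache key differs between two file versions."""
    old_keys, new_keys = cache_keys(old), cache_keys(new)
    return [stage for stage, _ in new if old_keys.get(stage) != new_keys[stage]]


# Hypothetical, heavily simplified stage order: rarely changed steps first.
BEFORE: List[Step] = [
    ("geant4", "RUN build geant4"),
    ("hdf5", "RUN build hdf5"),
    ("moab", "RUN build moab"),
    ("dagmc", "RUN build dagmc"),
]
# Same file, but with the MOAB step edited.
AFTER: List[Step] = BEFORE[:2] + [("moab", "RUN build moab with new options")] + BEFORE[3:]

if __name__ == "__main__":
    print(rebuilt_stages(BEFORE, AFTER))  # ['moab', 'dagmc']
```

In this model, Geant4 and HDF5 keep their keys and are reused, which is why putting the most expensive and least frequently changed builds first pays off.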