hyperledger / indy-node

The server portion of a distributed ledger purpose-built for decentralized identity.
https://wiki.hyperledger.org/display/indy
Apache License 2.0

Multiple Dependency Installs in GHA #1693

Closed pSchlarb closed 2 years ago

pSchlarb commented 3 years ago

I noticed that when slicing the modules for the tests, each slice installs the pip dependencies. Would it be beneficial to move the dependency install into the Dockerfile? For example, add the following pip command from .github/workflows/build.yaml:232 to .github/workflows/build/Dockerfile:

        run: |
          # Explicitly use the existing pip cache location in the node-build image.
          pip --cache-dir /root/.cache/pip install .[tests]
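Moving that install into the image could look roughly like this (a sketch only; the `COPY` paths are assumptions about the build context, not the actual Dockerfile):

```dockerfile
# Hypothetical addition to .github/workflows/build/Dockerfile:
# copy the project sources and pre-install the test dependencies
# so each test slice can skip `pip install .[tests]` at runtime.
COPY . /src
WORKDIR /src
RUN pip --cache-dir /root/.cache/pip install .[tests]
```

The trade-off discussed below is that the image then only matches the code it was built from.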
WadeBarnes commented 3 years ago

I played with various ways of caching the dependency install when I was originally working on that part of the workflow. The issue I ran into with moving it to the Dockerfile was that the specific image then becomes the dependency: any given workflow run needs to use the image it built throughout the entire workflow, because at any time someone could check in a new version of the code that changes the installed dependencies. The resulting image is only relevant to that code and that workflow run, and may not work with previous or future workflow runs containing other code. At the time I could not come up with a clean solution, so I went back to having the tests install the dependencies on their own. The pip packages are cached, which speeds things up a small amount, but I've run into situations where even caching the pip packages causes problems with installing the dependencies.
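For reference, the pip-package caching mentioned here typically looks something like the following in a workflow step (a sketch using `actions/cache`; the cache path and key are illustrative, not the exact workflow configuration):

```yaml
# Sketch: restore/save the pip download cache between runs.
# The key is derived from setup.py so a dependency change busts the cache.
- uses: actions/cache@v3
  with:
    path: /root/.cache/pip
    key: pip-${{ hashFiles('setup.py') }}
    restore-keys: |
      pip-
- run: pip --cache-dir /root/.cache/pip install .[tests]
```

This only caches downloaded wheels; the install step itself still runs in every slice.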

I agree that a great deal of time is wasted installing the dependencies before each test run; I just don't know the best way to address it at the moment.

swcurran commented 3 years ago

Would there be a way to use a tag on the image, or even derive the image name (e.g., from the PR #), so that the test instances that need the one image could find it and the next run would use a different name?
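One way to sketch that idea (the registry path and env variable name are hypothetical, not part of the current workflow):

```yaml
# Hypothetical: derive a run-specific image tag from the PR number so
# every job in the same run resolves the same image, while other runs
# use their own tag.
env:
  TEST_IMAGE: ghcr.io/${{ github.repository }}/node-build:pr-${{ github.event.number }}
```

Jobs would then reference `${{ env.TEST_IMAGE }}` instead of a shared `latest` tag.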

WadeBarnes commented 3 years ago

The concern there is filling up the container registry with a bunch of one-time-use container images. I started going down that path at one point, until I realized the ramifications (the mess getting left behind).

swcurran commented 3 years ago

I was thinking about that. Could there be an "end-of-tests" task that knows when all the tests have run and deletes the image from the registry?
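A cleanup job along those lines could be sketched like this (job names, secrets, and the `$VERSION_ID` lookup are assumptions; resolving the version id for a given tag needs an extra API call first):

```yaml
# Hypothetical cleanup job: runs after all test jobs, even on failure,
# and deletes the run-specific package version from GHCR via the REST API.
cleanup:
  needs: [test]          # list every test job here
  if: always()           # run even if tests failed or were cancelled
  runs-on: ubuntu-latest
  steps:
    - name: Delete run-specific image
      env:
        GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        gh api --method DELETE \
          "/orgs/$ORG/packages/container/$PACKAGE/versions/$VERSION_ID"
```

The token would need `packages` delete permission for this to work.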

WadeBarnes commented 3 years ago

Perhaps. I was also playing with generating the image internally and passing it between the different jobs, but that did not work because of how the image is used by the jobs.

pSchlarb commented 3 years ago

As I understand the current workflow, we are already creating and caching an image for each run, or am I missing something? Would enabling caching on the Docker layers, as described here, make sense? The base install with the apt dependencies should not change often compared to the pip dependency install.
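Docker layer caching in GitHub Actions is commonly wired up like this (a sketch using the GHA cache backend of `docker/build-push-action`; the context path is an assumption):

```yaml
# Sketch: build the image with layer caching backed by the Actions cache.
# Unchanged layers (e.g. the apt base install) are restored instead of rebuilt.
- uses: docker/setup-buildx-action@v2
- uses: docker/build-push-action@v3
  with:
    context: .github/workflows/build
    push: false
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

With this, only the layers below the first changed instruction get rebuilt, so a pip-dependency change would not redo the apt install.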

WadeBarnes commented 3 years ago

Currently the workflow only builds and publishes the image if the associated Dockerfile changes: https://github.com/hyperledger/indy-plenum/blob/master/.github/workflows/build.yaml#L33. Otherwise it reuses the previously built container published in the container registry.
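That "only rebuild when the Dockerfile changes" pattern is often expressed with a paths filter step, roughly like this (step id and filter name are illustrative, not the linked workflow's exact wiring):

```yaml
# Sketch: detect whether the Dockerfile changed, then gate the build on it.
- uses: dorny/paths-filter@v2
  id: changes
  with:
    filters: |
      dockerfile:
        - '.github/workflows/build/Dockerfile'
# later, on the build step:
#   if: steps.changes.outputs.dockerfile == 'true'
```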

Layer caching may help if we use the approach of building a tagged container image for each run that gets deleted after the workflow completes, as discussed with @swcurran.

pSchlarb commented 2 years ago

Tests have shown that a run-specific container with all the needed dependencies doesn't bring an overall improvement in pipeline execution time, so I will close this issue.