svalinn / DAGMC

Direct Accelerated Geometry Monte Carlo Toolkit
https://svalinn.github.io/DAGMC

single dockerfile and single step in the docker building CI #863

Closed shimwell closed 1 year ago

shimwell commented 1 year ago

Description

This PR is an alternative to #822. In addition to moving the scripts into a single dockerfile, this PR also changes the CI for publishing dockerfiles. Currently we have a branching and caching approach to the CI dockerfile building that is efficient but complex. This approach reduces the complexity of the CI and allows all the permutations to run in parallel.

This might actually end up being quicker to build and quicker to fail when something is broken, as the CI no longer pauses until every docker build has reached the same stage before proceeding to the next one. Also, if caching can be figured out, this workflow could make use of pre-built images to be very fast.
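
To illustrate what "all the permutations run in parallel" means, here is a minimal Python sketch of how a build matrix expands into independent jobs. The axis names and most of the values are assumptions picked only so that the product comes to 24, matching the job count reported later in this thread; the real workflow's matrix may look different.

```python
from itertools import product

# Hypothetical matrix axes (assumption): chosen so the product has 24 entries.
# The actual axes and values in the workflow file may differ.
MATRIX = {
    "ubuntu": ["20.04", "22.04"],
    "compiler": ["gcc", "clang"],
    "hdf5": ["1.10.4", "1.14.3"],
    "moab": ["5.3.0", "develop", "master"],
}


def expand_matrix(matrix):
    """Return one dict per combination of axis values, one per CI job."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


jobs = expand_matrix(MATRIX)
print(len(jobs), "independent jobs")
print(jobs[0])
```

Because each combination is its own job, no job has to wait for the others to reach a given docker stage, which is the behaviour described above.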

Motivation and Context

simpler is nice for maintainers :smile:

Changes

refactored CI

Behavior

single dockerfile instead of scripts and simpler CI

shimwell commented 1 year ago

Action is running on my branch with 24 jobs in parallel: https://github.com/shimwell/DAGMC/actions/runs/4315972934

(Screenshot from 2023-03-02 16-13-40)

shimwell commented 1 year ago

All jobs on my fork's CI passed; it took 3 hours 28 minutes.

shimwell commented 1 year ago

Had a go at all those comments and pushed. CI is running on my fork: https://github.com/shimwell/DAGMC/actions/runs/4325712183

shimwell commented 1 year ago

CI on fork passed https://github.com/shimwell/DAGMC/actions/runs/4325712183

2nd stage only took a few seconds so the local stage cache appears to have worked.

This used the local stage cache instead of the container repo. But the layer has the same tag locally and in the container repo.

Perhaps this is not what we want.

bquan0 commented 1 year ago

I just opened a PR to the branch this PR is on to replace the docker build actions with the multistage build action.

gonuke commented 1 year ago

@shimwell - this appears to need a rebase...

gonuke commented 1 year ago

This also failed here (https://github.com/shimwell/DAGMC/actions/runs/4752350398/jobs/8442595961); apparently it failed to push to GHCR? It was successful for @bquan0 here (https://github.com/bquan0/DAGMC/actions/runs/4749240986/jobs/8436860572), but that run was based on a cached version??

bquan0 commented 1 year ago

I ran into that problem a few times on my workflow runs too and I mentioned it in the PR. I solved it by going to the settings of the package it was trying to push to, then checking the DAGMC repo under the "Manage Actions access" section.

shimwell commented 1 year ago

I have gone through each comment and made the changes needed to resolve them.

I have not clicked the resolve button just yet as I want to see if the CI works over on my branch

I shall report back once the CI finishes

shimwell commented 1 year ago

ok I have done the easy ones :smile:

the compiler setting might require additional logic to produce what the dockerfile needs (cc and cxx)
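
For anyone following along, the "additional logic" amounts to turning one compiler choice into two values. A minimal Python sketch of that mapping is below. It is not the approach used in this PR's dockerfile; the function and table names are hypothetical, and only the driver names themselves (gcc/g++ and clang/clang++) are standard.

```python
# Assumed lookup table: one compiler family name maps to its C and C++ drivers.
COMPILER_DRIVERS = {
    "gcc": ("gcc", "g++"),
    "clang": ("clang", "clang++"),
}


def compiler_env(family):
    """Return the CC and CXX values for a compiler family name."""
    try:
        cc, cxx = COMPILER_DRIVERS[family]
    except KeyError:
        raise ValueError(f"unsupported compiler family: {family!r}") from None
    return {"CC": cc, "CXX": cxx}


print(compiler_env("gcc"))
print(compiler_env("clang"))
```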

gonuke commented 1 year ago

We're nearly there!! Excited to see this finished. Hopefully we can look at the final things in my group's software meeting today

gonuke commented 1 year ago

There is a PR to this PR branch that should resolve everything

shimwell commented 1 year ago

I like the sound of that type of PR

shimwell commented 1 year ago

I really like the solution you found for setting compiler and getting two envs set

gonuke commented 1 year ago

I think this is the evidence that this is working!

I'm worried that I'm too complicit in the work to merge... @pshriwise or maybe @bam241 would enjoy the elegance of it all!

pshriwise commented 1 year ago

> I really like the solution you found for setting compiler and getting two envs set

Agreed, that's a really nice solution. Better than others I've seen in our google searches.

gonuke commented 1 year ago

Another PR for some final cleanup

gonuke commented 1 year ago

Another PR for some final cleanup

GitHub Actions for this PR are successful.

shimwell commented 1 year ago

Thanks Paul, I've merged that in.

gonuke commented 1 year ago

Confirmation of success here

gonuke commented 1 year ago

Q: Which of the stages gets uploaded in the end?

Every stage that is explicitly referenced in a multistage-docker-build-action is pushed to the repo.

Each one gets pushed with the custom tag, and then we also push the dagmc stage with the tags stable and latest, but only when running from the svalinn repo.
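
The tagging rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this comment, not the workflow's actual implementation; the function name, parameter names, and the non-dagmc stage name are hypothetical.

```python
def tags_to_push(stage, custom_tag, repo_owner):
    """Return the tags a stage image is pushed with, per the rule above."""
    tags = [custom_tag]  # every referenced stage gets the custom tag
    # Only the dagmc stage is also tagged stable and latest,
    # and only when the workflow runs from the svalinn repo.
    if stage == "dagmc" and repo_owner == "svalinn":
        tags += ["stable", "latest"]
    return tags


# "some_other_stage" is a placeholder stage name used only for illustration.
print(tags_to_push("dagmc", "refs_heads_develop-bk0", "svalinn"))
print(tags_to_push("dagmc", "refs_heads_develop-bk0", "shimwell"))
print(tags_to_push("some_other_stage", "refs_heads_develop-bk0", "svalinn"))
```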

pshriwise commented 1 year ago

I'd love to approve and merge this, but a runner isn't picking up the Mac testing job unfortunately.

gonuke commented 1 year ago

Saw that...☹️

gonuke commented 1 year ago

GitHub has turned off the macOS 10.15 runners, so I made this PR to move us forward.

gonuke commented 1 year ago

This is still passing here: https://github.com/shimwell/DAGMC/actions/runs/4904019862

gonuke commented 1 year ago

Thanks for the teamwork @shimwell @bquan0 & @pshriwise !

shimwell commented 1 year ago

Delighted this one got in, a real team effort. Nice work all. What shall we do next 😁

pshriwise commented 1 year ago

😢 https://github.com/svalinn/DAGMC/actions/runs/4907989513/jobs/8764666815

shimwell commented 1 year ago

Posting the error message here so we don't lose it when the CI logs expire:

[ 55%] Building CXX object test/CMakeFiles/mbcn_test.dir/mbcn_test.cpp.o
Installing collected packages: pymoab
  Found existing installation: pymoab 5.4.1
    Can't uninstall 'pymoab'. No files were found to uninstall.
  Running setup.py develop for pymoab
    Complete output from command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/root/build_dir/moab/bld/pymoab/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps --prefix=/root/build_dir/moab/bld/pymoab:
    running develop
    error: can't create or remove files in install directory

    The following error occurred while trying to add or remove files in the
    installation directory:

        [Errno 2] No such file or directory: '/root/build_dir/moab/bld/pymoab/lib/python3.6/site-packages/test-easy-install-5760.write-test'

    The installation directory you specified (via --install-dir, --prefix, or
    the distutils default setting) was:

        /root/build_dir/moab/bld/pymoab/lib/python3.6/site-packages

    This directory does not currently exist.  Please create it and try again, or
    choose a different installation directory (using the -d or --install-dir
    option).

    ----------------------------------------
  Can't roll back pymoab; was not uninstalled
Command "/usr/bin/python3 -c "import setuptools, tokenize;__file__='/root/build_dir/moab/bld/pymoab/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" develop --no-deps --prefix=/root/build_dir/moab/bld/pymoab" failed with error code 1 in /root/build_dir/moab/bld/pymoab/
pymoab/CMakeFiles/pymoab-local-install.dir/build.make:58: recipe for target 'pymoab/CMakeFiles/pymoab-local-install' failed
make[2]: *** [pymoab/CMakeFiles/pymoab-local-install] Error 1
CMakeFiles/Makefile2:1238: recipe for target 'pymoab/CMakeFiles/pymoab-local-install.dir/all' failed
make[1]: *** [pymoab/CMakeFiles/pymoab-local-install.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 55%] Linking CXX executable ../bin/mbcn_test
[ 55%] Built target mbcn_test
make: *** [all] Error 2
Makefile:140: recipe for target 'all' failed
Error: Process completed with exit code 2.

shimwell commented 1 year ago

e08954f49424: Pull complete
Digest: sha256:3c332407c190c017dbe049e8e9c9d54c5e44166b484c56fd414821bc54e5c233
Status: Downloaded newer image for akhilerm/repo-copy:latest
crane [copy ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0/dagmc:refs_heads_develop-bk0 ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0:stable]
2023/05/07 17:46:34 Copying from ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0/dagmc:refs_heads_develop-bk0 to ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0:stable
Error: fetching "ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0/dagmc:refs_heads_develop-bk0": GET https://ghcr.io/v2/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0/dagmc/manifests/refs_heads_develop-bk0: MANIFEST_UNKNOWN: manifest unknown
panic: exit status 1

goroutine 1 [running]:
main.main()
    /go/src/github.com/akhilerm/repo-copy/main.go:20 +0x15a
Error: The process '/usr/bin/docker' failed with exit code 2
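
To make the failing request easier to read, here is a small Python sketch of how an image reference of the form host/name:tag maps onto the manifest URL that appears in the log above. The helper name is hypothetical, and it only handles tagged references (no digests); it says nothing about why the manifest was missing.

```python
def manifest_url(image_ref):
    """Build the registry manifest URL for a reference like host/name:tag."""
    name_part, sep, tag = image_ref.rpartition(":")
    # A "/" after the last colon means the colon belonged to a host:port, not a tag.
    if not sep or "/" in tag:
        raise ValueError(f"reference has no tag: {image_ref!r}")
    host, _, name = name_part.partition("/")
    if not name:
        raise ValueError(f"reference has no repository name: {image_ref!r}")
    return f"https://{host}/v2/{name}/manifests/{tag}"


source = (
    "ghcr.io/svalinn/dagmc-ci-ubuntu-22.04-gcc-ext-hdf5_1.10.4-moab_5.3.0"
    "/dagmc:refs_heads_develop-bk0"
)
print(manifest_url(source))
```

The printed URL is the one in the MANIFEST_UNKNOWN line, so the copy step was looking for the dagmc stage image under that exact name and tag.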

gonuke commented 1 year ago

Yep - working on that, too ☹️