stackabletech / docker-images

Adds variable to the name of build caches to enable parallel builds of multiple versions #779

Closed soenkeliebau closed 1 month ago

soenkeliebau commented 1 month ago

Description

We encountered errors when building multiple product versions at the same time, for example with:

bake -p omid -i 0.0.0-dev

This builds 1.1.0, 1.1.1 and 1.1.2 in parallel, all of which shared the cache volume with the name "maven".

When attempting to delete the cache, we hit a race condition: the `rm` in the first container to finish failed because the directory was not empty after the removal, since the other containers were still using it.

Error log:

70.37 [INFO] Omid ............................................... SUCCESS [  7.297 s]
70.37 [INFO] Common ............................................. SUCCESS [  8.764 s]
70.37 [INFO] State Machine ...................................... SUCCESS [  0.450 s]
70.37 [INFO] Commit Table ....................................... SUCCESS [  0.067 s]
70.37 [INFO] Metrics ............................................ SUCCESS [  1.326 s]
70.37 [INFO] Transaction Client ................................. SUCCESS [  0.265 s]
70.37 [INFO] HBase Common ....................................... SUCCESS [ 23.827 s]
70.37 [INFO] HBase Commit Table ................................. SUCCESS [  2.144 s]
70.37 [INFO] Codahale Metrics ................................... SUCCESS [  0.057 s]
70.37 [INFO] Benchmarks ......................................... SUCCESS [  2.860 s]
70.37 [INFO] Timestamp Storage .................................. SUCCESS [  0.345 s]
70.37 [INFO] HBase tools ........................................ SUCCESS [  0.293 s]
70.37 [INFO] TSO and TO Servers ................................. SUCCESS [  3.955 s]
70.37 [INFO] HBase Client ....................................... SUCCESS [  0.557 s]
70.37 [INFO] HBase Coprocessors ................................. SUCCESS [  1.705 s]
70.37 [INFO] Omid Client Examples ............................... SUCCESS [  1.697 s]
70.37 [INFO] ------------------------------------------------------------------------
70.37 [INFO] BUILD SUCCESS
70.37 [INFO] ------------------------------------------------------------------------
70.37 [INFO] Total time:  01:09 min
70.37 [INFO] Finished at: 2024-07-17T11:07:52Z
70.37 [INFO] ------------------------------------------------------------------------
70.46 + tar -xf tso-server/target/omid-tso-server-1.1.1-bin.tar.gz -C /stackable
70.71 + tar -xf examples/target/omid-examples-1.1.1-bin.tar.gz -C /stackable
70.97 + '[' true = true ']'
70.97 + rm -rf /stackable/.m2/repository/antlr /stackable/.m2/repository/aopalliance /stackable/.m2/repository/asm /stackable/.m2/repository/avalon-framework /stackable/.m2/repository/backport-util-concurrent /stackable/.m2/repository/biz /stackable/.m2/repository/ch /stackable/.m2/repository/classworlds /stackable/.m2/repository/com /stackable/.m2/repository/commons-beanutils /stackable/.m2/repository/commons-chain /stackable/.m2/repository/commons-cli /stackable/.m2/repository/commons-codec /stackable/.m2/repository/commons-collections /stackable/.m2/repository/commons-configuration /stackable/.m2/repository/commons-daemon /stackable/.m2/repository/commons-digester /stackable/.m2/repository/commons-io /stackable/.m2/repository/commons-lang /stackable/.m2/repository/commons-logging /stackable/.m2/repository/commons-net /stackable/.m2/repository/commons-validator /stackable/.m2/repository/de /stackable/.m2/repository/dnsjava /stackable/.m2/repository/dom4j /stackable/.m2/repository/io /stackable/.m2/repository/jakarta /stackable/.m2/repository/javax /stackable/.m2/repository/jline /stackable/.m2/repository/joda-time /stackable/.m2/repository/junit /stackable/.m2/repository/kr /stackable/.m2/repository/log4j /stackable/.m2/repository/logkit /stackable/.m2/repository/net /stackable/.m2/repository/org /stackable/.m2/repository/oro /stackable/.m2/repository/sslext /stackable/.m2/repository/xml-apis /stackable/.m2/repository/xmlenc /stackable/.m2/repository/xmlpull /stackable/.m2/repository/xpp3
71.17 rm: cannot remove '/stackable/.m2/repository/org': Directory not empty
------
Dockerfile:17
--------------------
  16 |     # hadolint ignore=DL3003
  17 | >>> RUN --mount=type=cache,id=omid,uid=1000,target=/stackable/.m2/repository <<EOF
  18 | >>>   set -x
  19 | >>>   curl --fail -L https://repo.stackable.tech/repository/packages/omid/phoenix-omid-${PRODUCT}-src.tar.gz | tar -xzC .
  20 | >>>   cd /stackable/phoenix-omid-${PRODUCT} || exit
  21 | >>>   mvn --batch-mode --no-transfer-progress package -Phbase-2 -DskipTests
  22 | >>>   tar -xf tso-server/target/omid-tso-server-${PRODUCT}-bin.tar.gz -C /stackable
  23 | >>>   tar -xf examples/target/omid-examples-${PRODUCT}-bin.tar.gz -C /stackable
  24 | >>>
  25 | >>> if [ "${DELETE_CACHES}" = "true" ] ; then
  26 | >>>   rm -rf /stackable/.m2/repository/*
  27 | >>> fi
  28 | >>> EOF
  29 |
--------------------
ERROR: failed to solve: process "/bin/bash -euo pipefail -c   set -x\n  curl --fail -L https://repo.stackable.tech/repository/packages/omid/phoenix-omid-${PRODUCT}-src.tar.gz | tar -xzC .\n  cd /stackable/phoenix-omid-${PRODUCT} || exit\n  mvn --batch-mode --no-transfer-progress package -Phbase-2 -DskipTests\n  tar -xf tso-server/target/omid-tso-server-${PRODUCT}-bin.tar.gz -C /stackable\n  tar -xf examples/target/omid-examples-${PRODUCT}-bin.tar.gz -C /stackable\n\nif [ \"${DELETE_CACHES}\" = \"true\" ] ; then\n  rm -rf /stackable/.m2/repository/*\nfi\n" did not complete successfully: exit code: 1
Traceback (most recent call last):
  File "/home/sliebau/IdeaProjects/stackable/docker-images/.venv/bin/bake", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/sliebau/IdeaProjects/stackable/docker-images/.venv/lib/python3.11/site-packages/image_tools/bake.py", line 231, in main
    result = run(cmd.args, input=cmd.input, check=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/current-system/sw/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['docker', 'buildx', 'bake', '--file', '-', 'omid-1_1_0', 'omid-1_1_1', 'omid-1_1_2', '--load']' returned non-zero exit status 1.
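
The failure in the log above can be illustrated outside of Docker. The following is a minimal sketch, not the actual build code: it mimics the steps of a recursive delete on a shared directory and simulates a second writer with a plain file write. The function name and the file names are hypothetical.

```python
import errno
import os
import tempfile
from typing import Optional


def remove_tree_with_late_writer(cache_dir: str) -> Optional[int]:
    """Mimic a recursive delete while another build writes to the same directory.

    Returns the errno of the failure, or None if the removal succeeded.
    """
    # Step 1: delete every entry that is currently visible.
    for name in os.listdir(cache_dir):
        os.remove(os.path.join(cache_dir, name))

    # Step 2 (simulated): a parallel build adds a new artifact to the shared
    # cache after the listing but before the directory itself is removed.
    with open(os.path.join(cache_dir, "late-artifact.jar"), "w") as f:
        f.write("written by another build")

    # Step 3: removing the directory now fails because it is no longer empty.
    try:
        os.rmdir(cache_dir)
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    shared = os.path.join(tmp, "org")  # stands in for the shared cache directory
    os.mkdir(shared)
    open(os.path.join(shared, "first.jar"), "w").close()
    result = remove_tree_with_late_writer(shared)
    print(errno.errorcode.get(result))
```

On Linux the reported error code is `ENOTEMPTY`, which is the condition behind the `Directory not empty` message from `rm`.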

To fix this, we include the product version in the name of the cache volume, so that every builder gets its own volume.

We might need to revisit this if we ever want to build the same product version with other differing parameters (the Java version comes to mind). Our toolchain cannot handle this at the moment anyway, so that is a worry for later.
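
The per-version naming described above could look roughly like the sketch below. It only builds the mount option string to show that each version ends up with a distinct cache id. The helper `cache_mount` and the `<product>-<version>` id format are assumptions made for illustration; the `uid` and `target` values are taken from the Dockerfile excerpt in the error log.

```python
def cache_mount(
    product: str,
    version: str,
    target: str = "/stackable/.m2/repository",
    uid: int = 1000,
) -> str:
    """Build a cache mount option whose id includes the product version.

    The id format "<product>-<version>" is a hypothetical naming scheme.
    """
    cache_id = f"{product}-{version}"
    return f"--mount=type=cache,id={cache_id},uid={uid},target={target}"


# The three versions that were built in parallel in the failing run.
versions = ["1.1.0", "1.1.1", "1.1.2"]
mounts = [cache_mount("omid", v) for v in versions]
for mount in mounts:
    print(mount)
```

Because the version is the only varying part of the id in this sketch, two builds of the same version with a different Java version would still share a cache, which is the open point mentioned above.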

Definition of Done Checklist

- [ ] Changes are OpenShift compatible
- [ ] All added packages (via microdnf or otherwise) have a comment on why they are added
- [ ] Things not downloaded from Red Hat repositories should be mirrored in the Stackable repository and downloaded from there
- [ ] All packages should have (if available) signatures/hashes verified
- [ ] Add an entry to the CHANGELOG.md file
- [ ] Integration tests ran successfully

TIP: Running integration tests with a new product image

The image can be built and uploaded to the kind cluster with the following commands:

```shell
bake --product --image-version
kind load docker-image --name=
```

See the output of `bake` to retrieve the image tag for ``.
