networkupstools / jenkins-dynamatrix

A shared library to do a sort of matrix build based on available swarm agent labels

Something wrong with NUT fightwarn #24

Open jimklimov opened 1 year ago

jimklimov commented 1 year ago

Not seen in other builds. The last NUT fightwarn build that behaved "properly" was https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/73/ in July 2023. The subsequent https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/74/ (and https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/ soon after) in September fail, apparently because the MAKE variable is not resolved in many build scenarios. This happens on the same build hosts that the master/PR builds use, and not in all parallel branches, e.g.:

First running a quiet parallel build...
/home/abuild/jenkins-nutci-centos-7-amd64/workspace/nut_nut_fightwarn@tmp/durable-e6cf4883/script.sh: line 7: -s: command not found

real    0m0.000s
user    0m0.000s
sys 0m0.000s
First attempt failed (127), retrying to log what did:
time: invalid option -- 'k'
Usage: time [-apvV] [-f format] [-o file] [--append] [--verbose]
       [--portability] [--format=format] [--output=file] [--version]
       [--help] command [arg...]

or

First running a quiet parallel build...
time: cannot run VERBOSE=0: No such file or directory
Command exited with non-zero status 127
0.00user 0.00system 0:00.00elapsed 77%CPU (0avgtext+0avgdata 964maxresident)k
0inputs+0outputs (0major+35minor)pagefaults 0swaps
First attempt failed (127), retrying to log what did:
time: invalid option -- 'k'
Try 'time --help' for more information.

ARCH64='x86_64' ARCH_BITS='64' BITS='64' BRANCH_NAME='fightwarn' BUILD_DISPLAY_NAME='#75' BUILD_ID='75' BUILD_NUMBER='75' BUILD_TAG='jenkins-nut-nut-fightwarn-75' BUILD_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/' CFLAGS='-Wall' CI='true' CI_SLOW_BUILD_FILTERNAME='Default autotools driven build with max warnings and varied C/C++ revisions (allowed to fail)' CI_WRAP_SH='ssh -o SendEnv='"'"'*'"'"' "jenkins-ubuntu2110-amd64" /bin/sh -xe ' CLANGVER='13' COMPILER='CLANG' CSTDVARIANT='gnu' CSTDVERSION_c='11' CSTDVERSION_cxx='11' CXXFLAGS='-Wall' DBUS_SESSION_BUS_ADDRESS='unix:path=/run/user/399/bus' EXECUTOR_NUMBER='1' GIT_AUTHOR_DATE='2023-09-19 23:11:05 +00:00' GIT_COMMITTER_DATE='2023-09-19 23:11:05 +00:00' HOME='/home/abuild' HUDSON_HOME='/var/lib/jenkins/home' HUDSON_URL='https://ci.networkupstools.org/' IFS='
JENKINS_HOME='/var/lib/jenkins/home' JENKINS_URL='https://ci.networkupstools.org/' JOB_BASE_NAME='fightwarn' JOB_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/display/redirect' JOB_NAME='nut/nut/fightwarn' JOB_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/' LANG='C' LC_ALL='C' LOGNAME='abuild' MATRIX_TAG='gnu11-gnu++11-clang-13-ubuntu-impish-x86_64-64bit' MOTD_SHOWN='pam' NODE_LABELS='ARCH64=x86_64 ARCH_BITS=64 CLANGVER=13 COMPILER=CLANG COMPILER=GCC DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME=gitcache-dynamatrix:ci-debian DYNAMATRIX_UNSTASH_PREFERENCE=scm-ws:nut-ci-src GCCVER=11 MAKE=make NUT_BUILD_CAPS=cppcheck NUT_BUILD_CAPS=cppunit NUT_BUILD_CAPS=drivers:DMF=yes NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=nutconf=yes OS_DISTRO=ubuntu-impish OS_FAMILY=linux PYTHON=python2.7 PYTHON=python3.9 SHELL_PROGS=bash SHELL_PROGS=busybox SHELL_PROGS=csh SHELL_PROGS=dash SHELL_PROGS=ksh93 SHELL_PROGS=sh SHELL_PROGS=tcsh SHELL_PROGS=zsh ci-debian-altroot--jenkins-ubuntu2110-amd64+ssh nut-builder nut-builder:DMF nut-builder:alldrv' NODE_NAME='ci-debian-altroot--jenkins-ubuntu2110-amd64+ssh' OLDPWD='/home/abuild' OPTIND='1' OS_DISTRO='ubuntu-impish' OS_FAMILY='linux' PARMAKE_LA_LIMIT='8' PATH='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin' PPID='822749' PS1='$ ' PS2='> ' PS4='+ ' PWD='/srv/libvirt/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn' RUN_ARTIFACTS_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=artifacts' RUN_CHANGES_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=changes' RUN_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect' RUN_TESTS_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=tests' SHELL='/bin/bash' SHLVL='0' SSH_CLIENT='10.0.3.1 38436 
22' SSH_CONNECTION='10.0.3.1 38436 10.0.3.122 22' STAGE_NAME='Prep' TZ='UTC' USER='abuild' WORKSPACE='/home/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn' WORKSPACE_TMP='/home/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn@tmp' XDG_RUNTIME_DIR='/run/user/399' XDG_SESSION_CLASS='user' XDG_SESSION_ID='193406' XDG_SESSION_TYPE='tty' _='/bin/sh'


* https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75//artifact/.ci.MD5_f7bc8232f878d178067df0490d730b71.origEnvvars.log.gz:

Actual original envvars for build scenario described as: Building with CLANG-13 STD=gnu11 STD=gnu++11 on x86_64 64-bit linux-ubuntu-impish platform for MATRIX_TAG="gnu11-gnu++11-clang-13-ubuntu-impish-x86_64-64bit" && (ARCH_BITS=64&&ARCH64=x86_64&&COMPILER=CLANG&&CLANGVER=13&&OS_DISTRO=ubuntu-impish&&OS_FAMILY=linux) && (nut-builder) && BITS=64&&CSTDVARIANT=gnu&&CSTDVERSION_c=11&&CSTDVERSION_cxx=11 && LANG=C && LC_ALL=C && TZ=UTC && CFLAGS=-Wall && CXXFLAGS=-Wall :: as part of slowBuild filter: Default autotools driven build with max warnings and varied C/C++ revisions (allowed to fail)

ARCH64='x86_64' ARCH_BITS='64' BITS='64' BRANCH_NAME='fightwarn' BUILD_DISPLAY_NAME='#75' BUILD_ID='75' BUILD_NUMBER='75' BUILD_TAG='jenkins-nut-nut-fightwarn-75' BUILD_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/' CFLAGS='-Wall' CI='true' CI_SLOW_BUILD_FILTERNAME='Default autotools driven build with max warnings and varied C/C++ revisions (allowed to fail)' CI_WRAP_SH='ssh -o SendEnv='"'"'*'"'"' "jenkins-ubuntu2110-amd64" /bin/sh -xe ' CLANGVER='13' COMPILER='CLANG' CSTDVARIANT='gnu' CSTDVERSION_c='11' CSTDVERSION_cxx='11' CXXFLAGS='-Wall' DBUS_SESSION_BUS_ADDRESS='unix:path=/run/user/399/bus' EXECUTOR_NUMBER='1' GIT_AUTHOR_DATE='2023-09-19 23:11:05 +00:00' GIT_COMMITTER_DATE='2023-09-19 23:11:05 +00:00' HOME='/home/abuild' HUDSON_HOME='/var/lib/jenkins/home' HUDSON_URL='https://ci.networkupstools.org/' IFS='
JENKINS_HOME='/var/lib/jenkins/home' JENKINS_URL='https://ci.networkupstools.org/' JOB_BASE_NAME='fightwarn' JOB_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/display/redirect' JOB_NAME='nut/nut/fightwarn' JOB_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/' LANG='C' LC_ALL='C' LOGNAME='abuild' MATRIX_TAG='gnu11-gnu++11-clang-13-ubuntu-impish-x86_64-64bit' MOTD_SHOWN='pam' NODE_LABELS='ARCH64=x86_64 ARCH_BITS=64 CLANGVER=13 COMPILER=CLANG COMPILER=GCC DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME=gitcache-dynamatrix:ci-debian DYNAMATRIX_UNSTASH_PREFERENCE=scm-ws:nut-ci-src GCCVER=11 MAKE=make NUT_BUILD_CAPS=cppcheck NUT_BUILD_CAPS=cppunit NUT_BUILD_CAPS=drivers:DMF=yes NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=nutconf=yes OS_DISTRO=ubuntu-impish OS_FAMILY=linux PYTHON=python2.7 PYTHON=python3.9 SHELL_PROGS=bash SHELL_PROGS=busybox SHELL_PROGS=csh SHELL_PROGS=dash SHELL_PROGS=ksh93 SHELL_PROGS=sh SHELL_PROGS=tcsh SHELL_PROGS=zsh ci-debian-altroot--jenkins-ubuntu2110-amd64+ssh nut-builder nut-builder:DMF nut-builder:alldrv' NODE_NAME='ci-debian-altroot--jenkins-ubuntu2110-amd64+ssh' OLDPWD='/home/abuild' OPTIND='1' OS_DISTRO='ubuntu-impish' OS_FAMILY='linux' PARMAKE_LA_LIMIT='8' PATH='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin' PPID='822718' PS1='$ ' PS2='> ' PS4='+ ' PWD='/srv/libvirt/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn' RUN_ARTIFACTS_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=artifacts' RUN_CHANGES_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=changes' RUN_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect' RUN_TESTS_DISPLAY_URL='https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/75/display/redirect?page=tests' SHELL='/bin/bash' SHLVL='0' SSH_CLIENT='10.0.3.1 38434 
22' SSH_CONNECTION='10.0.3.1 38434 10.0.3.122 22' STAGE_NAME='Prep' TZ='UTC' USER='abuild' WORKSPACE='/home/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn' WORKSPACE_TMP='/home/abuild/jenkins-nut-altroots/jenkins-ubuntu2110-amd64+ssh/workspace/nut_nut_fightwarn@tmp' XDG_RUNTIME_DIR='/run/user/399' XDG_SESSION_CLASS='user' XDG_SESSION_ID='193404' XDG_SESSION_TYPE='tty' _='/bin/sh'


Notably no `MAKE=...` is provided here, so a fallback to `make` should have happened. Maybe this is linked with the recent effort to untangle parallel closure creation (using `def` and clones everywhere, so that different logic branches do not change the same values, as was seen earlier when stage name groovy strings got mixed up with the contents of envvars passed to them). Possibly some move from GStrings to Strings that are resolved immediately was not completed?..

So far nothing apparently toxic was found in NUT `Jenkinsfile-dynamatrix` (nor `ci_build.sh`) changes between these builds.

* 73:

Revision: 37befb64cf2c1050ee52e953b31f66131b3cdf50 Repository: https://github.com/networkupstools/jenkins-dynamatrix.git

Revision: 91396d05b72e0b97bf8a1f6a71212d3161c6340d Repository: https://github.com/networkupstools/nut.git


* 75:

Revision: 0d3add30edf6403f92188a750bb8827a3aaa237d Repository: https://github.com/networkupstools/jenkins-dynamatrix.git

Revision: cb5e92cccdb30c10d11546a9a4bb92ca28831b9f Repository: https://github.com/networkupstools/nut.git



Numbers were roughly equal:
* 74: `Not all went well: countStagesStarted:350 countStagesCompleted:350 countStagesFinishedOK:250 countStagesFinishedFailure:100`
* 75: `Not all went well: countStagesStarted:350 countStagesCompleted:350 countStagesFinishedOK:251 countStagesFinishedFailure:99`
jimklimov commented 1 year ago

Testing a theory: either we used to pass the MAKE envvar from pipelines to build scripts (specifically ci_build.sh) and no longer do so, or we now pass an empty value when a build scenario/matrix case does not specify one, and ultimately the build logic is confused...
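The difference between the two halves of that theory matters for any shell-side fallback. A minimal sketch, assuming the fallback is written with POSIX parameter expansion (the function names below are hypothetical and are not taken from ci_build.sh or the library):

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Falls back to "make" only when MAKE is unset.
pick_make_weak() { printf '%s\n' "${MAKE-make}"; }

# Falls back to "make" when MAKE is unset OR set to an empty string.
pick_make_strong() { printf '%s\n' "${MAKE:-make}"; }

unset MAKE
echo "unset: weak='$(pick_make_weak)' strong='$(pick_make_strong)'"

MAKE=""
echo "empty: weak='$(pick_make_weak)' strong='$(pick_make_strong)'"

MAKE="gmake"
echo "set:   weak='$(pick_make_weak)' strong='$(pick_make_strong)'"
```

If the pipeline started exporting an empty `MAKE`, a `${MAKE-make}` style fallback would silently keep the empty value, while `${MAKE:-make}` would still recover.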

jimklimov commented 1 year ago

At least, the message is directly related to the library:

$ git grep 'First running a quiet parallel build'

vars/autotools.groovy:            dynacfgPipeline.buildPhases['buildQuiet'] = """( echo "First running a quiet parallel build..." >&2; eval time \${MAKE} \${MAKE_OPTS} VERBOSE=0 V=0 -s -k -j 4 all >/dev/null && echo "SUCCESS" && exit 0; echo "First attempt failed (\$?), retrying to log what did:"; eval time \${MAKE} \${MAKE_OPTS} -k all )"""

vars/autotools.groovy:            dynacfgPipeline.buildPhases['buildQuietCautious'] = """( echo "First running a quiet parallel build..." >&2; eval time \${MAKE} \${MAKE_OPTS} VERBOSE=0 V=0 -s -k -j 4 all >/dev/null && echo "Seemingly a SUCCESS" ; echo "First attempt finished (\$?), retrying to log what fails (if any):"; eval time \${MAKE} \${MAKE_OPTS} -k all )"""

There are no hits in NUT for the log message, and this is where MAKE gets expanded as a shell envvar...
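To see why an empty `MAKE` would produce the errors quoted at the top of this issue, here is a minimal sketch that builds the same word list as the `buildQuiet` phase but prints it instead of running it. `show_cmdline` is a hypothetical helper, and the default `IFS` is assumed:

```shell
#!/bin/sh
# Hypothetical reduction of the buildQuiet command line; nothing is built.
show_cmdline() {
    # Unquoted on purpose, as in the library scriptlet: empty variables
    # vanish during word splitting instead of leaving empty arguments.
    set -- ${MAKE} ${MAKE_OPTS} VERBOSE=0 V=0 -s -k -j 4 all
    printf 'first word: %s (of %d words)\n' "$1" "$#"
}

MAKE_OPTS=""

MAKE="make"
show_cmdline    # first word: make (of 8 words)

MAKE=""
show_cmdline    # first word: VERBOSE=0 (of 7 words)
```

With `VERBOSE=0` in first position, the logs above look consistent with two outcomes. A shell where `time` is a keyword would treat `VERBOSE=0 V=0` as assignments and try to run `-s` as the command. An external `time` would try to execute `VERBOSE=0` itself. The retry then collapses to `time -k all`, where `-k` is taken as an option to `time`.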

jimklimov commented 1 year ago

One idea that belongs here (testing now): originally we initialized a default dynacfgPipeline.defaultTools.MAKE only if defaultTools itself was missing. Maybe the map is pre-populated better now, so we should create the map only if it is missing, and separately create the MAKE entry if it is missing from the map.

jimklimov commented 1 year ago

Still at it :( Perhaps the map is not always consulted (or a dedicated instance/clone is passed?) when expanding the buildPhases at run-time?.. Or it is deleted at a later time by the pipeline preparation logic?..

jimklimov commented 1 year ago

After some attempts to rectify what were, in essence, the symptoms (e.g. fitting a MAKE definition into the dynacfg* maps more correctly), I found that other variables were no longer handled well either. For example, CC and CXX are prepared from CLANGVER, GCCVER etc. by a configureEnvvars scriptlet, which apparently no longer got called, so all builds usually went with the default gcc. I think I came upon the root cause: the summer's refactoring of the library, which among other things added protections against overwriting the original maps that are input into the sanityCheck*() and some other methods.

Groovy allows direct manipulation of the original map contents (input variable names are references to those maps). So, to isolate what happens in the method while keeping the original intact, a clone is made early on, and that clone is returned from the method for the caller to assign wherever they want. Nothing can go wrong, and the caller's data objects are safe, right?..

In practice, with the closures using a delegation mechanism (to resolve variables from the caller context), we end up setting this.script into the delegation around the generateBuild() method, and it is probably the higher-priority carrier of the dynacfgPipeline name (a map prepared by a Jenkinsfile-dynamatrix and later adjusted by dynamatrixPipeline.groovy). Closures defined in that dynacfgPipeline which manage matrix cell builds refer to further data from dynamatrixPipeline.somefields, and apparently end up looking into the original map in the script, which remains barely initialized after the sanityCheck*() methods decouple it from the map object that is actually being manipulated.

In other words, success of the groovy script currently relied on all data ending up in the Jenkinsfile's singleton of the map. Reverting the clone() operations with https://github.com/networkupstools/jenkins-dynamatrix/commit/496500d73439bc57552a3cc2d946f855521b03b8 seems to have fixed the issue; at least compiler names are getting resolved again as of build https://ci.networkupstools.org/job/nut/job/nut/job/fightwarn/106/

The "correct" solution, however, would be to ensure that each build matrix cell gets its own separate copy of the dynacfg* maps in its context, to avoid surprises such as different code paths, each assuming a personal sandbox, independently manipulating the same information.

With current implementation and new knowledge, this seems complicated by a few points:

jimklimov commented 1 year ago

In Dynamatrix.generateBuild() early on we prepare (hydrate etc.) the body closure delegation context maps. Probably a dynacfgPipeline could be defined there; ideally it would also be named (parameter? closure? documented magic word? several aliases?) so that it resolves back from inside its prepared closure field values. It may also help to add a named copy to the DSBC class for reference (e.g. for github notifications that benefit from a stash id), more so given the possibility of matrices made from several sources, where there is no single dynacfgOrig to help out.