idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

Wasp install doesn't appear to be playing ball #26308

Open herter4171-kp opened 9 months ago

herter4171-kp commented 9 months ago

Bug Description

We get build errors telling us to ensure that a valid install of Wasp exists, even though the Dockerfile sets WASP_DIR and has the commands to build and install it. That directory does not exist in the latest official image or in the earlier rev we were targeting.

In the Dockerfile, I would advise changing the RUN for Wasp to start with set -e so that it exits nonzero on the first failure. I've been burned a number of times where an image build exits zero after miles of log, so failures can fly under the radar.
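
To illustrate the exit-code point, here is a minimal sketch that runs outside Docker. The two `sh -c` command strings are hypothetical stand-ins for a `;`-chained RUN line (shell-form RUN goes through `/bin/sh -c` by default on Linux images). They are not the actual Wasp build steps.

```shell
# Hypothetical stand-in for a ';'-chained RUN line: the first step fails,
# but the last command succeeds, so the chain as a whole exits 0.
chained_status=0
sh -c 'false ; echo "still running"' >/dev/null || chained_status=$?

# Same steps with set -e: the shell stops at the first failing command
# and reports a nonzero exit status.
strict_status=0
sh -c 'set -e ; false ; echo "still running"' >/dev/null || strict_status=$?

echo "without set -e: ${chained_status}, with set -e: ${strict_status}"
```

With a plain `;` chain, only the exit status of the last command reaches the image build, which is how a failed install can go unnoticed.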

Steps to Reproduce

  1. docker run idaholab/moose:latest /bin/bash -c 'echo $WASP_DIR; ls $WASP_DIR'
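
A slightly stricter version of that check, as a sketch only. If WASP_DIR were unset, a bare `ls $WASP_DIR` would list the current directory instead of failing, so it helps to tell the three cases apart. `check_wasp_dir` is a hypothetical helper name, not something from the MOOSE repo, and it is meant to be run inside the container.

```shell
# Hypothetical helper: report the state of the directory named by WASP_DIR.
check_wasp_dir() {
  if [ -z "${WASP_DIR:-}" ]; then
    echo "unset"      # variable not set or empty
  elif [ -d "${WASP_DIR}" ]; then
    echo "present"    # variable set and the directory exists
  else
    echo "missing"    # variable set but nothing was installed there
  fi
}

check_wasp_dir
```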

Impact

We aren't able to build SAM, so it's a bit of a problem.

lindsayad commented 9 months ago

@loganharbour @brandonlangley is this something one of you could help with?

brandonlangley commented 9 months ago

@loganharbour -

I've never used Docker before so I don't know the first thing about it, but I gave this a shot to see what I could find.

I found these commands in the WASP section of docker_ci/Dockerfile:

ARG WASP_REV=28fe0d67f10693eeb8c5c4d151af03d573962155
ENV WASP_DIR=/usr/local/wasp

COPY scripts/update_and_rebuild_wasp.sh ${MOOSE_DIR}/scripts/update_and_rebuild_wasp.sh
COPY scripts/configure_wasp.sh ${MOOSE_DIR}/scripts/configure_wasp.sh

RUN mkdir -p framework/contrib ; \
cd framework/contrib ; \
git clone https://code.ornl.gov/neams-workbench/wasp.git ; \
cd wasp ; \
git checkout ${WASP_REV} ; \
git submodule update --init --recursive ; \
cd ../../.. ; \
./scripts/update_and_rebuild_wasp.sh -D CMAKE_INSTALL_PREFIX:STRING=${WASP_DIR}

And I transcribed them into these regular Bash commands and ran them manually on my Mac:

export WASP_REV=28fe0d67f10693eeb8c5c4d151af03d573962155
export WASP_DIR=/path/to/some/existing/empty/directory

mkdir scripts
cp ~/projects/moose/scripts/update_and_rebuild_wasp.sh scripts/update_and_rebuild_wasp.sh
cp ~/projects/moose/scripts/configure_wasp.sh scripts/configure_wasp.sh

mkdir -p framework/contrib
cd framework/contrib
git clone https://code.ornl.gov/neams-workbench/wasp.git
cd wasp
git checkout ${WASP_REV}
git submodule update --init --recursive
cd ../../..
./scripts/update_and_rebuild_wasp.sh -D CMAKE_INSTALL_PREFIX:STRING=${WASP_DIR}

When running the commands in that order manually, the ./scripts/update_and_rebuild_wasp.sh step failed with:

fatal: not a git repository (or any of the parent directories): .git

After adding set -ex to scripts/update_and_rebuild_wasp.sh, I found that this line was failing in that script:

git submodule update --init --recursive /path/to/working/directory/scripts/../framework/contrib/wasp

So I ran the above line manually from the base of my working directory that had this minimal structure:

framework/
framework/contrib/
framework/contrib/wasp/
framework/contrib/wasp/*
scripts/
scripts/configure_wasp.sh
scripts/update_and_rebuild_wasp.sh

And this failed with that same error since the base of my minimal test directory was not a git repository.


I then tried that process of running the commands from the libMesh and PETSc sections of docker_ci/Dockerfile.

Surprisingly, the commands for both libMesh and PETSc worked fine when transcribed for Bash and run manually.

So I compared the git submodule update logic of the update_and_rebuild scripts for WASP, libMesh, and PETSc.

And the libMesh and PETSc versions both wrap that line with this guard so it is not run when not in a git repository.

git_dir=`git rev-parse --show-cdup 2>/dev/null`
if [[ -z "$go_fast" && $? == 0 && "x$git_dir" == "x" ]]; then
  git submodule update --init --recursive app_name
  # . . .
fi

The $? == 0 is the important bit here to prevent git submodule update from running if the previous git check fails.
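
As a rough sketch of what that guard buys you, the snippet below runs the same kind of check in a scratch directory that is not a git repository. The function name `guarded_submodule_update` is hypothetical, and the `echo` stands in for the real `git submodule update` call. It also tests the exit status of `git rev-parse` directly in the `if` rather than through `$?`, which should behave the same way here.

```shell
# Work in a scratch directory that is not a git repository.
# GIT_CEILING_DIRECTORIES stops git from searching the parent directories,
# so the result does not depend on where the scratch directory lives.
scratch_dir="$(mktemp -d)"
export GIT_CEILING_DIRECTORIES="$(dirname "${scratch_dir}")"

# Hypothetical guard function; the "updated" branch is where the real
# `git submodule update --init --recursive <path>` call would go.
guarded_submodule_update() {
  if git rev-parse 2>/dev/null; then
    echo "updated"
  else
    echo "skipped"
  fi
}

# Run the guard from inside the scratch directory and capture its decision.
result="$(cd "${scratch_dir}" && guarded_submodule_update)"
echo "${result}"
```

Outside a repository the check fails quietly and the submodule step is skipped, instead of aborting with the "not a git repository" error.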

However, the update_and_rebuild_wasp.sh script does not have that check wrapping its git submodule update line.

The update_and_rebuild_wasp.sh script actually used to have this same check which guarded against that condition:

git rev-parse 2> /dev/null
if [[ $? -eq 0 ]] && [ -z "$SKIP" ]; then
  git submodule update --init --recursive "${WASP_SRC_DIR}"
  # . . .
fi

But this conditional check was removed in your https://github.com/idaholab/moose/commit/96c0e1be1082ee3774d66c959a64e02e535ac46d commit which was merged less than two weeks ago.

I'm not sure if removing this check was intentional or maybe just an oversight?

But if the ~2 week timeframe lines up with when @herter4171-kp saw it stop working, could that removal be why?

Is it possible that the fix may be as simple as putting this check back in the update_and_rebuild_wasp.sh script?

git rev-parse 2> /dev/null
if [[ $? -eq 0 ]] ; then
  git submodule update --init --recursive "${WASP_SRC_DIR}"
fi

I could very well be way off about all of this, so apologies if everything here is completely unrelated.