sandialabs / spack-manager

A project and machine deployment model using Spack
https://sandialabs.github.io/spack-manager/
Other

Question: how to set up environment to run batch script #289

Open jhux2 opened 2 years ago

jhux2 commented 2 years ago

Once I've built naluX on Summit, it's unclear to me how to make sure a batch job has the correct environment to run that executable.

Normally, I'd load the same modules as were used for the build.

I did try running quick-activate $SPACK_MANAGER/environments/jhubuild and then submitting a job. The job failed with an error indicating that it could not find CUDA.

jhux2 commented 2 years ago

@psakievich @tasmith4 @jrood-nrel

psakievich commented 2 years ago

@jhux2 I believe this should answer your question: https://psakievich.github.io/spack-manager/general/FAQ.html#how-do-i-use-the-executables-i-built-in-my-development-environment

jrood-nrel commented 2 years ago

This is what I do on Summit for the exawind-driver for example:

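# Force synchronous CUDA kernel launches (useful when debugging GPU errors)
export CUDA_LAUNCH_BLOCKING=1
# Point at the spack-manager clone
export SPACK_MANAGER=${PROJWORK}/cfd116/jrood/spack-manager-summit
# Load spack-manager's shell helpers, then start Spack
source ${SPACK_MANAGER}/start.sh && spack-start
# Activate the environment directory the code was built in
spack env activate -d ${SPACK_MANAGER}/environments/exawind-summit
# Add exawind and its runtime dependencies to the environment
spack load exawind
# Confirm which exawind binary will be used
which exawind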
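The script above ends with `which exawind` to confirm that the binary comes from the loaded environment. A minimal sketch of turning that check into something a batch job can act on, assuming a POSIX-style shell; `require_exe` is a hypothetical helper, not part of spack-manager or Spack:

```shell
# require_exe: hypothetical helper, not part of spack-manager or Spack.
# Prints where an executable resolves on PATH, or fails with a message
# so the batch job stops before launching anything.
require_exe() {
  local name path
  name=$1
  if ! path=$(command -v "$name"); then
    echo "error: '$name' not found on PATH (was the Spack environment activated and the package loaded?)" >&2
    return 1
  fi
  echo "using $name from $path"
}
```

In a job script this would follow `spack load exawind`, e.g. `require_exe exawind || exit 1`, so a broken environment fails fast instead of after the job starts on the compute nodes.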
psakievich commented 2 years ago

We should be getting CUDA_LAUNCH_BLOCKING in the environment when we do spack load exawind. Is that not the case, @jrood-nrel?
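Whether `spack load` actually exported `CUDA_LAUNCH_BLOCKING` can be checked from inside the job before launching. A minimal sketch, assuming the variable is expected to be exported after `spack load`; `expect_env` is a hypothetical helper:

```shell
# expect_env: hypothetical helper, not part of Spack.
# Checks that an exported environment variable has the expected value.
expect_env() {
  local name expected actual
  name=$1
  expected=$2
  # printenv only sees exported variables, i.e. what a launched process inherits
  actual=$(printenv "$name") || actual=
  if [ "$actual" = "$expected" ]; then
    echo "$name=$actual"
    return 0
  fi
  echo "warning: $name is '${actual:-<unset>}', expected '$expected'" >&2
  return 1
}
```

For example, `spack load nalu-wind` followed by `expect_env CUDA_LAUNCH_BLOCKING 1`. Using `printenv` rather than the shell variable is deliberate: a variable set but not exported would not reach the launched executable.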

jhux2 commented 2 years ago

So if I understand correctly, I should do

quick-activate $SPACK_MANAGER/environments/jhubuild

where jhubuild is the "environment" that I built naluX under.

But

spack load naluX

returns

==> Error: Spec 'naluX' matches no installed packages.

I feel that I'm missing something fundamental here.

[EDIT]

Btw, spack load exawind works, but the SHAs of exawind and naluX are different.

jrood-nrel commented 2 years ago

Ah yeah, that should be the case, @psakievich. Guess it's a habit.

https://github.com/psakievich/spack-manager/blob/023fd1469078d8cc9396e3ccd373826cbbd5522f/repos/exawind/packages/nalu-wind/package.py#L43-L48

jrood-nrel commented 2 years ago

@jhux2 The Spack package is named nalu-wind (naluX is the executable it builds), so load it with:

spack load nalu-wind

jhux2 commented 2 years ago

@jrood-nrel Thanks. I've launched a couple test jobs to see what effect spack load nalu-wind has.

jhux2 commented 2 years ago

My jobs failed with the same error as before:

FATAL ERROR: dlopen libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
FATAL ERROR: dlopen libcudart.so: libcudart.so: cannot open shared object file: No such file or directory
[h26n01:464770] Error: common_pami.c:1056 - ompi_common_pami_init() Unable to create PAMI client (rc=1)
[h26n01:464771] Error: common_pami.c:1056 - ompi_common_pami_init() Unable to create PAMI client (rc=1)

After issuing spack load nalu-wind, should there be any change in what modules are loaded? Or is that handled by spack setting all the right paths, etc.?
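Because the failure is a `dlopen` of `libcudart.so`, one quick check inside the job (after `spack load` and any `module load` lines, right before `jsrun`) is whether the CUDA runtime library is visible on `LD_LIBRARY_PATH`. A minimal sketch, assuming that is where the library is expected to be found; `find_on_ld_path` is a hypothetical helper, and a miss here does not prove `dlopen` will fail, since the loader has other search locations:

```shell
# find_on_ld_path: hypothetical helper, not part of Spack.
# Prints the first LD_LIBRARY_PATH entry that contains the named library.
# Only LD_LIBRARY_PATH is searched; the dynamic loader also uses RPATH/RUNPATH,
# /etc/ld.so.cache and default system directories.
find_on_ld_path() {
  local lib rest dir
  lib=$1
  rest=${LD_LIBRARY_PATH:-}
  while [ -n "$rest" ]; do
    dir=${rest%%:*}              # first colon-separated entry
    case $rest in
      *:*) rest=${rest#*:} ;;    # drop that entry and its colon
      *)   rest= ;;              # it was the last entry
    esac
    if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir/$lib"
      return 0
    fi
  done
  echo "warning: $lib not found in LD_LIBRARY_PATH" >&2
  return 1
}
```

For example: `find_on_ld_path libcudart.so || echo "CUDA runtime not on LD_LIBRARY_PATH"`. The parsing uses only POSIX parameter expansion, so it does not depend on bash arrays or on changing `IFS`.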

psakievich commented 2 years ago

@jhux2 Spack should be handling all the right paths. So, to confirm, your script looks something like this?

# source $SPACK_MANAGER/start.sh has already occurred in .bashrc
quick-activate $SPACK_MANAGER/environments/jhubuild
spack load nalu-wind
srun [args] naluX -i [args] 
jhux2 commented 2 years ago

@psakievich Here's what I have in my batch script:

  export SPACK_MANAGER=~/exawind/sources/spack-manager
  source $SPACK_MANAGER/start.sh
  quick-activate $SPACK_MANAGER/environments/jhubuild
  spack load nalu-wind

  jsrun ....

This is a script that I've used for a long time. (I did move the naluX executable to another location, but I assume that should be safe to do.)
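On moving the executable: Spack's compiler wrappers typically embed absolute RPATHs to dependency libraries, so relocating naluX itself should not break its link-time dependencies. A minimal sketch for confirming that with `ldd`, assuming a glibc-style `ldd` that prints `not found` for unresolved libraries; `check_linked_libs` is a hypothetical helper. Note that `ldd` only reports libraries the executable links against, not ones loaded later with `dlopen`, such as the `libcudart.so` in the error above.

```shell
# check_linked_libs: hypothetical helper, not part of Spack.
# Reports any shared libraries the dynamic loader cannot resolve for an executable.
# Libraries opened at run time with dlopen() do not appear in ldd output.
check_linked_libs() {
  local exe missing
  exe=$1
  if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
    echo "error: '$exe' is not an executable file" >&2
    return 2
  fi
  if ! command -v ldd >/dev/null 2>&1; then
    echo "error: ldd is not available" >&2
    return 2
  fi
  # ldd marks unresolved dependencies with "not found"
  missing=$(ldd "$exe" 2>/dev/null | grep 'not found')
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing" >&2
    return 1
  fi
  echo "all linked libraries resolved for $exe"
}
```

For example, `check_linked_libs "$(command -v naluX)"` from inside the job, after `spack load nalu-wind`.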

Where in the spack-manager tree can I find configure/build logs for Trilinos? I'd like to look over those logs to see if anything jumps out.

psakievich commented 2 years ago

spack cd -b trilinos will take you to the Trilinos build directory, and the spack-* files there contain the logs for every build step.
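Those `spack-*` log files can be scanned for errors in one pass. A minimal sketch, assuming the logs are the `spack-*.txt` files (such as `spack-build-out.txt`) in the directory that `spack cd -b trilinos` lands in; `scan_spack_logs` is a hypothetical helper, and a keyword grep is only a first triage step, not a substitute for reading the logs:

```shell
# scan_spack_logs: hypothetical helper, not part of Spack or spack-manager.
# For each spack-*.txt file in a directory (e.g. spack-build-out.txt),
# prints the file name and up to 20 lines mentioning "error", with line numbers.
scan_spack_logs() {
  local dir f found
  dir=${1:-.}
  found=0
  for f in "$dir"/spack-*.txt; do
    [ -f "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    found=1
    echo "== $f"
    grep -n -i 'error' "$f" | head -n 20
  done
  if [ "$found" -eq 0 ]; then
    echo "no spack-*.txt files in $dir" >&2
    return 1
  fi
}
```

For example, `spack cd -b trilinos && scan_spack_logs`; passing the directory explicitly, as in `scan_spack_logs "$(spack location -b trilinos)"`, should also work without changing directory.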

psakievich commented 2 years ago

@jhux2 Where are you on this? Do you still need help?

jhux2 commented 2 years ago

@psakievich Thanks for checking in. I haven't returned to this yet. The motivation was to see if building with spack-manager would help work around a Nalu-Wind runtime failure. It turns out there's a bug that affects both solver paths in the NGP code, so how nalu-wind gets built is moot.