hpcugent / Lmod-UGent

spec files of Lmod for UGent-HPC

add patch to ensure Lmod cache is used when loading cluster modules #27

Closed: boegel closed this 7 years ago

boegel commented 7 years ago

PR https://github.com/TACC/Lmod/pull/300 was merged, but no new version was tagged yet. I see no reason to wait for that though...

Our integration tests pass with this installed:

[08:29:17] vsc40023@node2003:~/vsc-testing/module $ ./run_all_tests.sh

Lmod-7.5.10-7.ug.el7.centos.noarch
vsc-cluster-modules-0.21-1.noarch
vsc-cluster-modules-tier2-0.21-1.noarch

> module --version

Modules based on Lua: Version 7.5.10  2017-07-05 14:04 -05:00
$MODULEPATH: /apps/gent/CO7/sandybridge/modules/all:/apps/gent/SL6/sandybridge/modules/all:/etc/modulefiles/vsc

*** 001_list.sh ***

> module list

>>> 001_list.sh: PASS

*** 002_avail.sh ***

> module avail
> module avail GCC
> module avail GCC/4.9.3

>>> 002_avail.sh: PASS

*** 003_load.sh ***

> module load GCC
> module load GCC/4.9.3
> module load intel
> module load foss
> module load Python/2.7.11-intel-2016a
> module load GCC/4.9.3-2.25 OpenMPI/1.10.2-GCC-4.9.3-2.25 OpenBLAS/0.2.15-GCC-4.9.3-2.25-LAPACK-3.6.0 FFTW/3.3.4-gompi-2016a

>>> 003_load.sh: PASS

*** 004_purge.sh ***

> module load Python/2.7.11-intel-2016a
> module purge
> module load cluster
> module purge -force
> module load cluster
> module purge -force
> module load cluster/delcatty

>>> 004_purge.sh: PASS

*** 005_swap.sh ***

> module load GCC/4.9.3
> module swap GCC/5.3.0
> module swap GCC GCC/4.9.3
> module swap GCC/4.9.3 GCC/5.3.0

>>> 005_swap.sh: PASS

*** 006_unload.sh ***

> module load GCC/4.9.3
> module unload GCC
> module load GCC/4.9.3
> module unload GCC/4.9.3

>>> 006_unload.sh: PASS

*** 007_spider.sh ***

> module spider intel
> module spider intel/2016a
> module --show-hidden spider intel/2016a

>>> 007_spider.sh: PASS

*** 010_stdout_stderr.sh ***

> module list
> module avail

>>> 010_stdout_stderr.sh: PASS

*** 050_ml.sh ***

> ml av GCC/4.9.3
> ml
> ml GCC/4.9.3
> ml
> ml -GCC/4.9.3
> ml

>>> 050_ml.sh: PASS

*** 051_collections.sh ***

> ml foss/2016a
> ml Python/2.7.11-intel-2016a
> ml save this_is_just_a_test_collection_for_module_integration_test_051
> ml describe this_is_just_a_test_collection_for_module_integration_test_051
> ml purge
> ml restore this_is_just_a_test_collection_for_module_integration_test_051
> ml purge
> module swap cluster/delcatty cluster/golett
> ml purge
> ml restore this_is_just_a_test_collection_for_module_integration_test_051
> ml purge

>>> 051_collections.sh: PASS

*** 100_lmod_cache.sh ***

> module avail  # 2s time limit

>>> 100_lmod_cache.sh: PASS

*** 101_LD_LIBRARY_PATH.sh ***

> module load GCC/4.9.3-2.25
> module load OpenMPI/1.10.2-GCC-4.9.3-2.25
> checking $LD_LIBRARY_PATH...

>>> 101_LD_LIBRARY_PATH.sh: PASS

*** 102_symlink_modulepath.sh ***

> module use /tmp/vsc40023/afYWQX/symlinked_modules
> module avail test/1.2.3

>>> 102_symlink_modulepath.sh: PASS

*** 103_tcl2lua_LD_PRELOAD.sh ***

> module load jemalloc
> module show jemalloc
> ml intel/2016a && LD_LIBRARY_PATH='' LD_PRELOAD='' ml MariaDB/10.1.14-intel-2016a

>>> 103_tcl2lua_LD_PRELOAD.sh: PASS

TEST RESULT: all 14 passed!
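
The 2s time limit used in 100_lmod_cache.sh could be enforced roughly as follows. This is a minimal sketch, not the actual test script: the helper name `within_limit` is hypothetical, and it assumes GNU coreutils `timeout` is available. Because `module` is a shell function rather than a binary, it has to be wrapped in a shell before `timeout` can run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the command finishes within the limit.
within_limit() {
    local limit="$1"; shift
    # timeout exits with status 124 when the limit is hit; otherwise it
    # passes through the exit status of the command itself.
    timeout "$limit" "$@"
}

# Example (assumes a login shell defines the `module` function):
#   within_limit 2 bash -lc 'module avail' >/dev/null 2>&1 && echo PASS || echo FAIL
```

If the Lmod cache is not used, `module avail` has to walk every directory in $MODULEPATH, which is the situation such a time limit is meant to catch.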

stdweird commented 7 years ago

I also tested the patch on node2160: apps is no longer accessed, but /etc/modules/vsc is still scanned (though that is less of an issue). I guess /etc/modules/vsc is not in the cache?

boegel commented 7 years ago

@stdweird If you do a module load cluster, it will still re-evaluate all cluster modules to check which one is the default, I think (it doesn't decide the default based on the cache).

I may be wrong here though... Maybe @rtmclay can clarify?

Anyway, since that bit is local disk, that's really not a problem imho.

stdweird commented 7 years ago

@boegel How are "cluster" modules different from other modules? Anyway, I just saw this in the strace output and wanted to report it.
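
A check like the one described here can be summarized from a saved strace log. The following is a rough sketch under assumptions: the helper name `count_accesses` and the log location are hypothetical, and the way the log is captured (shown in the comment) is only one possible way to do it.

```shell
#!/usr/bin/env bash
# Capturing the log is an assumption about how the check was done, e.g.:
#   strace -f -e trace=file -o /tmp/lmod.strace bash -lc 'module load cluster'

# Hypothetical helper: count strace lines that touch a path starting with a prefix.
count_accesses() {
    local log="$1" prefix="$2"
    # strace prints path arguments in double quotes, so match the opening
    # quote followed by the prefix. grep -c prints 0 and exits 1 when
    # nothing matches; "|| true" keeps the helper's exit status at 0.
    grep -c -F "\"${prefix}" "$log" || true
}

# Example:
#   count_accesses /tmp/lmod.strace /apps/gent
#   count_accesses /tmp/lmod.strace /etc/modulefiles/vsc
```

A count of zero for the shared filesystem and a non-zero count for the local cluster module directory would match what is reported above.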

boegel commented 7 years ago

We deploy those via an RPM (see our vsc-cluster-modules repo)

stdweird commented 7 years ago

@boegel And? Do you mean there's no cache update in the RPM post script? But the cron job should pick this up, no?

boegel commented 7 years ago

@stdweird The cron job starts from /etc/modulefiles/vsc, and then discovers all other modules through the paths that the cluster modules prepend to $MODULEPATH.

The cluster modules themselves are also in the Lmod cache though (see grep modulefiles /apps/gent/lmodcache/spiderT.lua); Lmod is scanning them for another reason (my best guess is to check which of them is the default).
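
The grep check mentioned above can be wrapped in a small helper so that it reports a yes/no answer. This is a sketch only: the function name `cache_mentions` is hypothetical, and it simply looks for the directory as a literal string in the cache file without parsing the Lua table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does the spider cache file mention modulefiles under a directory?
# Returns 0 if mentioned, 1 if not, 2 if the cache file cannot be read.
cache_mentions() {
    local cache="$1" dir="$2"
    if [ ! -r "$cache" ]; then
        echo "cache file not readable: $cache" >&2
        return 2
    fi
    # -F: treat the directory as a literal string; -q: report via exit status only.
    grep -q -F "$dir" "$cache"
}

# Example, using the paths mentioned in this thread:
#   cache_mentions /apps/gent/lmodcache/spiderT.lua /etc/modulefiles/vsc && echo "in cache"
```

A positive result only shows that the cluster modules are listed in the cache; it does not explain why they are still read from disk.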

stdweird commented 7 years ago

@boegel ok, thanks.