Closed grondo closed 6 months ago
I've also added minimal support for extra spindle options in jobspec, mainly as a demonstration of how to do this in a Flux job shell plugin. The options currently supported are spindle.noclean, spindle.nostrip, spindle.follow-fork, and spindle.python-prefix=PATH. These can be set with the flux mini commands using the -o option, e.g.
$ flux mini run -N4 -n16 -o spindle.noclean -o spindle.python-prefix=/path/to/my/python application ARGS...
It should be easy to support other options by expanding the sp_getopts() function.
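As a convenience, the -o arguments described above could be assembled by a small shell helper. This is only a sketch: spindle_shell_opts is a hypothetical name that is not part of Flux or Spindle, and it assumes (as the example command above suggests) that passing any spindle.* option is enough to activate the plugin, so plain -o spindle is only emitted when no extra options are given.

```shell
# Hypothetical helper (not part of Flux or Spindle): print the -o arguments
# for a spindle-enabled job.
# With no arguments it prints "-o spindle"; otherwise it prints one
# "-o spindle.<option>" per argument.
spindle_shell_opts() {
    if [ "$#" -eq 0 ]; then
        printf '%s\n' '-o spindle'
        return 0
    fi
    sep=''
    for opt in "$@"; do
        # e.g. "noclean" -> "-o spindle.noclean"
        printf '%s-o spindle.%s' "$sep" "$opt"
        sep=' '
    done
    printf '\n'
}
```

It could then be used as flux mini run -N4 -n16 $(spindle_shell_opts noclean python-prefix=/path/to/my/python) application ARGS... Note that this relies on unquoted word splitting, so option values containing spaces would not survive.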
FYI - I just force pushed this branch to fix a couple of issues:
-o spindle option processing

I've also tested this out on a TOSS 3 cluster. Here are instructions for testing this branch with the system-installed Flux and TCE Spindle on TOSS 3:
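1. Run ./bootstrap.sh
2. Run ./configure with the same prefix as the tce version of Spindle:
$ ./configure --prefix=/usr/tce/packages/spindle/spindle
3. Build with make -j
4. Create a Lua shell rc file (sp.lua in the commands below) that loads the plugin:
plugin.load {file="/path/to/Spindle/src/flux/.libs/spindle.so"}
5. Start a Flux instance under Slurm and run a test job with the plugin enabled:
$ salloc -N2 -p pdebug srun -N2 --pty flux start
$ flux mini run -o userrc=sp.lua -o spindle -o verbose -N2 hostname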
0.042s: flux-shell[0]: DEBUG: 0: task_count=2 slot_count=2 cores_per_slot=1 slots_per_node=1
0.042s: flux-shell[0]: DEBUG: 0: tasks [0] on cores 35
0.043s: flux-shell[0]: DEBUG: Loading /etc/flux/shell/initrc.lua
0.045s: flux-shell[0]: DEBUG: data-staging: Jobspec does not contain data-staging attributes. No staging necessary.
0.056s: flux-shell[0]: DEBUG: output: batch timeout = 0.500s
0.058s: flux-shell[0]: DEBUG: spindle: initializing spindle for use with flux
0.052s: flux-shell[1]: DEBUG: 1: tasks [1] on cores 35
0.053s: flux-shell[1]: DEBUG: Loading /etc/flux/shell/initrc.lua
0.055s: flux-shell[1]: DEBUG: data-staging: Jobspec does not contain data-staging attributes. No staging necessary.
0.069s: flux-shell[1]: DEBUG: spindle: initializing spindle for use with flux
0.077s: flux-shell[0]: DEBUG: spindle: started spindle backend pid = 30390
0.085s: flux-shell[0]: DEBUG: spindle: started spindle frontend
0.082s: flux-shell[1]: DEBUG: spindle: started spindle backend pid = 12001
0.192s: flux-shell[0]: DEBUG: task 0 complete status=0
0.194s: flux-shell[1]: DEBUG: task 1 complete status=0
0.207s: flux-shell[1]: DEBUG: exit 0
quartz1
quartz2
0.211s: flux-shell[0]: DEBUG: exit 0
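The rc file used in the test above can also be generated from a script. A minimal sketch follows; write_spindle_rc is a hypothetical helper name, and the only thing it writes is the plugin.load one-liner shown in the instructions, with the plugin path supplied by the caller.

```shell
# Hypothetical helper: write a one-line Lua rc file that loads the spindle
# job shell plugin.
#   $1: path to the built spindle.so
#   $2: rc file to create (defaults to sp.lua in the current directory)
write_spindle_rc() {
    plugin_path=$1
    rc_file=${2:-sp.lua}
    # Produces: plugin.load {file="<plugin_path>"}
    printf 'plugin.load {file="%s"}\n' "$plugin_path" > "$rc_file"
}
```

For example, write_spindle_rc /path/to/Spindle/src/flux/.libs/spindle.so sp.lua creates the file that is then passed to the job with -o userrc=sp.lua.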
@mplegendre Let me know if there is anything else needed here
This PR adds a "job shell plugin" for Flux which allows users to request that Spindle be set up along with their job. For example, if this shell plugin is installed in the default job shell pluginpath for Flux, then a user can request Spindle with their Flux job via the -o spindle shell option.
~Currently, no other spindle options are supported, but this could easily be added in the future by making the spindle shell option an arbitrary JSON object in jobspec, e.g. -o spindle.foo=bar. This can be added if someone suggests some options that are normally set on the command line for Spindle assisted jobs.~ See below, some spindle options are now supported.

This plugin can be tested standalone on systems where a current version of Flux (flux-core >= ~-0.42.0) and Spindle are installed. Once the plugin is built, create a Lua shell config one-liner that loads it with plugin.load. Then run a test job that loads this rc file (-o userrc=FILE) and activates the spindle plugin (-o spindle).
There is one issue I've found: when running multiple short-lived tasks per node, there is occasionally a hang at startup. This may be because all the tasks complete on the "rank 0" shell before the other rank has started all of its tasks. If I replace hostname with sleep 1, the hang goes away.

I ran ./bootstrap.sh on this branch and checked in the results, since the byproducts seem to be part of the repo in this project. However, I had to use a newer version of autotools (from TOSS 4); the older version (on TOSS 3) didn't work for some reason. As a result, there's a lot of churn in the last commit here.

This should be considered a first cut of the job shell plugin for Flux. It has had minimal testing by me, a person only marginally familiar with Spindle.
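Since the startup hang described above is intermittent, one way to get a rough feel for how often it occurs is to run the same job repeatedly under a time limit. The following is a sketch only: count_hangs is a hypothetical helper, it assumes the coreutils timeout command is available, and it treats any run that exceeds the limit as a hang.

```shell
# Hypothetical helper: run a command several times and count how many runs
# exceed a time limit.
#   $1: number of runs
#   $2: per-run limit in seconds
#   remaining arguments: the command to run
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        # timeout(1) exits with status 124 when the limit is reached.
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs"
}
```

For instance, count_hangs 20 60 flux mini run -o userrc=sp.lua -o spindle -N2 -n8 hostname would print the number of runs out of 20 that did not finish within 60 seconds. The -n8 task count and the 60 second limit are illustrative values, and a job that times out this way may still need to be canceled separately.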