radical-cybertools / radical.saga

A Light-Weight Access Layer for Distributed Computing Infrastructure and Reference Implementation of the SAGA Python Language Bindings.
http://radical-cybertools.github.io/saga-python/

bsub option for NVME in the LSF scheduler #807

Closed lee212 closed 4 years ago

lee212 commented 4 years ago

On Summit, Burst Buffer can be activated via the job scheduler option like:

-alloc_flags NVME

There are three options that I can think of now:

What would be a reasonable choice, or is there a better option?

mturilli commented 4 years ago

If it is needed now, I would enable it by default. I would also open a ticket for implementing it as a configurable parameter, possibly using the upcoming architecture configuration parameter?

andre-merzky commented 4 years ago

There are different options on how to express this in the SystemArchitecture attribute. For example:

SystemArchitecture = {'smt': 4, 'nvme': True, ...}

or

SystemArchitecture = {'smt': 4, 'flags': ['nvme'], ...}

I would slightly prefer the latter to more easily support other machine-specific flags should the need arise. @mtitov : any opinion?

mtitov commented 4 years ago

@andre-merzky I would also prefer the 2nd option, but should the key be named flags or alloc_flags (the bsub option name is -alloc_flags)?

And as a follow-up question: should we have an env variable for these flags? (e.g., RADICAL_SAGA_[ALLOC_]FLAGS="gpumps nvme")

p.s. there is another flag (gpumps) that applies to Summit but not to Lassen, so we could move it into the config as well as part of these updates?

andre-merzky commented 4 years ago

-alloc_flags might be right for LSF, but similar options are likely named differently for other batch systems. So I would opt for the more generic flags, or alternatively options? I know that involves code at the adaptor level to pull things apart again - for example, 'options' : ['nvme', 'gpumps'] would need to be mapped to different batch parameters. But that is really what SAGA is about: abstracting the details of the batch system...
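A minimal sketch of what such an adaptor-level mapping could look like (the helper name and mapping table are assumptions for illustration, not the actual adaptor code):

```python
# Hypothetical sketch: translate the generic 'options' list from a
# SystemArchitecture-style dict into an LSF-specific batch directive.
# The set of recognized flag names is taken from this thread.
_LSF_ALLOC_FLAGS = {'nvme', 'gpumps', 'gpudefault', 'spectral', 'maximizegpfs'}

def lsf_alloc_flags(system_architecture):
    '''Render an LSF "-alloc_flags" directive from generic options.'''
    opts  = system_architecture.get('options', [])
    flags = [o for o in opts if o in _LSF_ALLOC_FLAGS]
    if not flags:
        return None
    return '#BSUB -alloc_flags "%s"' % ' '.join(flags)

print(lsf_alloc_flags({'smt': 4, 'options': ['nvme', 'gpumps']}))
# prints: #BSUB -alloc_flags "nvme gpumps"
```

A Slurm or PBS adaptor would keep the same generic 'options' input but render different (or no) directives from it.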

andre-merzky commented 4 years ago

And as a follow-up question: should we have an env variable for these flags? (e.g., RADICAL_SAGA_[ALLOC_]FLAGS="gpumps nvme")

Well, it is trivial to expose an env variable via the default config files:

>>> import radical.utils as ru
>>> import os

>>> cfg = ru.Config(cfg={'my_foo': 'my_${FOO:biz}_env'})
>>> cfg.as_dict()
{'my_foo': 'my_biz_env'}

>>> os.environ['FOO'] = 'bar'

>>> cfg = ru.Config(cfg={'my_foo': 'my_${FOO:biz}_env'})
>>> cfg.as_dict()
{'my_foo': 'my_bar_env'}

but I would rather not do that: at the moment we have one use case, and it is not at all clear that this needs switching from run to run. I really do like env variables (ask Matteo :-) ) - but not everything should be exposed via env vars by default; that becomes unwieldy very quickly.

mtitov commented 4 years ago

@andre-merzky ok! (and good point with {config + env var})

lee212 commented 4 years ago

I also support the 2nd option with the flags key name, because there will also be -alloc_flags spectral on Summit, for using NVMe with a library.

mtitov commented 4 years ago

As a summary: the Summit resource config was updated to use the new approach of setting alloc_flags, and the default value is

        "system_architecture"         : {"smt": 4,
                                         "options": ["gpumps"]}

thus, to use the nvme flag, it could be updated as follows:

        "system_architecture"         : {"smt": 4,
                                         "options": ["gpumps", "nvme"]}

p.s. all valid flags for Summit are: gpumps, gpudefault, nvme, spectral, maximizegpfs
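Given that fixed list, a small sketch (the helper name is an assumption, not part of radical.saga) of validating requested options against the valid Summit flags before rendering the alloc_flags string:

```python
# Hypothetical sketch: check requested options against the alloc_flags
# values listed as valid for Summit in this thread, then join them for bsub.
VALID_SUMMIT_ALLOC_FLAGS = {'gpumps', 'gpudefault', 'nvme',
                            'spectral', 'maximizegpfs'}

def render_alloc_flags(options):
    '''Join validated options into the value for "-alloc_flags".'''
    unknown = [o for o in options if o not in VALID_SUMMIT_ALLOC_FLAGS]
    if unknown:
        raise ValueError('unknown alloc_flags for Summit: %s' % unknown)
    return ' '.join(options)

print(render_alloc_flags(['gpumps', 'nvme']))  # prints: gpumps nvme
```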