radical-cybertools / radical.repex.at

This is the GitHub location for RepEx, developed by the RADICAL team in conjunction with the York Lab.

TUU use case: 8-replica run fails on Blue Waters (v0.36.RC1@detached-v0.36.RC1) #42

Closed: antonst closed this issue 8 years ago

antonst commented 8 years ago

The run fails with:

Error running agent: LRMS type 'TORQUE' unknown or defunct
  File "/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/bin/radical-pilot-agent-multicore.py", line 6024, in bootstrap_3
    logger = log)
  File "/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/bin/radical-pilot-agent-multicore.py", line 2339, in create
    raise RuntimeError("LRMS type '%s' unknown or defunct" % name)
marksantcroos commented 8 years ago

Can you either dig out the actual error or provide the whole sandbox?

antonst commented 8 years ago

terminal output, sandbox

marksantcroos commented 8 years ago

The actual error is logged just above it:

2015-10-07 01:23:01,103: agent_0.bootstrap_3 : MainProcess                     : MainThread     : ERROR   : lrms construction error
Traceback (most recent call last):
  File "/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/bin/radical-pilot-agent-multicore.py", line 2335, in create
    return impl(cfg, logger)
  File "/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/bin/radical-pilot-agent-multicore.py", line 2461, in __init__
    LRMS.__init__(self, cfg, logger)
  File "/mnt/c/scratch/sciteam/treikali/radical.pilot.sandbox/ve_bw/rp_install/bin/radical-pilot-agent-multicore.py", line 2237, in __init__
    layout = self._cfg['agent_layout']
KeyError: 'agent_layout'
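Read together, the two tracebacks suggest the pattern: `create` calls the LRMS constructor (line 2335), the constructor fails on the missing `agent_layout` key, the failure is logged as `lrms construction error`, and `create` then raises a generic `RuntimeError` (line 2339), so only the log shows the real cause. The sketch below reproduces that pattern in miniature. All class and function names here are hypothetical stand-ins, not the actual radical.pilot code. It also shows how chaining with `raise ... from` would keep the original `KeyError` attached to the error the caller sees.

```python
import logging


class LRMS:
    """Hypothetical stand-in for the agent's resource-manager base class."""

    def __init__(self, cfg, logger):
        # Mirrors the failing line in the traceback: 'agent_layout' is required.
        self.layout = cfg["agent_layout"]
        self.logger = logger


class Torque(LRMS):
    """Hypothetical TORQUE implementation; inherits the base-class checks."""


IMPLEMENTATIONS = {"TORQUE": Torque}


def create(name, cfg, logger):
    """Hypothetical factory: build the LRMS implementation registered as `name`."""
    impl = IMPLEMENTATIONS.get(name)
    if impl is None:
        raise RuntimeError("LRMS type '%s' unknown" % name)
    try:
        return impl(cfg, logger)
    except Exception as e:
        logger.exception("lrms construction error")
        # Chaining stores the original KeyError in __cause__ instead of hiding it.
        raise RuntimeError("LRMS type '%s' could not be constructed" % name) from e


if __name__ == "__main__":
    log = logging.getLogger("agent")
    try:
        create("TORQUE", {}, log)  # config without 'agent_layout'
    except RuntimeError as err:
        print(repr(err.__cause__))
```

With chaining, the caller can inspect `err.__cause__` (here the `KeyError` for `agent_layout`) without digging through the agent log.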
marksantcroos commented 8 years ago

Are you using a custom resource and/or agent configuration?

The agent configuration you ended up with does not look like one you could get from the defaults.
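One way to check this is to compare a custom configuration against the default and list which keys it lacks. The sketch below does that recursively and reports dotted key paths. The helper name `missing_keys` is hypothetical, and the sample dictionaries are placeholders, not the real contents of the default agent configuration; the only key taken from the thread is `agent_layout`.

```python
def missing_keys(default, custom, prefix=""):
    """Return dotted paths of keys present in `default` but absent from `custom`."""
    missing = []
    for key, value in default.items():
        path = prefix + key
        if key not in custom:
            missing.append(path)
        elif isinstance(value, dict) and isinstance(custom[key], dict):
            # Descend into nested sections that exist on both sides.
            missing.extend(missing_keys(value, custom[key], path + "."))
    return missing


if __name__ == "__main__":
    # Placeholder configs for illustration only.
    default_cfg = {"agent_layout": {"agent_0": {}}, "other_setting": 1}
    custom_cfg = {"other_setting": 2}
    print(missing_keys(default_cfg, custom_cfg))
```

For the placeholder configs above, the report is `['agent_layout']`, which matches the key the agent failed on.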

antonst commented 8 years ago

> Are you using a custom resource and/or agent configuration?

I am using a custom pilot_description._config. Could this cause problems?

marksantcroos commented 8 years ago

> I am using a custom pilot_description._config. Could this cause problems?

Yes. The default agent configuration for Blue Waters (bw), specified as agent_config in the resource config, is: https://raw.githubusercontent.com/radical-cybertools/radical.pilot/devel/src/radical/pilot/configs/agent_cray.json

If you want to diverge from it, your configuration should follow the same format.
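If only a few settings need to change, one option is to start from a local copy of the default file and apply overrides on top of it, so that required sections such as `agent_layout` are kept. The sketch below assumes this approach; the helper names `deep_merge` and `load_agent_cfg` are hypothetical, and whether the merged dictionary can be passed as `pilot_description._config` unchanged is an assumption the thread does not confirm.

```python
import copy
import json


def deep_merge(base, overrides):
    """Return a copy of `base` with `overrides` applied recursively; inputs are not mutated."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_agent_cfg(default_path, overrides):
    """Load a default agent config (e.g. a local copy of agent_cray.json) and apply overrides."""
    with open(default_path) as f:
        default = json.load(f)
    return deep_merge(default, overrides)
```

Merging recursively rather than with `dict.update` matters here: `update` would replace an entire nested section, which can silently drop keys the agent expects.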