Closed: mtitov closed this 1 year ago.
@mtitov I installed a new python environment with the following radical-stack:
python : /CSC_CONTAINER/miniconda/envs/env1/bin/python3
pythonpath :
version : 3.11.4
virtualenv : env1
radical.gtod : 1.20.1
radical.pilot : 1.37.0-v1.36.0-621-gfdf0e8e9e@feature-service_startup_flag
radical.saga : 1.36.0
radical.utils : 1.40.0-v1.33.0-32-g9eb1b32@devel_nodb_2
Then I updated the Python script, I think as suggested. However, I'm getting an error before it hits any of the new service metadata code. I'm wondering whether the new install isn't working, or whether I missed something I need to change in the script:
Traceback (most recent call last):
File "/projappl/project_2006549/radical-pilot/rp_soma.py", line 9, in <module>
session = rp.Session()
^^^^^^^^^^^^
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/site-packages/radical/pilot/session.py", line 195, in __init__
self._init_primary()
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/site-packages/radical/pilot/session.py", line 249, in _init_primary
self._init_cfg_from_scratch()
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/site-packages/radical/pilot/session.py", line 395, in _init_cfg_from_scratch
while isinstance(rcfg['schemas'][schema], str):
~~~~~~~~~~~~~~~^^^^^^^^
TypeError: list indices must be integers or slices, not str
Or is it the resource config? I still have the local one for mahti that we updated - I don't think I am using the one that was checked in... unless that happens automatically now with these new branches, but they should amount to the same config either way.
Oh, right, that one is read automatically, so you can delete your local one. (With the new no-MongoDB branch the resource config structure changed a little; I've adjusted the one you merged, so this branch already includes csc.mahti with the correct structure.)
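For what it's worth, the TypeError above is consistent with a local config that still uses the old layout, in which 'schemas' is a plain list rather than a mapping indexed by schema name. A quick check before deleting it (a minimal sketch; the file path and the 'mahti' key are assumptions based on the usual user-level override location, not taken from the thread):

import json
import os

# assumed location of the local override for the CSC resource configs;
# adjust if the file was kept somewhere else
cfg_path = os.path.expanduser('~/.radical/pilot/configs/resource_csc.json')

with open(cfg_path) as fin:
    rcfg = json.load(fin)['mahti']      # 'mahti' key is an assumption

# the new no-MongoDB branch indexes rcfg['schemas'] by schema name, so a
# plain list here is the old layout and produces the
# "list indices must be integers or slices, not str" error shown above
print(type(rcfg['schemas']))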
@mtitov Here is the new error we are getting and the task description information:
================================================================================
Getting Started (RP version 1.37.0)
================================================================================
new session: [rp.session.c1102.mahti.csc.fi.dewiy.019608.0003] \
zmq proxy : [tcp://10.141.32.52:10001] ok
create pilot manager ok
--------------------------------------------------------------------------------
submit pilot
submit 1 pilot(s)
pilot.0000 csc.mahti 1 cores 0 gpus ok
create task manager
Traceback (most recent call last):
File "/projappl/project_2006549/radical-pilot/rp_soma.py", line 56, in <module>
tmgr = rp.TaskManager(session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/site-packages/radical/pilot/task_manager.py", line 160, in __init__
self._cmgr.start_components(self._cfg.components)
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/site-packages/radical/pilot/utils/component_manager.py", line 191, in start_components
out, err, ret = ru.sh_callout(cmd, cwd=self._cfg.path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/site-packages/radical/utils/shell.py", line 64, in sh_callout
stdout, stderr = p.communicate()
^^^^^^^^^^^^^^^
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/subprocess.py", line 1209, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/subprocess.py", line 2108, in _communicate
ready = selector.select(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/CSC_CONTAINER/miniconda/envs/env1/lib/python3.11/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
session = rp.Session()
pmgr = rp.PilotManager(session=session)
td = rp.TaskDescription()
td.pre_exec = ['module load gcc/9.4.0 openmpi/4.1.2-cuda cuda cmake',
               'export SOMA_SERVER_ADDR_FILE=/projappl/project_2006549/radical-pilot/server.add',
               'export SOMA_NODE_ADDR_FILE=/projappl/project_2006549/radical-pilot/node.add',
               'export SOMA_NUM_SERVER_INSTANCES=1',
               'export SOMA_NUM_SERVERS_PER_INSTANCE=1',
               'export SOMA_SERVER_INSTANCE_ID=0']
td.executable = '/projappl/project_2006549/soma-collector/build/examples/example-server'
td.arguments = ['-a', 'ofi+verbs://']
td.ranks = 1
td.cores_per_rank = 1
td.metadata = {
'name': 'soma_00', # in RP Registry its URL will be accessed with path:
# "service.soma_00.<idx>.url"
# where <idx> refers to the instance id
'startup_file': '/projappl/project_2006549/radical-pilot/server.add' # for now try with the full path outside of the pilot sandbox
}
pd_init = {'resource' : 'csc.mahti',
'runtime' : 5, # pilot runtime minutes
'exit_on_error': True,
'project' : 'project_2006549',
'queue' : 'test',
'cores' : 1,
'gpus' : 0,
'access_schema': 'interactive',
'services' : [td]}
pdesc = rp.PilotDescription(pd_init)
pdesc.services = [td]
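For context, the script continues along the usual RP pattern; the traceback above shows the TaskManager being created on line 56 of rp_soma.py. A rough sketch of that remainder, continuing the snippet above (the calls submit_pilots, add_pilots, submit_tasks, wait_tasks and close are assumed from the standard RP API, not copied from the actual rp_soma.py, which may differ):

pilot = pmgr.submit_pilots(pdesc)

tmgr = rp.TaskManager(session=session)   # this is where the reported hang occurs
tmgr.add_pilots(pilot)

# client task(s), e.g. the LULESH run mentioned below, would be described
# and submitted here:
#   tmgr.submit_tasks([...])
#   tmgr.wait_tasks()

session.close(download=True)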
Hi, I have been able to launch SOMA as a service on one node and LULESH as a task on another - and they have been able to connect and send/receive data. Next I will be trying with the tau performance data layer and with different resource allocations (multiple nodes/tasks etc).
Merging #3026 (57bfad7) into devel (7b1d9d9) will increase coverage by 0.34%. The diff coverage is 50.00%.
@@ Coverage Diff @@
## devel #3026 +/- ##
==========================================
+ Coverage 42.41% 42.76% +0.34%
==========================================
Files 99 99
Lines 10824 10858 +34
==========================================
+ Hits 4591 4643 +52
+ Misses 6233 6215 -18
Files | Coverage Δ
---|---
src/radical/pilot/agent/agent_0.py | 40.54% <50.00%> (-1.97% ↓)

... and 2 files with indirect coverage changes
@dyokelson hi Dewi, can you please try this branch to test the services startup:
(1) Install the required branches within your virtual environment (uninstall the previous radical tools first).
(2) Service task description: when you describe the service task, set the metadata attribute as follows.
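Roughly, mirroring the values that ended up in the script above ('soma_00' and the startup-file path are taken from that script, not from this comment):

td.metadata = {
    # service name: its URL will be accessible in the RP Registry under
    # "service.<name>.<idx>.url", where <idx> is the service instance id
    'name'        : 'soma_00',
    # file into which the service writes its address once it is up; for
    # now a full path outside of the pilot sandbox
    'startup_file': '/projappl/project_2006549/radical-pilot/server.add'
}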