radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html
Other
54 stars 23 forks source link

Start RM before starting Session components (`agent_0`) #3079

Closed mtitov closed 11 months ago

mtitov commented 11 months ago

With the current setup, RM initialization happens in components first, before it happens in agent_0, but should be in opposite

(ve.rp) [matitov@login12.frontier pilot.0000]$ grep -r "RM init from" ./*
./agent_0.log:1698866658.513 : agent_0              : 25861 : 140737179934528 : DEBUG    : RM init from registry
./agent_executing.0000.log:1698866634.720 : agent_executing.0000 : 26039 : 140735520786176 : DEBUG    : RM init from scratch
./agent_scheduling.0000.log:1698866634.720 : agent_scheduling.0000 : 26054 : 140735520786176 : DEBUG    : RM init from scratch

Because of that I had an issue with LM initialization, which requires self._configure to run during its self._init_from_scratch method

And this is an example that LM's self._configure was called in Scheduler first

grep "DVM ready" ../*
../agent_scheduling.0000.log:1698866639.153 : agent_scheduling.0000 : 26313 : 140732819072768 : DEBUG    : prte-0 output: b'DVM ready'

Thus, with this PR I tried to reorder RM and Session initializations, but looks like RM suppose to be in between the following Session's calls... https://github.com/radical-cybertools/radical.pilot/blob/e2a38170ab377d707c7242bd85ba772b3a7d4975/src/radical/pilot/session.py#L281-L286 ... after starting registry, but before starting components

@andre-merzky Andre, what should we do here?

codecov[bot] commented 11 months ago

Codecov Report

Merging #3079 (57bf902) into devel (4d3ee16) will increase coverage by 0.00%. The diff coverage is 50.00%.

@@           Coverage Diff           @@
##            devel    #3079   +/-   ##
=======================================
  Coverage   43.99%   43.99%           
=======================================
  Files          96       96           
  Lines       10570    10565    -5     
=======================================
- Hits         4650     4648    -2     
+ Misses       5920     5917    -3     
Files Coverage Δ
src/radical/pilot/agent/agent_0.py 40.74% <0.00%> (+0.19%) :arrow_up:
src/radical/pilot/utils/component.py 34.15% <0.00%> (ø)
src/radical/pilot/session.py 50.26% <63.63%> (-0.09%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

andre-merzky commented 11 months ago

I would suggest to change, in Session._init_agent_0, from

        self._publish_cfg()
        self._start_components()

to

        self._publish_cfg()
        self._init_rm()
        self._start_components()

The session has all information required to initialize the RM on the agent's behalf. That way we could then add a Session.get_rm() method so that the agent obtains the RM handle.

IMHO that would be the smallest code change and we would not need to fiddle with logger initialization (the session logger would be used).

What do you think?

mtitov commented 11 months ago

Doesn't look right to have RM within Session, I think we can move starting components outside of the Session initialization

self._session = Session(...
self._rm = ResourceManager.create(...
<start_components>

for now we can keep start_components as part of the Session class, but in general I wouldn't keep it there

mtitov commented 11 months ago

Closed in favor of PR #3081