andre-merzky closed this 6 months ago
Attention: Patch coverage is 37.35409% with 161 lines in your changes missing coverage. Please review.
Project coverage is 45.02%. Comparing base (74a8188) to head (bd13797).
@andre-merzky The RS Session's "context" is still used in one of the tutorials; that usage could be removed as well:
# notebook docs/source/tutorials/submission.ipynb
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
import radical.pilot as rp
session = rp.Session()
context = rp.Context('ssh')
context.user_id = "user_id"
session.add_context(context)
------------------
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_3499/127513295.py in <module>
6 context.user_id = "user_id"
7
----> 8 session.add_context(context)
AttributeError: 'Session' object has no attribute 'add_context'
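This `AttributeError` follows from the change in this PR: `add_context()` was provided by the SAGA session base class, and `rp.Session` no longer derives from it. The sketch below reproduces that effect with stand-in classes; `SagaSession`, `OldSession`, and `NewSession` are hypothetical names for illustration, not real radical.saga or radical.pilot classes.

```python
# Hypothetical stand-ins: these are NOT the real radical.saga / radical.pilot classes.


class SagaSession:
    """Plays the role of the old SAGA base class, which owned security contexts."""

    def __init__(self):
        self.contexts = []

    def add_context(self, ctx):
        self.contexts.append(ctx)


class OldSession(SagaSession):
    """Before the PR: add_context() is inherited from the SAGA base class."""


class NewSession:
    """After the PR: no SAGA base class, so add_context() does not exist."""

    def __init__(self):
        self.uid = "session.0000"


old = OldSession()
old.add_context("ssh")          # works: method comes from the base class
print(len(old.contexts))        # 1

new = NewSession()
try:
    new.add_context("ssh")
except AttributeError as exc:
    print(exc)                  # 'NewSession' object has no attribute 'add_context'
```

Tutorial cells that still call `session.add_context(...)` therefore have to drop that call rather than expect a replacement method.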
Thanks, @mtitov, done as suggested.
@andre-merzky:
Traceback (most recent call last):
  File "/home/aymen/ve/test_rpex_final/lib/python3.8/site-packages/radical/pilot/tmgr/staging_input/default.py", line 285, in work
    self._handle_task(task, actionables)
  File "/home/aymen/ve/test_rpex_final/lib/python3.8/site-packages/radical/pilot/tmgr/staging_input/default.py", line 334, in _handle_task
    self._fs_cache[key] = rsfs.Directory(tmp)
NameError: name 'rsfs' is not defined
maybe:
import radical.saga.filesystem as rsfs
Thanks for spotting that, @AymenFJA - not sure how that crept in again. Either way, removed again now :-)
Rather not re-add that import - the idea was to remove SAGA ;-)
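With `radical.saga.filesystem` gone, a cache like `_fs_cache` does not need SAGA directory objects when the staging targets are local. A minimal sketch, assuming local POSIX paths; `DirCache` is a hypothetical helper and not what the PR actually implements.

```python
import os
import tempfile


class DirCache:
    """Hypothetical SAGA-free cache: maps a directory path to a created directory."""

    def __init__(self):
        self._cache = {}

    def get(self, path):
        # Normalize so that 'a/b' and 'a/b/' share a single cache entry.
        key = os.path.abspath(path)
        if key not in self._cache:
            os.makedirs(key, exist_ok=True)   # create the directory only once
            self._cache[key] = key
        return self._cache[key]


base = tempfile.mkdtemp()
cache = DirCache()
d1 = cache.get(os.path.join(base, "task.000000"))
d2 = cache.get(os.path.join(base, "task.000000") + "/")
print(d1 == d2, os.path.isdir(d1))   # True True
```

Caching plain paths keeps the "create once, reuse afterwards" behavior without an extra dependency; remote targets would need a different transfer mechanism.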
Given the urgency of this PR, I suggest reviewing it now. The failing tests are related but not critical: pilot logs and profiles are not staged back yet. I suggest addressing that in a separate ticket.
@andre-merzky I am trying to run the following example:
~/RADICAL/radical.pilot/examples$ python 05_task_input_data.py
and I am getting the following errors:
aymen@surfacebook:~/radical.pilot.sandbox/rp.session.surfacebook.aymen.019795.0001/pilot.0000/task.000028$ cat *.err
/usr/bin/wc: input.dat: No such file or directory
submit tasks
create 128 task description(s)
........................................................................
........................................................ ok
--------------------------------------------------------------------------------
gather results
* task.000000: FAIL, exit: 1, out:
* task.000001: FAIL, exit: 1, out:
* task.000002: FAIL, exit: 1, out:
* task.000003: FAIL, exit: 1, out:
* task.000004: FAIL, exit: 1, out:
* task.000005: FAIL, exit: 1, out:
* task.000006: FAIL, exit: 1, out:
* task.000007: FAIL, exit: 1, out:
* task.000008: FAIL, exit: 1, out:
* task.000009: FAIL, exit: 1, out:
* task.000010: FAIL, exit: 1, out:
* task.000011: FAIL, exit: 1, out:
* task.000012: FAIL, exit: 1, out:
* task.000013: FAIL, exit: 1, out:
* task.000014: FAIL, exit: 1, out:
* task.000015: FAIL, exit: 1, out:
* task.000016: FAIL, exit: 1, out:
* task.000017: FAIL, exit: 1, out:
* task.000018: FAIL, exit: 1, out:
* task.000019: FAIL, exit: 1, out:
* task.000020: FAIL, exit: 1, out:
* task.000021: FAIL, exit: 1, out:
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
tmgr.0000.json: "exception": "RuntimeError(\"task failed\")",
Ah, great catch - schema expansion seems to be off. Fix incoming. Thanks for your diligence and patience with this!
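Assuming "schema expansion" here means resolving the URL prefixes in staging directives (such as `client://`, `pilot://`, `task://`) into sandbox paths, a wrong expansion would leave `input.dat` outside the task sandbox, which matches the `wc` error above. A minimal sketch of such an expansion; `expand_schema` and the sandbox mapping are hypothetical and not RP's implementation.

```python
import os
from urllib.parse import urlparse


def expand_schema(url, sandboxes):
    """Resolve 'schema://rel/path' against a dict of sandbox directories.

    `sandboxes` maps schema names (e.g. 'client', 'pilot', 'task') to
    absolute directories. Hypothetical helper, for illustration only.
    """
    parsed = urlparse(url)
    if parsed.scheme in ("", "file"):
        return parsed.path or url          # plain path: nothing to expand
    if parsed.scheme not in sandboxes:
        raise ValueError("unknown schema: %s" % parsed.scheme)
    # netloc + path form the relative part, e.g. 'task://input.dat'
    rel = (parsed.netloc + parsed.path).lstrip("/")
    return os.path.join(sandboxes[parsed.scheme], rel)


sandboxes = {"client": "/home/user/work",
             "task": "/sandbox/pilot.0000/task.000028"}
print(expand_schema("task://input.dat", sandboxes))
# /sandbox/pilot.0000/task.000028/input.dat
```

Raising on an unknown schema, instead of silently falling back to the working directory, surfaces this kind of bug at staging time rather than as a missing file inside the task.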
Thanks, @andre-merzky. So far, this is the update:
Here are several comments; the rest are inline:
* in `bin/radical-pilot-create-static-ve`, the default modules (L59-61) need to be edited: at least one module (e.g., `apache-libcloud`) is a SAGA dependency and no longer needs to be installed (not sure about `bootstrap_0.sh`)
Thanks, done (`bootstrap_0.sh` was clean already).
* need to update `requirements-ci.txt` (remove `saga` and add `psij`)
Thanks, fixed.
* update PilotDescription docstring?
Done.
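A small guard can keep SAGA from creeping back into the CI requirements. A minimal sketch, assuming a plain requirements-file format; the helper names and sample contents are hypothetical, and the exact PSI/J distribution name in `requirements-ci.txt` is not shown in this thread (the check only looks for a name starting with `psij`).

```python
import re


def requirement_names(text):
    """Return lower-cased distribution names from requirements-file text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments and blanks
        if not line or line.startswith("-"):      # skip options such as -r / -e
            continue
        # The name ends at the first version specifier, extra, or marker.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        names.append(name.lower())
    return names


def check_ci_requirements(text):
    """True if no SAGA package is listed and a PSI/J package is."""
    names = requirement_names(text)
    has_saga = any("saga" in n for n in names)
    has_psij = any(n.startswith("psij") for n in names)
    return not has_saga and has_psij


sample = """
# hypothetical contents, not the real requirements-ci.txt
radical.utils>=1.40
psij-python
pytest
"""
print(check_ci_requirements(sample))   # True
```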
one leftover:
~/work/radical.pilot/radical.pilot/testenv/lib/python3.7/site-packages/radical/pilot/pilot_manager.py in cancel_pilots(self, uids, _timeout)
744 if uid not in self._pilots:
745 raise ValueError('pilot %s not known' % uid)
--> 746 self._pilots[uid]._finalize()
747
748
AttributeError: 'Pilot' object has no attribute '_finalize'
Oops - sorry for that - fixed!
@andre-merzky sorry, one more (a call to the same method, which appeared in two places): another one is left in the pilot's `cancel` method.
Thanks - fixed!
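Since the removed `_finalize()` was still called in two places, a quick AST scan can list any remaining call sites before the next test round. A minimal sketch using only the standard-library `ast` module; `find_method_calls` and `scan_tree` are hypothetical helpers, and the sample classes are simplified stand-ins for `PilotManager` and `Pilot`.

```python
import ast
import pathlib


def find_method_calls(source, method, filename="<string>"):
    """Return sorted (line, col) pairs of every call to `<expr>.<method>(...)`."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == method):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


def scan_tree(root, method):
    """Yield (path, line, col) for calls to `method` in all .py files under root."""
    for path in pathlib.Path(root).rglob("*.py"):
        for line, col in find_method_calls(path.read_text(), method, str(path)):
            yield str(path), line, col


sample = '''
class PilotManager:
    def cancel_pilots(self, uids):
        for uid in uids:
            self._pilots[uid]._finalize()

class Pilot:
    def cancel(self):
        self._finalize()
'''

print(find_method_calls(sample, "_finalize"))   # [(5, 12), (9, 8)]
```

Running `scan_tree("radical/pilot", "_finalize")` on a checkout would report every leftover call at once instead of one traceback per test run.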
Thanks, @andre-merzky. So far, this is the update:
- [x] Local machine test: passed
- [x] HPC test: passed
@andre-merzky the tests passed locally and interactively on HPC (Bridges2).
Great, thanks @AymenFJA! Let's wait for @mtitov to confirm on Frontier before merging...
@andre-merzky with these updates, log and profile files are no longer collected:
* in my example I've set `session.close(download=True)`, but there are no files on the client side in `<sessionID>/pilot.0000`
It worked correctly on Frontier, both for runs from within an interactive job and from the login node.
However, to run from the login node, the following needs to be fixed:
`datetime.timedelta(minutes=jd.wall_time_limit)`
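The thread does not say what exactly broke, but the expression converts a wall-time limit given in minutes into a `timedelta`. A minimal sketch of a defensive conversion, assuming the limit may be unset or arrive as a number or numeric string; `wall_time_to_duration` is a hypothetical helper, not code from RP or PSI/J.

```python
import datetime


def wall_time_to_duration(minutes):
    """Convert a wall-time limit in minutes into a datetime.timedelta.

    Returns None when no limit is set. Hypothetical helper.
    """
    if minutes is None:
        return None
    minutes = float(minutes)            # accept int, float, or numeric str
    if minutes <= 0:
        raise ValueError("wall time must be positive, got %r" % minutes)
    return datetime.timedelta(minutes=minutes)


print(wall_time_to_duration(90))    # 1:30:00
```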
Also, when running from the login node, we had no way to control SMT and CoreSpec (nor the `exclusive` and `export` options).
p.s. my test was with a single task.
Thanks, log and profile collection should now work again.
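To verify the fix, one can check that `session.close(download=True)` actually left logs and profiles under `<sessionID>/pilot.0000`. A minimal sketch, assuming logs end in `.log` and profiles in `.prof`; `missing_pilot_artifacts` is a hypothetical helper, and the demo directory is created on the fly.

```python
import pathlib
import tempfile


def missing_pilot_artifacts(session_dir, pilot="pilot.0000",
                            patterns=("*.log", "*.prof")):
    """Return the glob patterns for which no file was downloaded."""
    pilot_dir = pathlib.Path(session_dir) / pilot
    if not pilot_dir.is_dir():
        return list(patterns)              # nothing was downloaded at all
    return [p for p in patterns if not any(pilot_dir.glob(p))]


# Demo: a fake session directory with a log file but no profile.
demo = pathlib.Path(tempfile.mkdtemp())
(demo / "pilot.0000").mkdir()
(demo / "pilot.0000" / "demo.log").write_text("")
print(missing_pilot_artifacts(demo))   # ['*.prof']
```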
9083f19 breaks the test of the submission tutorial. We assume (rightly or wrongly) that each tutorial creates a session. That commit removes the session creation from the tutorial, which in turn breaks the GitHub workflow. I am going to commit a fix; we can discuss another time whether we should assume that each tutorial creates a session.
Right - the PR removes the `Context`, as that was part of SAGA's security model, which got removed along with SAGA. Incidentally, that was indeed the piece of code which created an (empty) session...
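If the workflow keeps assuming that every tutorial creates a session, that assumption can be checked explicitly instead of surfacing as a confusing failure later. A minimal sketch that scans the code cells of nbformat-v4 notebooks for an `rp.Session(...)` call; the helper names are hypothetical, and a regex match is only a heuristic (it does not prove the cell actually runs).

```python
import json
import pathlib
import re

SESSION_RE = re.compile(r"\brp\.Session\s*\(")


def creates_session(nb_json):
    """True if any code cell of a v4 notebook calls rp.Session(...)."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        if isinstance(src, list):          # nbformat allows a list of lines
            src = "".join(src)
        if SESSION_RE.search(src):
            return True
    return False


def tutorials_without_session(tutorial_dir):
    """List notebooks in tutorial_dir that never create an rp.Session."""
    return sorted(str(p) for p in pathlib.Path(tutorial_dir).glob("*.ipynb")
                  if not creates_session(p.read_text()))


cell = {"cell_type": "code", "metadata": {}, "outputs": [],
        "execution_count": None,
        "source": ["import radical.pilot as rp\n", "session = rp.Session()\n"]}
nb = json.dumps({"nbformat": 4, "nbformat_minor": 5, "metadata": {},
                 "cells": [cell]})
print(creates_session(nb))   # True
```

Pointed at `docs/source/tutorials`, `tutorials_without_session` would have flagged the submission tutorial as soon as its session-creating cell was removed.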
Remove `rs.Session` as the base class for `rp.Session`. This closes #2674.