radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

session dump goes to wrong directory #2556

Closed andre-merzky closed 2 years ago

andre-merzky commented 2 years ago
[Aymen](https://app.slack.com/team/UASTM0TN2) Hello [@andre.merzky](https://radical-lab.slack.com/team/U2PLNRU79), I am facing this error when closing the session in RP-Parsl:
2022-03-22 23:10:26,592 - colmena.task_server.base - INFO - Kill signal received
2022-03-22 23:10:26,592 - parsl.dataflow.dflow - INFO - Waiting for all remaining tasks to complete
2022-03-22 23:10:26,592 - parsl.dataflow.dflow - INFO - All remaining tasks completed
2022-03-22 23:10:26,592 - parsl.dataflow.dflow - INFO - DFK cleanup initiated
2022-03-22 23:10:26,592 - parsl.dataflow.dflow - INFO - Summary of tasks in DFK:

2022-03-22 23:10:26,592 - parsl.dataflow.dflow - INFO - Tasks in state States.exec_done: 384
2022-03-22 23:10:26,592 - parsl.dataflow.dflow - INFO - End of summary
2022-03-22 23:10:26,593 - parsl.dataflow.dflow - INFO - Terminating flow_control and strategy threads

closing session parsl.radical.session.radical.3.aymen.019073.0014              \
close task manager                                                            ok
close pilot manager                                                            \
wait for 1 pilot(s)
              0                                                               ok
                                                                              ok
Process ParslTaskServer-1:
Traceback (most recent call last):
  File "/home/aymen/miniconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/colmena/task_server/parsl.py", line 362, in run
    super().run()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/colmena/task_server/base.py", line 91, in run
    self._cleanup()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/colmena/task_server/parsl.py", line 353, in _cleanup
    dfk.cleanup()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/parsl/dataflow/dflow.py", line 1096, in cleanup
    executor.shutdown()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/pilot/agent/executing/parsl_rp.py", line 360, in shutdown
    self.session.close(download=True)
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/pilot/session.py", line 300, in close
    self.fetch_json    (tgt='%s/%s' % (tgt, self.uid))
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/pilot/session.py", line 659, in fetch_json
    return rpu.fetch_json(self._uid, dburl=self.dburl, tgt=tgt,
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/pilot/utils/session.py", line 414, in fetch_json
    ru.write_json(json_docs, dst)
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/utils/json_io.py", line 62, in write_json
    with ru_open(fname, 'w') as f:
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/utils/misc.py", line 946, in ru_open
    return open(*args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/colmena_cfg1pu54/parsl.radical.session.radical.3.aymen.019073.0014/parsl.radical.session.radical.3.aymen.019073.0014.json'

Apparently RP is looking in the wrong directory, which as far as I understand is the task directory that Colmena sets. Any idea?

AymenFJA commented 2 years ago

@andre-merzky, thank you for the ticket. Following the last commit you provided, the issue still exists:

Process ParslTaskServer-1:
Traceback (most recent call last):
  File "/home/aymen/miniconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/colmena/task_server/parsl.py", line 369, in run
    super().run()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/colmena/task_server/base.py", line 91, in run
    self._cleanup()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/colmena/task_server/parsl.py", line 360, in _cleanup
    dfk.cleanup()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/parsl/dataflow/dflow.py", line 1096, in cleanup
    executor.shutdown()
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/pilot/agent/executing/parsl_rp.py", line 370, in shutdown
    self.session.close(download=True)
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/pilot/session.py", line 300, in close
    self.fetch_json (tgt=tgt)
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/pilot/session.py", line 659, in fetch_json
    return rpu.fetch_json(self._uid, dburl=self.dburl, tgt=tgt,
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/pilot/utils/session.py", line 414, in fetch_json
    ru.write_json(json_docs, dst)
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/utils/json_io.py", line 62, in write_json
    with ru_open(fname, 'w') as f:
  File "/home/aymen/ve/raptor/lib/python3.9/site-packages/radical/utils/misc.py", line 946, in ru_open
    return open(*args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/raptor_test/test_colmena_new_redis_exec/parsl.radical.session.radical.3.aymen.019074.0002/parsl.radical.session.radical.3.aymen.019074.0002/parsl.radical.session.radical.3.aymen.019074.0002.json'

andre-merzky commented 2 years ago

OK, let me try to reproduce this without having to run the whole Colmena setup...
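For anyone who wants to see the failure mode outside of Colmena: the tracebacks above end in a plain `open(fname, 'w')`, which raises `FileNotFoundError` whenever the parent directory of `fname` does not exist. The sketch below reproduces only that behaviour with the standard library. It is not RP's code and not the fix that was merged; `write_json_naive`, `write_json_safe`, `reproduce` and the `session.0000` uid are hypothetical names made up for illustration, and it assumes the missing session directory is the whole story.

```python
import json
import os
import tempfile


def write_json_naive(data, fname):
    # Same pattern as the last frames of the traceback: open() in 'w'
    # mode creates the file, but never its parent directories.
    with open(fname, 'w') as f:
        json.dump(data, f)


def write_json_safe(data, fname):
    # Hypothetical guard (an assumption, not RP's actual fix):
    # create the parent directory before opening the file.
    os.makedirs(os.path.dirname(fname), exist_ok=True)
    with open(fname, 'w') as f:
        json.dump(data, f)


def reproduce(base):
    # Build a <base>/<uid>/<uid>.json target, like the path in the
    # first error message; <base>/<uid> does not exist yet.
    uid = 'session.0000'  # placeholder uid, not a real session id
    fname = os.path.join(base, uid, uid + '.json')

    try:
        write_json_naive({'uid': uid}, fname)
        naive_failed = False
    except FileNotFoundError:
        naive_failed = True

    write_json_safe({'uid': uid}, fname)
    return naive_failed, os.path.isfile(fname)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as base:
        print(reproduce(base))
```

Note that a guard like this would only hide the second traceback, where the session uid shows up twice as a directory in the target path. That duplication looks like a separate path-construction problem, which this sketch does not try to model.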

AymenFJA commented 2 years ago

@andre-merzky any updates, please? Happy to provide you with steps/ready_env to reproduce it.

AymenFJA commented 2 years ago

This issue is fixed, and the PR is merged.