mturilli opened 4 years ago
@andre-merzky ping
Same experience with recent session data:
$ bin/rp_inspect/plot_util.py re.session.login2.iyakushin.018593.0000
Traceback (most recent call last):
  File "bin/rp_inspect/plot_util.py", line 118, in <module>
    prov, cons, stats_abs, stats_rel, info = session.utilization(metrics)
  File "/ccs/home/hrlee/.conda/envs/ipynb/lib/python3.7/site-packages/radical/analytics/session.py", line 990, in utilization
    provided = rp.utils.get_provided_resources(self)
  File "/ccs/home/hrlee/.conda/envs/ipynb/lib/python3.7/site-packages/radical/pilot/utils/prof_utils.py", line 856, in get_provided_resources
    data = _get_pilot_provision(p)
  File "/ccs/home/hrlee/.conda/envs/ipynb/lib/python3.7/site-packages/radical/pilot/utils/prof_utils.py", line 814, in _get_pilot_provision
    cpn = pilot.cfg['resource_details']['rm_info']['cores_per_node']
TypeError: 'NoneType' object is not subscriptable
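The TypeError means one of the intermediate values in `pilot.cfg['resource_details']['rm_info']['cores_per_node']` (most likely `resource_details`, given the discussion here) came back as `None` instead of a dict. The sketch below reproduces that failure mode and shows a guarded lookup. It assumes the pilot config behaves like a plain nested dict, and `get_cores_per_node` is a hypothetical helper, not part of radical.pilot or radical.analytics:

```python
from typing import Any, Optional


def get_cores_per_node(cfg: Optional[dict]) -> Optional[Any]:
    """Hypothetical helper: walk cfg['resource_details']['rm_info']['cores_per_node']
    and return None instead of raising when any level is missing or None."""
    node: Any = cfg
    for key in ("resource_details", "rm_info", "cores_per_node"):
        if not isinstance(node, dict):
            return None  # this level is None (or not a dict), so stop here
        node = node.get(key)
    return node


# Reproduce the failure mode from the traceback: resource_details is None.
broken_cfg = {"resource_details": None}
try:
    broken_cfg["resource_details"]["rm_info"]["cores_per_node"]
except TypeError as exc:
    # prints: direct lookup fails: 'NoneType' object is not subscriptable
    print("direct lookup fails:", exc)

print(get_cores_per_node(broken_cfg))  # None

# Made-up value, only to show the happy path.
good_cfg = {"resource_details": {"rm_info": {"cores_per_node": 42}}}
print(get_cores_per_node(good_cfg))  # 42
```

A guard like this only hides the symptom on the analysis side; the open question in this thread is still why the resource details are missing from the session in the first place.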
The session is here: https://github.com/radical-experiments/deepdriveMD/tree/master/data/async
Another session's data has been added, with the same error message: re.session.login2.iyakushin.018598.0002.tar.gz
Which RADICAL stack are you currently using?
Never mind, I found it:
radical.pilot version: 1.5.4
radical.saga version: 1.5.6
radical.utils version: 1.5.4
I am quite surprised that this is still an issue: the stack is up to date, and I can't see how the resource details go missing (the log shows they are written to the DB correctly).
I prepared an RP branch, fix/issue_ra_117, to dig into this a bit deeper. That branch removes the client-side setting for resource_details completely, so we should be able to tell whether the client side or the pilot side is at fault. Can you please give it a try and let me know how it goes? Thanks!
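When comparing runs with and without that branch, it may help to record exactly which level of the lookup is empty for each pilot. The following is a minimal sketch, assuming each pilot's config can be obtained as a plain dict; `first_missing_key` and the sample pilot IDs and configs are hypothetical. It only reports where the key path breaks and does not by itself tell whether the client or the pilot wrote (or failed to write) the value:

```python
from typing import Any, Optional, Sequence

# Key path used in _get_pilot_provision, per the traceback above.
PATH = ("resource_details", "rm_info", "cores_per_node")


def first_missing_key(cfg: Any, path: Sequence[str] = PATH) -> Optional[str]:
    """Return the dotted path up to and including the first key whose lookup
    yields nothing (missing, None, or parent not a dict), or None if it resolves."""
    node = cfg
    for depth, key in enumerate(path):
        if not isinstance(node, dict) or node.get(key) is None:
            return ".".join(path[: depth + 1])
        node = node[key]
    return None


# Hypothetical pilot configs, keyed by made-up pilot IDs.
pilots = {
    "pilot.0000": {"resource_details": {"rm_info": {"cores_per_node": 42}}},
    "pilot.0001": {"resource_details": None},
    "pilot.0002": {"resource_details": {"rm_info": None}},
}

for pid, cfg in pilots.items():
    missing = first_missing_key(cfg)
    print(pid, "ok" if missing is None else f"missing or None at {missing}")
```

If `resource_details` is still empty with the client-side setting removed, that points toward the pilot-side write (or its retrieval from the DB) rather than the client.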
@andre-merzky, I tried the branch and still see the same error. Does this mean the fault is on the pilot side?
Let me try to reproduce this. Can you please provide a small example script that I can run, ideally with a fake workload (presumably the workload should not matter)? Do you also see the same problem on other resources with that script?
Warnings and error: