radical-cybertools / radical.analytics

Analytics for RADICAL-Cybertools

radical-analytics-inspect warnings and NoneType error #117

Open mturilli opened 4 years ago

mturilli commented 4 years ago
$ radical-stack 

  python               : 3.6.9
  pythonpath           : 
  virtualenv           : /home/mturilli/Virtualenvs/rp-paper-frontera

  radical.analytics    : 0.90.7-v0.72.0-38-g14b9581@devel
  radical.pilot        : 1.1.1-v1.1.1-9-g353c5876e@devel
  radical.saga         : 1.1.0-v1.1-10-g4cfdc77f@devel
  radical.utils        : 1.1.1-v1.1.1-14-g77ca0db@devel

Warnings and errors:

$ ~/github/radical.analytics/bin/radical-analytics-inspect `pwd`/rp.session.login2.frontera.tacc.utexas.edu.mturilli.018316.0002
rp.session.login2.frontera.tacc.utexas.edu.mturilli.018316.0002 cache read failed: Ran out of input
WARNING: profile "/home/mturilli/github/experiments/rp.paper/rawdata/spatial_heterogeneity/rp.session.login2.frontera.tacc.utexas.edu.mturilli.018316.0002/umgr_unschedule_pubsub.prof" not correctly closed.
WARNING: profile "/home/mturilli/github/experiments/rp.paper/rawdata/spatial_heterogeneity/rp.session.login2.frontera.tacc.utexas.edu.mturilli.018316.0002/umgr_scheduling_queue.prof" not correctly closed.
WARNING: profile "/home/mturilli/github/experiments/rp.paper/rawdata/spatial_heterogeneity/rp.session.login2.frontera.tacc.utexas.edu.mturilli.018316.0002/cmgr.0000.hb.prof" not correctly closed.
WARNING: profile "/home/mturilli/github/experiments/rp.paper/rawdata/spatial_heterogeneity/rp.session.login2.frontera.tacc.utexas.edu.mturilli.018316.0002/log_pubsub.prof" not correctly closed.
session loaded
Traceback (most recent call last):
  File "/home/mturilli/github/radical.analytics/bin/rp_inspect/plot_state.py", line 100, in <module>
    key=lambda v: v[1][index])]
TypeError: '<' not supported between instances of 'NoneType' and 'float'
...Traceback (most recent call last):
  File "/home/mturilli/github/radical.analytics/bin/rp_inspect/plot_util.py", line 116, in <module>
    prov, cons, stats_abs, stats_rel, info = session.utilization(metrics)
  File "/home/mturilli/Virtualenvs/rp-paper-frontera/lib/python3.6/site-packages/radical/analytics/session.py", line 975, in utilization
    provided  = rp.utils.get_provided_resources(self)
  File "/home/mturilli/Virtualenvs/rp-paper-frontera/lib/python3.6/site-packages/radical/pilot/utils/prof_utils.py", line 479, in get_provided_resources
    data = _get_pilot_provision(session, p)
  File "/home/mturilli/Virtualenvs/rp-paper-frontera/lib/python3.6/site-packages/radical/pilot/utils/prof_utils.py", line 437, in _get_pilot_provision
    cpn   = pilot.cfg['resource_details']['rm_info']['cores_per_node']
TypeError: 'NoneType' object is not subscriptable
 done
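The first traceback (`plot_state.py`, line 100) can be reproduced outside RA. In Python 3, `sorted` cannot order `None` against a `float`, so a single entity whose timestamp at `index` is `None` is enough to break the sort. Below is a minimal sketch, assuming the entries are `(uid, timestamps)` pairs as `key=lambda v: v[1][index]` suggests. `none_last_key` is a hypothetical helper, not part of radical.analytics, and the sample entries are made up.

```python
def none_last_key(index):
    """Build a sort key that orders entries by v[1][index] and puts None last."""
    def key(v):
        ts = v[1][index]
        # (False, ts) sorts before (True, 0.0), so missing timestamps go to the end
        return (ts is None, 0.0 if ts is None else ts)
    return key


# Hypothetical (uid, timestamps) pairs; unit.0001 never reached the state at index 0
entries = [('unit.0000', [2.0]), ('unit.0001', [None]), ('unit.0002', [1.0])]

try:
    # Same pattern as plot_state.py: fails as soon as None is compared to a float
    sorted(entries, key=lambda v: v[1][0])
except TypeError as e:
    print('plain key fails:', e)

# None-safe key: entries with a timestamp are ordered, the None entry comes last
print([uid for uid, _ in sorted(entries, key=none_last_key(0))])
```

Dropping entries with a `None` timestamp before sorting would work just as well; which choice is right depends on whether the plot should show units that never reached that state.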
mturilli commented 4 years ago

@andre-merzky ping

lee212 commented 4 years ago

I see the same error with recent session data:

$ bin/rp_inspect/plot_util.py re.session.login2.iyakushin.018593.0000
Traceback (most recent call last):
  File "bin/rp_inspect/plot_util.py", line 118, in <module>
    prov, cons, stats_abs, stats_rel, info = session.utilization(metrics)
  File "/ccs/home/hrlee/.conda/envs/ipynb/lib/python3.7/site-packages/radical/analytics/session.py", line 990, in utilization
    provided  = rp.utils.get_provided_resources(self)
  File "/ccs/home/hrlee/.conda/envs/ipynb/lib/python3.7/site-packages/radical/pilot/utils/prof_utils.py", line 856, in get_provided_resources
    data = _get_pilot_provision(p)
  File "/ccs/home/hrlee/.conda/envs/ipynb/lib/python3.7/site-packages/radical/pilot/utils/prof_utils.py", line 814, in _get_pilot_provision
    cpn   = pilot.cfg['resource_details']['rm_info']['cores_per_node']
TypeError: 'NoneType' object is not subscriptable

The session is here: https://github.com/radical-experiments/deepdriveMD/tree/master/data/async
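Both `cores_per_node` tracebacks only say that *some* level of `pilot.cfg['resource_details']['rm_info']` is `None`. A small diagnostic that walks the key path and reports the first missing level could help tell whether `resource_details` itself or only `rm_info` is absent. Below is a sketch, assuming `cfg` behaves like a plain nested dict. `first_missing` is a hypothetical helper, and the sample configs are made up for illustration.

```python
def first_missing(cfg, path):
    """Return the shortest prefix of `path` whose value is None or absent.

    Returns None if the whole path resolves to a non-None value.
    Assumes every non-None level is a dict-like mapping.
    """
    node = cfg
    for depth, key in enumerate(path):
        if node is None:
            # Subscripting here would raise "'NoneType' object is not subscriptable"
            return path[:depth]
        if key not in node:
            return path[:depth + 1]
        node = node[key]
    return path if node is None else None


path = ('resource_details', 'rm_info', 'cores_per_node')

# Made-up pilot config in which resource_details was never filled in
broken = {'resource_details': None}
print(first_missing(broken, path))   # the prefix that evaluated to None

# Made-up pilot config in which the full path resolves
ok = {'resource_details': {'rm_info': {'cores_per_node': 56}}}
print(first_missing(ok, path))       # None: nothing is missing
```

Run against each pilot's `cfg` in an affected session, this would show at which level the lookup breaks, without changing `rp.utils.get_provided_resources` itself.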

lee212 commented 3 years ago

I added another session, which gives the same error message: re.session.login2.iyakushin.018598.0002.tar.gz

andre-merzky commented 3 years ago

Which RADICAL stack are you using now?

andre-merzky commented 3 years ago

Never mind, I found it:

 radical.pilot version: 1.5.4
 radical.saga  version: 1.5.6
 radical.utils version: 1.5.4

I am quite surprised that this is still an issue: the stack is up to date, and I can't see how the resource details go missing (the log shows they are written to the DB correctly).

I prepared an RP branch, fix/issue_ra_117, to dig into this more deeply. That branch removes the client-side setting of resource_details completely, so we should be able to tell whether the client side or the pilot side is at fault. Can you please give it a try and see how that goes? Thanks!

lee212 commented 3 years ago

@andre-merzky, I tried the branch and still see the same error. Does this mean the fault is on the pilot side?

andre-merzky commented 3 years ago

Let me try to reproduce this. Can you please provide a small example script, ideally with a fake workload (presumably the workload does not matter), that I can run? Do you see the same problem on other resources too (with that script)?