radical-cybertools / radical.analytics

Analytics for RADICAL-Cybertools

Analytics fails with function executor #93

Closed SrinivasMushnoori closed 4 years ago

SrinivasMushnoori commented 5 years ago

Here's the session I am working with. JSON file included.

Running any analytics script gives the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-2f2922ac19f8> in <module>()
     13 
     14 session = ra.Session(src   = 'rp.session.mcewan.engr.rutgers.edu.scm177.018051.0003',
---> 15                      stype = 'radical.pilot')
     16 
     17 units = session.filter(etype='unit', inplace=False)

/home/scm177/VirtualEnvs/rct_class/local/lib/python2.7/site-packages/radical/analytics/session.pyc in __init__(self, src, stype, sid, _entities, _init)
    115             self._profile, accuracy, hostmap \
    116                               = rpu.get_session_profile    (sid=sid, src=self._src)
--> 117             self._description = rpu.get_session_description(sid=sid, src=self._src)
    118 
    119             self._description['accuracy'] = accuracy

/home/scm177/VirtualEnvs/rct_class/local/lib/python2.7/site-packages/radical/pilot/utils/prof_utils.pyc in get_session_description(sid, src, dburl)
    197         umgr = unit['pilot']
    198         tree[pid ]['children'].append(uid)
--> 199         tree[umgr]['children'].append(uid)
    200         tree[uid] = {'uid'         : uid,
    201                      'etype'       : 'unit',

KeyError: u'agent_0.executing.0.child.func_exec.0000.000'

Note that I am running this in a Jupyter Notebook. Here's a super stripped down analytics script to reproduce the error:

import os
import radical.utils as ru
import radical.pilot as rp
import radical.analytics as ra

session = ra.Session(src   = 'rp.session.mcewan.engr.rutgers.edu.scm177.018051.0003',
                     stype = 'radical.pilot')
units = session.filter(etype='unit', inplace=False)
duration_active = units.duration([rp.AGENT_EXECUTING, rp.FINAL])
print duration_active

Happy to help debug.

EDIT: radical-stack is

  python               : 2.7.14
  pythonpath           : 
  virtualenv           : /home/scm177/VirtualEnvs/rct_class

  radical.analytics    : v0.60.0-2-ge579b1d@devel
  radical.pilot        : 0.61.0-v0.61.0-42-g5001fad@rct-comm
  radical.saga         : 0.60.0-v0.60.0-2-gcbfe6df@hotfix-delay_expand
  radical.utils        : 0.60.2-v0.60.2-1-g1537e00@master

mturilli commented 5 years ago

I successfully reproduced the error with both the full devel stack and the full master stack:

  python               : 2.7.16
  pythonpath           : 
  virtualenv           : /Users/mturilli/Virtualenvs/ra_fix_93

  radical.analytics    : v0.60.0-4-g93c19e7@devel
  radical.pilot        : 0.61.0-v0.61.0-67-g04c87616@devel
  radical.saga         : 0.60.0-v0.60.0-13-g934bbd66@devel
  radical.utils        : 0.60.2-v0.60.2-1-g1537e00@devel
----
  python               : 2.7.16
  pythonpath           : 
  virtualenv           : /Users/mturilli/Virtualenvs/ra_fix_93

  radical.analytics    : v0.60.1-1-gad022e6@master
  radical.pilot        : 0.61.0-v0.61.0@master
  radical.saga         : 0.60.0-v0.60.0@master
  radical.utils        : 0.60.2-v0.60.2-1-g1537e00@master

I will work on this with the full devel stack. Would that work for you, or do you need RU in master?

SrinivasMushnoori commented 5 years ago

I do not specifically need it in master; a fix in devel is more than fine, and I am happy to work with that.

mturilli commented 5 years ago

The issue seems to be an error in how RP saves the pilot ID in the units JSON. Printing json['unit'] gives:

[{
[...]
u'description': {
[...]
    u'pilot': u'',
[...]
}
[...]
u'pilot': u'agent_0.executing.0.child.func_exec.0000.000',
[...]
}]

@andre-merzky, I followed the path prof_utils.py -> utils/session.py -> db_utils.py but did not find the culprit. I am wondering whether the issue is in populating MongoDB at runtime?
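
For anyone checking their own session, the mismatch shown above can be spotted with a few lines of Python. This is only a sketch: it assumes the session JSON has a top-level 'pilot' list and 'unit' list whose entries carry a 'uid' key, which is what the excerpt and traceback suggest but is not confirmed here. The function name and the example data are made up for illustration.

```python
def find_dangling_pilot_refs(session_json):
    """Return (unit uid, pilot value) pairs whose pilot is not a known pilot.

    Assumes (hypothetically) that session_json['pilot'] and
    session_json['unit'] are lists of dicts with a 'uid' key.
    """
    known = {p.get('uid') for p in session_json.get('pilot', [])}
    return [(u.get('uid'), u.get('pilot'))
            for u in session_json.get('unit', [])
            if u.get('pilot') not in known]


# Toy data mimicking the excerpt above; not taken from the real session.
example = {
    'pilot': [{'uid': 'pilot.0000'}],
    'unit': [
        {'uid': 'unit.000000', 'pilot': 'pilot.0000'},
        {'uid': 'unit.000001',
         'pilot': 'agent_0.executing.0.child.func_exec.0000.000'},
    ],
}

print(find_dangling_pilot_refs(example))
```

Any pair it returns is a unit that would trigger the KeyError in get_session_description.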

andre-merzky commented 5 years ago

Hi @SrinivasMushnoori, @mturilli,

Some of the profile entries had the function executor ID set as the pilot ID; the profile parser could not find any information about that 'pilot' and barfed. That is fixed now in the RP branch rct/comm, which should not create correct profiles. For your session, all it needs is to replace the string agent_0.executing.0.child.func_exec.0000.000 with pilot.0000 in the json file, and the profiles become usable by RA. It would be best to rerun the test, though, to get clean profile events.
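
The replacement can be done in any editor; as a rough sketch, it could also be scripted as below. The helper names are hypothetical, and the script assumes the session file is plain JSON. Unlike a raw search-and-replace, it only rewrites values that are exactly the bad ID, and it writes a new file rather than overwriting the original.

```python
import json

BAD_ID = 'agent_0.executing.0.child.func_exec.0000.000'
GOOD_ID = 'pilot.0000'


def replace_id(node, bad_id=BAD_ID, good_id=GOOD_ID):
    """Recursively return a copy of node with values equal to bad_id replaced."""
    if isinstance(node, dict):
        return {k: replace_id(v, bad_id, good_id) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_id(v, bad_id, good_id) for v in node]
    return good_id if node == bad_id else node


def patch_session_file(path, bad_id=BAD_ID, good_id=GOOD_ID):
    """Write a patched copy of the JSON file next to the original."""
    with open(path) as f:
        data = json.load(f)
    patched_path = path + '.patched'
    with open(patched_path, 'w') as f:
        json.dump(replace_id(data, bad_id, good_id), f, indent=2)
    return patched_path


# In-memory check on toy data shaped like the excerpt earlier in the thread.
doc = {'unit': [{'uid': 'unit.000000',
                 'description': {'pilot': ''},
                 'pilot': BAD_ID}]}
print(replace_id(doc)['unit'][0]['pilot'])
```

patch_session_file would be called with the path to the session's JSON file; the patched copy then has to be put in place of the original before loading the session in RA.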

andre-merzky commented 5 years ago

That is fixed now in the RP branch rct/comm, which should not create correct profiles.

not -> now :-P

SrinivasMushnoori commented 5 years ago

Thanks Andre. I'll test and get back to you.