radical-collaboration / CyberManufacturing

CDSE Multi-scale CI Project

Profiling rp #34

Open csampat opened 6 years ago

csampat commented 6 years ago

When I run the Python script from the wiki in each test case, I get the following error:

Traceback (most recent call last):
  File "../collecting_rp_times.py", line 12, in <module>
    session = ra.Session(sid=sid, stype='radical.pilot', src=src)
  File "/home/chai/anaconda2/lib/python2.7/site-packages/radical/analytics/session.py", line 79, in __init__
    = rp.utils.get_session_profile(sid=sid, src=self._src)
  File "/home/chai/anaconda2/lib/python2.7/site-packages/radical/pilot/utils/prof_utils.py", line 81, in get_session_profile
    efilter = {ru.EVENT : ['publish', 'work start', 'work done'], 
AttributeError: 'module' object has no attribute 'EVENT'

sid and src have correct session names

csampat commented 6 years ago

It was an issue with my stack; I got it to work. There is one typo on the wiki page: it has to be uid = sid. I will edit that.

All the timings that are printed are coming out as 0 for some reason.

iparask commented 6 years ago

Okay! Can you check the *.prof files if they have anything in them?

csampat commented 6 years ago

Here is an example of a session folder:

(rp_2) chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM128PBM2_1$ cat diam200DEM128PBM2_1.prof 
#time,comp,uid,state,event,msg
1509898857.3511,sync_abs,diam200DEM128PBM2_1,MainThread,,,xcalibur:127.0.0.1:1509898857.34:1509898857.35:ntp
1509898857.3512,config_parser_start,diam200DEM128PBM2_1,MainThread,diam200DEM128PBM2_1,,
1509898857.4621,config_parser_stop,diam200DEM128PBM2_1,MainThread,diam200DEM128PBM2_1,,
1509948252.9552,session_close,diam200DEM128PBM2_1,MainThread,diam200DEM128PBM2_1,,
1509948265.2251,session_stop,diam200DEM128PBM2_1,MainThread,diam200DEM128PBM2_1,,
1509948265.2252,END,diam200DEM128PBM2_1,MainThread,,,

iparask commented 6 years ago

How about the rest? Can you do ls -all *.prof and ls -all pilot.0000/*.prof and put here the output?

csampat commented 6 years ago

for ls -all *.prof the output is:

-rw-r--r-- 1 chai chai  1087 Nov  6 01:04 pmgr.0000.launching.0.child.prof
-rw-r--r-- 1 chai chai   446 Nov  6 01:04 pmgr.0000.launching.0.prof
-rw-r--r-- 1 chai chai  1222 Nov  6 01:04 pmgr.0000.prof
-rw-r--r-- 1 chai chai  8487 Nov  6 01:04 umgr.0000.prof
-rw-r--r-- 1 chai chai  7312 Nov  6 01:04 umgr.0000.scheduling.0.child.prof
-rw-r--r-- 1 chai chai   453 Nov  6 01:04 umgr.0000.scheduling.0.prof
-rw-r--r-- 1 chai chai  7283 Nov  6 01:04 umgr.0000.staging.input.0.child.prof
-rw-r--r-- 1 chai chai   471 Nov  6 01:04 umgr.0000.staging.input.0.prof
-rw-r--r-- 1 chai chai  6825 Nov  6 01:04 umgr.0000.staging.output.0.child.prof
-rw-r--r-- 1 chai chai   477 Nov  6 01:04 umgr.0000.staging.output.0.prof
-rw-r--r-- 1 chai chai 23905 Nov  6 01:04 update.0.child.prof
-rw-r--r-- 1 chai chai   369 Nov  6 01:04 update.0.prof

and for ls -all pilot.0000/*.prof, the output is:

-rw------- 1 chai chai   285 Nov  5 11:23 pilot.0000/agent_0.executing.0.prof
-rw------- 1 chai chai  5532 Nov  6 01:04 pilot.0000/agent_0.prof
-rw------- 1 chai chai 11767 Nov  6 01:04 pilot.0000/agent_0.scheduling.0.child.prof
-rw------- 1 chai chai   467 Nov  6 01:04 pilot.0000/agent_0.scheduling.0.prof
-rw------- 1 chai chai 19129 Nov  6 01:04 pilot.0000/agent_0.staging.input.0.child.prof
-rw------- 1 chai chai   297 Nov  5 11:23 pilot.0000/agent_0.staging.input.0.prof
-rw------- 1 chai chai 19963 Nov  6 01:04 pilot.0000/agent_0.staging.output.0.child.prof
-rw------- 1 chai chai   491 Nov  6 01:04 pilot.0000/agent_0.staging.output.0.prof
-rw------- 1 chai chai  1366 Nov  6 01:04 pilot.0000/bootstrap_1.prof
-rw------- 1 chai chai 24975 Nov  6 01:04 pilot.0000/update.0.child.prof
-rw------- 1 chai chai   250 Nov  5 11:23 pilot.0000/update.0.prof 

iparask commented 6 years ago

Please create a tar ball of a session folder and attach it here along with the stack

csampat commented 6 years ago

the stack:

  python               : 2.7.13
  pythonpath           : 
  virtualenv           : /home/chai/Documents/git/rp_fix/src/RADICAL_Pilot/rp_2

  radical.analytics    : v0.45.2-86-g99480a1@rc-v0.46.3
  radical.pilot        : 0.47-v0.46.2-186-g2648ca47@experiment-cybermanufacturing
  radical.utils        : 0.47-v0.46-73-gd580ab1@rc-v0.46.3
  saga                 : 0.47-v0.46-32-ga2f9dedc@rc-v0.46.3

diam200DEM64PBM2_1.tar.gz

iparask commented 6 years ago

:) I had an error in the wiki page. Here is the units' duration: 49851.5770001 seconds.

csampat commented 6 years ago

I tried running it again, but I still get the same issue:

(rp_2) chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot$ for i in diam200DEM128PBM1_1 diam200DEM128PBM2_1 diam200DEM64PBM16_1 diam200DEM64PBM2_1 diam200DEM64PBM4_1 diam200DEM64PBM8_1; do cd $i; radicalpilot-close-session -m export -s $i; radicalpilot-fetch-profiles $i -s;cd ..;done
modes   : export
db url  : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM128PBM1_1
age     : -999999999 days, 0:00:00
check  session diam200DEM128PBM1_1 + (3 days, 6:40:39.673634)
export session diam200DEM128PBM1_1.json
modes   : export
db url  : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM128PBM2_1
age     : -999999999 days, 0:00:00
check  session diam200DEM128PBM2_1 + (2 days, 19:53:19.842844)
export session diam200DEM128PBM2_1.json
modes   : export
db url  : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM64PBM16_1
age     : -999999999 days, 0:00:00
check  session diam200DEM64PBM16_1 + (3 days, 6:39:54.515589)
export session diam200DEM64PBM16_1.json
modes   : export
db url  : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM64PBM2_1
age     : -999999999 days, 0:00:00
check  session diam200DEM64PBM2_1 + (3 days, 20:35:47.321097)
export session diam200DEM64PBM2_1.json
modes   : export
db url  : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM64PBM4_1
age     : -999999999 days, 0:00:00
check  session diam200DEM64PBM4_1 + (3 days, 15:43:50.598075)
export session diam200DEM64PBM4_1.json
modes   : export
db url  : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM64PBM8_1
age     : -999999999 days, 0:00:00
check  session diam200DEM64PBM8_1 + (3 days, 15:43:17.024103)
export session diam200DEM64PBM8_1.json
for i in diam200DEM128PBM1_1 diam200DEM128PBM2_1 diam200DEM64PBM16_1 diam200DEM64PBM2_1 diam200DEM64PBM4_1 diam200DEM64PBM8_1 diam200DEM64PBM1_1; do cd $i;echo $i;python ../collecting_rp_times.py $i $i;cd ..;done
diam200DEM128PBM1_1
0
diam200DEM128PBM2_1
0
diam200DEM64PBM16_1
0
diam200DEM64PBM2_1
0
diam200DEM64PBM4_1
0
diam200DEM64PBM8_1
0
diam200DEM64PBM1_1
0

I even tried running them individually in each folder but still got the timing as 0.

iparask commented 6 years ago

You are using a python file, right? Can you upload it somewhere for me to see it?

iparask commented 6 years ago

Also, do you get any warnings about invalid rows?

csampat commented 6 years ago

Nope, no warnings about invalid rows. collecting_rp_times.txt

iparask commented 6 years ago

Can you do an ls in this folder diam200DEM64PBM2_1?

csampat commented 6 years ago

chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM64PBM2_1$ ls
control.pubsub.bridge.0000.child.log  log.pubsub.pub.0006.log                     state.pubsub.sub.0000.log                     umgr.reschedule.pubsub.bridge.0000.log
control.pubsub.bridge.0000.log        pilot.0000                                  state.pubsub.sub.0001.log                     umgr.scheduling.queue.bridge.0000.child.log
control.pubsub.pub.0000.log           pmgr.0000.launching.0.child.err             umgr.0000.log                                 umgr.scheduling.queue.bridge.0000.log
control.pubsub.pub.0001.log           pmgr.0000.launching.0.child.log             umgr.0000.prof                                umgr.scheduling.queue.input.0000.log
control.pubsub.pub.0002.log           pmgr.0000.launching.0.child.out             umgr.0000.scheduling.0.child.err              umgr.scheduling.queue.output.0000.log
control.pubsub.pub.0003.log           pmgr.0000.launching.0.child.prof            umgr.0000.scheduling.0.child.log              umgr.staging.input.queue.bridge.0000.child.log
control.pubsub.pub.0004.log           pmgr.0000.launching.0.log                   umgr.0000.scheduling.0.child.out              umgr.staging.input.queue.bridge.0000.log
control.pubsub.pub.0005.log           pmgr.0000.launching.0.prof                  umgr.0000.scheduling.0.child.prof             umgr.staging.input.queue.input.0000.log
control.pubsub.pub.0006.log           pmgr.0000.log                               umgr.0000.scheduling.0.log                    umgr.staging.input.queue.output.0000.log
control.pubsub.sub.0000.log           pmgr.0000.prof                              umgr.0000.scheduling.0.prof                   umgr.staging.output.queue.bridge.0000.child.log
control.pubsub.sub.0001.log           pmgr.launching.queue.bridge.0000.child.log  umgr.0000.staging.input.0.child.err           umgr.staging.output.queue.bridge.0000.log
diam200DEM64PBM2_1                    pmgr.launching.queue.bridge.0000.log        umgr.0000.staging.input.0.child.log           umgr.staging.output.queue.input.0000.log
diam200DEM64PBM2_1.json               pmgr.launching.queue.input.0000.log         umgr.0000.staging.input.0.child.out           umgr.staging.output.queue.output.0000.log
diam200DEM64PBM2_1.log                pmgr.launching.queue.output.0000.log        umgr.0000.staging.input.0.child.prof          umgr.unschedule.pubsub.bridge.0000.child.log
diam200DEM64PBM2_1.prof               state.pubsub.bridge.0000.child.log          umgr.0000.staging.input.0.log                 umgr.unschedule.pubsub.bridge.0000.log
log.pubsub.bridge.0000.child.log      state.pubsub.bridge.0000.log                umgr.0000.staging.input.0.prof                update.0.child.err
log.pubsub.bridge.0000.log            state.pubsub.pub.0000.log                   umgr.0000.staging.output.0.child.err          update.0.child.log
log.pubsub.pub.0000.log               state.pubsub.pub.0001.log                   umgr.0000.staging.output.0.child.log          update.0.child.out
log.pubsub.pub.0001.log               state.pubsub.pub.0002.log                   umgr.0000.staging.output.0.child.out          update.0.child.prof
log.pubsub.pub.0002.log               state.pubsub.pub.0003.log                   umgr.0000.staging.output.0.child.prof         update.0.log
log.pubsub.pub.0003.log               state.pubsub.pub.0004.log                   umgr.0000.staging.output.0.log                update.0.prof
log.pubsub.pub.0004.log               state.pubsub.pub.0005.log                   umgr.0000.staging.output.0.prof
log.pubsub.pub.0005.log               state.pubsub.pub.0006.log                   umgr.reschedule.pubsub.bridge.0000.child.log
chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM64PBM2_1$

iparask commented 6 years ago

diam200DEM64PBM2_1
0

This is not correct. I have the time of this session in this ticket. I also got it with your script. I think that the way you pass the paths is creating your problem.
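
One way to sanity-check the path passed as src before running the analysis script is to list the profile files that actually sit under it. The sketch below is only a helper for that check: the name list_prof_files is hypothetical, and the assumption that a usable source directory contains *.prof files is taken from the directory listings earlier in this thread, not from the radical.analytics documentation.

```python
from __future__ import print_function
import os
import sys


def list_prof_files(src):
    """Return sorted (relative_path, size_in_bytes) for every *.prof under src."""
    found = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            if name.endswith(".prof"):
                full = os.path.join(root, name)
                found.append((os.path.relpath(full, src), os.path.getsize(full)))
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the absolute path first, so a wrong relative path is easy to spot.
    src = os.path.abspath(sys.argv[1])
    print("src resolves to:", src)
    for rel, size in list_prof_files(src):
        print("%8d  %s" % (size, rel))
```

If the list comes back empty, the path given as src probably does not point at the folder that holds the profiles.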

csampat commented 6 years ago

Okay, I will try running them individually once more.

csampat commented 6 years ago

I am getting an AttributeError when I run it in IPython:

In [9]: session = ra.Session(sid=sid, stype='radical.pilot', src=src)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-8c5c856c5b98> in <module>()
----> 1 session = ra.Session(sid=sid, stype='radical.pilot', src=src)

/home/chai/anaconda2/lib/python2.7/site-packages/radical/analytics/session.pyc in __init__(self, src, stype, sid, _entities, _init)
     77             import radical.pilot as rp
     78             self._profile, accuracy, hostmap \
---> 79                               = rp.utils.get_session_profile(sid=sid, src=self._src)
     80             self._description = rp.utils.get_session_description(sid=sid, src=self._src)
     81 

/home/chai/anaconda2/lib/python2.7/site-packages/radical/pilot/utils/prof_utils.pyc in get_session_profile(sid, src)
     79 
     80     #  filter out some frequent, but uninteresting events
---> 81     efilter = {ru.EVENT : ['publish', 'work start', 'work done'], 
     82                ru.MSG   : ['update unit state', 'unit update pushed', 
     83                             'bulked', 'bulk size']

AttributeError: 'module' object has no attribute 'EVENT'

iparask commented 6 years ago

Is this from the session you shared with me? If not, can you please try it with the session you shared here?

csampat commented 6 years ago

Same issue in that folder as well:

In [3]: import radical.analytics as ra
   ...: 

In [4]: import radical.utils as ru
   ...: 

In [5]: sid = 'diam200DEM64PBM2_1'

In [6]: src = 'diam200DEM64PBM2_1'

In [7]: session = ra.Session(sid=sid, stype='radical.pilot', src=src)
   ...: 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-8c5c856c5b98> in <module>()
----> 1 session = ra.Session(sid=sid, stype='radical.pilot', src=src)

/home/chai/anaconda2/lib/python2.7/site-packages/radical/analytics/session.pyc in __init__(self, src, stype, sid, _entities, _init)
     77             import radical.pilot as rp
     78             self._profile, accuracy, hostmap \
---> 79                               = rp.utils.get_session_profile(sid=sid, src=self._src)
     80             self._description = rp.utils.get_session_description(sid=sid, src=self._src)
     81 

/home/chai/anaconda2/lib/python2.7/site-packages/radical/pilot/utils/prof_utils.pyc in get_session_profile(sid, src)
     79 
     80     #  filter out some frequent, but uninteresting events
---> 81     efilter = {ru.EVENT : ['publish', 'work start', 'work done'], 
     82                ru.MSG   : ['update unit state', 'unit update pushed', 
     83                             'bulked', 'bulk size']

AttributeError: 'module' object has no attribute 'EVENT'

In [8]: 

iparask commented 6 years ago

This is very weird. I am proposing a very ugly workaround for now. In each session folder, inside the pilot.0000 folder, there should be a file named update.0.child.prof. In that file, you should find lines like the following:

1509723556.1985,update.0.child:update.0.child.subscriber._state_cb,unit.000000,,update_request,AGENT_EXECUTING

and

1509772730.7001,update.0.child:update.0.child.subscriber._state_cb,unit.000010,,update_request,UMGR_STAGING_OUTPUT_PENDING

In the first line, the number is the time, in epoch seconds, at which the first unit started; in the second, it is the time at which the last unit finished. If you subtract the first from the second, you should get your times.
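
The manual subtraction can be scripted. Below is a minimal sketch, assuming the comma-separated layout of the two example lines above (time first, event in the fifth field, state in the sixth). The function name unit_span is hypothetical and not part of radical.pilot or radical.analytics.

```python
from __future__ import print_function
import sys


def unit_span(lines):
    """Seconds between the earliest AGENT_EXECUTING and the latest
    UMGR_STAGING_OUTPUT_PENDING update_request, or None if either is missing."""
    starts, stops = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split(",")
        if len(fields) < 6 or fields[4] != "update_request":
            continue
        try:
            stamp = float(fields[0])
        except ValueError:
            continue  # ignore rows whose first field is not a timestamp
        if fields[5] == "AGENT_EXECUTING":
            starts.append(stamp)
        elif fields[5] == "UMGR_STAGING_OUTPUT_PENDING":
            stops.append(stamp)
    if not starts or not stops:
        return None
    return max(stops) - min(starts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python unit_span.py pilot.0000/update.0.child.prof
    with open(sys.argv[1]) as handle:
        print(unit_span(handle))
```

For the two example lines quoted above, this gives 1509772730.7001 - 1509723556.1985, which is about 49174.5 seconds.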

csampat commented 6 years ago

Okay, I found the times manually. Can you please send or upload the timings of your experiments as well?

andre-merzky commented 6 years ago

ru.EVENT is defined in the rc/v0.46.3 branch of radical.utils: https://github.com/radical-cybertools/radical.utils/blob/rc/v0.46.3/src/radical/utils/profile.py#L23 - please make sure you use that branch.

csampat commented 6 years ago

Thank you, @andre-merzky! But if you look at my radical-stack output above, my radical.utils is already on the rc/v0.46.3 branch: radical.utils : 0.47-v0.46-73-gd580ab1@rc-v0.46.3

andre-merzky commented 6 years ago

Yes, I saw that, but had assumed that this had possibly changed in the meantime. The point remains that this should be defined in this version:

$ [rc/v0.46.3] $ radical-stack | grep utils
  radical.utils        : 0.47-v0.46-73-gd580ab1@rc-v0.46.3

$  [rc/v0.46.3] $ python -c 'import radical.utils as ru; print ru.EVENT'
1

So if that is still a problem, please try to run the above two commands, and from there we'll figure out what's up. Thanks for your patience!

Best, Andre.

csampat commented 6 years ago

The output of the above two commands is:

(rp_2) chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM64PBM2_1$ radical-stack | grep utils
  radical.utils        : 0.47-v0.46-73-gd580ab1@rc-v0.46.3
(rp_2) chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM64PBM2_1$ python -c 'import radical.utils as ru; print ru.EVENT'
1

I don't know if this helps, since it is the same output as you expected.

andre-merzky commented 6 years ago

It does - it confirms that the install worked OK and that the version report is correct. So the problem likely lies in how that virtualenv / that module is used.

The next step would be to check /home/chai/anaconda2/lib/python2.7/site-packages/radical/pilot/utils/prof_utils.py (with the full path) to see whether import radical.utils as ru is indeed in that file's header. If it is missing, we need to update radical.pilot. If it is present, we would need to debug why that import gives a different result than your check on the command line.

@iparask, can you please help out tracking this down? Thanks!
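
Note that the tracebacks above load radical.analytics and radical.pilot from /home/chai/anaconda2/lib/python2.7/site-packages, while the reported stack lists the rp_2 virtualenv. One way to see which copy of each module the interpreter actually imports is the sketch below. The helper name describe_module is hypothetical, and only importlib from the standard library is used. Whether a path mismatch is the real cause here is an assumption, not something confirmed in this thread.

```python
from __future__ import print_function
import importlib


def describe_module(name, attr=None):
    """Return (path, has_attr) for module `name`.

    path is the file the module was loaded from, or None if the import fails
    or the module has no __file__. has_attr is None when no attr is given.
    """
    try:
        mod = importlib.import_module(name)
    except ImportError:
        return None, False
    path = getattr(mod, "__file__", None)
    has_attr = hasattr(mod, attr) if attr else None
    return path, has_attr


if __name__ == "__main__":
    # Run this with the same interpreter (python or ipython) that raises the error.
    checks = [("radical.utils", "EVENT"),
              ("radical.pilot.utils.prof_utils", None),
              ("radical.analytics", None)]
    for name, attr in checks:
        path, ok = describe_module(name, attr)
        print("%-32s attr=%-5s %s" % (name, ok, path))
```

If the printed paths point into different site-packages directories, the script and the command-line check are probably not using the same installation.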