Open csampat opened 6 years ago
It was an issue with my stack; I got it to work. There is one typo in the wiki page: it has to be uid = sid. I will edit that.
All the timings that are printed come out as 0 for some reason.
Okay! Can you check whether the *.prof files have anything in them?
Here is an example of a session folder:
(rp_2) chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM128PBM2_1$ cat diam200DEM128PBM2_1.prof
#time,comp,uid,state,event,msg
1509898857.3511,sync_abs,diam200DEM128PBM2_1,MainThread,,,xcalibur:127.0.0.1:1509898857.34:1509898857.35:ntp
1509898857.3512,config_parser_start,diam200DEM128PBM2_1,MainThread,diam200DEM128PBM2_1,,
1509898857.4621,config_parser_stop,diam200DEM128PBM2_1,MainThread,diam200DEM128PBM2_1,,
1509948252.9552,session_close,diam200DEM128PBM2_1,MainThread,diam200DEM128PBM2_1,,
1509948265.2251,session_stop,diam200DEM128PBM2_1,MainThread,diam200DEM128PBM2_1,,
1509948265.2252,END,diam200DEM128PBM2_1,MainThread,,,
How about the rest? Can you run ls -all *.prof and ls -all pilot.0000/*.prof and post the output here?
For ls -all *.prof the output is:
-rw-r--r-- 1 chai chai 1087 Nov 6 01:04 pmgr.0000.launching.0.child.prof
-rw-r--r-- 1 chai chai 446 Nov 6 01:04 pmgr.0000.launching.0.prof
-rw-r--r-- 1 chai chai 1222 Nov 6 01:04 pmgr.0000.prof
-rw-r--r-- 1 chai chai 8487 Nov 6 01:04 umgr.0000.prof
-rw-r--r-- 1 chai chai 7312 Nov 6 01:04 umgr.0000.scheduling.0.child.prof
-rw-r--r-- 1 chai chai 453 Nov 6 01:04 umgr.0000.scheduling.0.prof
-rw-r--r-- 1 chai chai 7283 Nov 6 01:04 umgr.0000.staging.input.0.child.prof
-rw-r--r-- 1 chai chai 471 Nov 6 01:04 umgr.0000.staging.input.0.prof
-rw-r--r-- 1 chai chai 6825 Nov 6 01:04 umgr.0000.staging.output.0.child.prof
-rw-r--r-- 1 chai chai 477 Nov 6 01:04 umgr.0000.staging.output.0.prof
-rw-r--r-- 1 chai chai 23905 Nov 6 01:04 update.0.child.prof
-rw-r--r-- 1 chai chai 369 Nov 6 01:04 update.0.prof
And for ls -all pilot.0000/*.prof the output is:
-rw------- 1 chai chai 285 Nov 5 11:23 pilot.0000/agent_0.executing.0.prof
-rw------- 1 chai chai 5532 Nov 6 01:04 pilot.0000/agent_0.prof
-rw------- 1 chai chai 11767 Nov 6 01:04 pilot.0000/agent_0.scheduling.0.child.prof
-rw------- 1 chai chai 467 Nov 6 01:04 pilot.0000/agent_0.scheduling.0.prof
-rw------- 1 chai chai 19129 Nov 6 01:04 pilot.0000/agent_0.staging.input.0.child.prof
-rw------- 1 chai chai 297 Nov 5 11:23 pilot.0000/agent_0.staging.input.0.prof
-rw------- 1 chai chai 19963 Nov 6 01:04 pilot.0000/agent_0.staging.output.0.child.prof
-rw------- 1 chai chai 491 Nov 6 01:04 pilot.0000/agent_0.staging.output.0.prof
-rw------- 1 chai chai 1366 Nov 6 01:04 pilot.0000/bootstrap_1.prof
-rw------- 1 chai chai 24975 Nov 6 01:04 pilot.0000/update.0.child.prof
-rw------- 1 chai chai 250 Nov 5 11:23 pilot.0000/update.0.prof
Please create a tar ball of a session folder and attach it here along with the stack
the stack:
python : 2.7.13
pythonpath :
virtualenv : /home/chai/Documents/git/rp_fix/src/RADICAL_Pilot/rp_2
radical.analytics : v0.45.2-86-g99480a1@rc-v0.46.3
radical.pilot : 0.47-v0.46.2-186-g2648ca47@experiment-cybermanufacturing
radical.utils : 0.47-v0.46-73-gd580ab1@rc-v0.46.3
saga : 0.47-v0.46-32-ga2f9dedc@rc-v0.46.3
:) I had an error in the wiki page. Here is the units duration: 49851.5770001 in seconds
I tried running it again but still the same issue:
(rp_2) chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot$ for i in diam200DEM128PBM1_1 diam200DEM128PBM2_1 diam200DEM64PBM16_1 diam200DEM64PBM2_1 diam200DEM64PBM4_1 diam200DEM64PBM8_1; do cd $i; radicalpilot-close-session -m export -s $i; radicalpilot-fetch-profiles $i -s;cd ..;done
modes : export
db url : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM128PBM1_1
age : -999999999 days, 0:00:00
check session diam200DEM128PBM1_1 + (3 days, 6:40:39.673634)
export session diam200DEM128PBM1_1.json
modes : export
db url : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM128PBM2_1
age : -999999999 days, 0:00:00
check session diam200DEM128PBM2_1 + (2 days, 19:53:19.842844)
export session diam200DEM128PBM2_1.json
modes : export
db url : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM64PBM16_1
age : -999999999 days, 0:00:00
check session diam200DEM64PBM16_1 + (3 days, 6:39:54.515589)
export session diam200DEM64PBM16_1.json
modes : export
db url : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM64PBM2_1
age : -999999999 days, 0:00:00
check session diam200DEM64PBM2_1 + (3 days, 20:35:47.321097)
export session diam200DEM64PBM2_1.json
modes : export
db url : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM64PBM4_1
age : -999999999 days, 0:00:00
check session diam200DEM64PBM4_1 + (3 days, 15:43:50.598075)
export session diam200DEM64PBM4_1.json
modes : export
db url : mongodb://chai:qwerty123@ds135364.mlab.com:35364/one_way_rp_studies
session : diam200DEM64PBM8_1
age : -999999999 days, 0:00:00
check session diam200DEM64PBM8_1 + (3 days, 15:43:17.024103)
export session diam200DEM64PBM8_1.json
for i in diam200DEM128PBM1_1 diam200DEM128PBM2_1 diam200DEM64PBM16_1 diam200DEM64PBM2_1 diam200DEM64PBM4_1 diam200DEM64PBM8_1 diam200DEM64PBM1_1; do cd $i;echo $i;python ../collecting_rp_times.py $i $i;cd ..;done
diam200DEM128PBM1_1
0
diam200DEM128PBM2_1
0
diam200DEM64PBM16_1
0
diam200DEM64PBM2_1
0
diam200DEM64PBM4_1
0
diam200DEM64PBM8_1
0
diam200DEM64PBM1_1
0
I even tried running them individually in each folder but still got the timing as 0.
You are using a python file, right? Can you upload it somewhere for me to see it?
Also do you get any warnings with invalid rows?
Nope, no warnings about invalid rows. Attached: collecting_rp_times.txt
Can you do an ls in this folder, diam200DEM64PBM2_1?
chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM64PBM2_1$ ls
control.pubsub.bridge.0000.child.log log.pubsub.pub.0006.log state.pubsub.sub.0000.log umgr.reschedule.pubsub.bridge.0000.log
control.pubsub.bridge.0000.log pilot.0000 state.pubsub.sub.0001.log umgr.scheduling.queue.bridge.0000.child.log
control.pubsub.pub.0000.log pmgr.0000.launching.0.child.err umgr.0000.log umgr.scheduling.queue.bridge.0000.log
control.pubsub.pub.0001.log pmgr.0000.launching.0.child.log umgr.0000.prof umgr.scheduling.queue.input.0000.log
control.pubsub.pub.0002.log pmgr.0000.launching.0.child.out umgr.0000.scheduling.0.child.err umgr.scheduling.queue.output.0000.log
control.pubsub.pub.0003.log pmgr.0000.launching.0.child.prof umgr.0000.scheduling.0.child.log umgr.staging.input.queue.bridge.0000.child.log
control.pubsub.pub.0004.log pmgr.0000.launching.0.log umgr.0000.scheduling.0.child.out umgr.staging.input.queue.bridge.0000.log
control.pubsub.pub.0005.log pmgr.0000.launching.0.prof umgr.0000.scheduling.0.child.prof umgr.staging.input.queue.input.0000.log
control.pubsub.pub.0006.log pmgr.0000.log umgr.0000.scheduling.0.log umgr.staging.input.queue.output.0000.log
control.pubsub.sub.0000.log pmgr.0000.prof umgr.0000.scheduling.0.prof umgr.staging.output.queue.bridge.0000.child.log
control.pubsub.sub.0001.log pmgr.launching.queue.bridge.0000.child.log umgr.0000.staging.input.0.child.err umgr.staging.output.queue.bridge.0000.log
diam200DEM64PBM2_1 pmgr.launching.queue.bridge.0000.log umgr.0000.staging.input.0.child.log umgr.staging.output.queue.input.0000.log
diam200DEM64PBM2_1.json pmgr.launching.queue.input.0000.log umgr.0000.staging.input.0.child.out umgr.staging.output.queue.output.0000.log
diam200DEM64PBM2_1.log pmgr.launching.queue.output.0000.log umgr.0000.staging.input.0.child.prof umgr.unschedule.pubsub.bridge.0000.child.log
diam200DEM64PBM2_1.prof state.pubsub.bridge.0000.child.log umgr.0000.staging.input.0.log umgr.unschedule.pubsub.bridge.0000.log
log.pubsub.bridge.0000.child.log state.pubsub.bridge.0000.log umgr.0000.staging.input.0.prof update.0.child.err
log.pubsub.bridge.0000.log state.pubsub.pub.0000.log umgr.0000.staging.output.0.child.err update.0.child.log
log.pubsub.pub.0000.log state.pubsub.pub.0001.log umgr.0000.staging.output.0.child.log update.0.child.out
log.pubsub.pub.0001.log state.pubsub.pub.0002.log umgr.0000.staging.output.0.child.out update.0.child.prof
log.pubsub.pub.0002.log state.pubsub.pub.0003.log umgr.0000.staging.output.0.child.prof update.0.log
log.pubsub.pub.0003.log state.pubsub.pub.0004.log umgr.0000.staging.output.0.log update.0.prof
log.pubsub.pub.0004.log state.pubsub.pub.0005.log umgr.0000.staging.output.0.prof
log.pubsub.pub.0005.log state.pubsub.pub.0006.log umgr.reschedule.pubsub.bridge.0000.child.log
chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM64PBM2_1$
diam200DEM64PBM2_1
0
This is not correct. I have the time of this session in this ticket. I also got it with your script. I think that the way you pass the paths is creating your problem.
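One way to rule out a path problem is to check, before running the analysis, that the directory passed as src actually contains profile files. The following is a minimal sketch, assuming the session directory holds *.prof files at its top level and one level down (for example in pilot.0000/), as in the listings above. The helper name find_profiles is hypothetical and not part of any RADICAL tool.

```python
import glob
import os


def find_profiles(src):
    """Return the sorted list of *.prof files found under a session directory.

    Looks at the top level and one level down (e.g. pilot.0000/), which is
    the layout shown in the listings in this thread.
    """
    # Resolve to an absolute path so the result does not silently depend
    # on the current working directory.
    src = os.path.abspath(src)
    profiles = glob.glob(os.path.join(src, '*.prof'))
    profiles += glob.glob(os.path.join(src, '*', '*.prof'))
    return sorted(profiles)
```

Note that a relative src is resolved against the current working directory: calling find_profiles('diam200DEM64PBM2_1') from inside the session folder inspects the nested diam200DEM64PBM2_1/diam200DEM64PBM2_1 directory rather than the session folder itself. An empty result would suggest the path does not point at the expected location.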
Okay, I will try running them individually once more.
Getting an attribute error when I run it in ipython
In [9]: session = ra.Session(sid=sid, stype='radical.pilot', src=src)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-8c5c856c5b98> in <module>()
----> 1 session = ra.Session(sid=sid, stype='radical.pilot', src=src)
/home/chai/anaconda2/lib/python2.7/site-packages/radical/analytics/session.pyc in __init__(self, src, stype, sid, _entities, _init)
77 import radical.pilot as rp
78 self._profile, accuracy, hostmap \
---> 79 = rp.utils.get_session_profile(sid=sid, src=self._src)
80 self._description = rp.utils.get_session_description(sid=sid, src=self._src)
81
/home/chai/anaconda2/lib/python2.7/site-packages/radical/pilot/utils/prof_utils.pyc in get_session_profile(sid, src)
79
80 # filter out some frequent, but uninteresting events
---> 81 efilter = {ru.EVENT : ['publish', 'work start', 'work done'],
82 ru.MSG : ['update unit state', 'unit update pushed',
83 'bulked', 'bulk size']
AttributeError: 'module' object has no attribute 'EVENT'
Is this from the session you shared with me? If not, can you please try it with the session you shared here?
same issue in that folder as well
In [3]: import radical.analytics as ra
...:
In [4]: import radical.utils as ru
...:
In [5]: sid = 'diam200DEM64PBM2_1'
In [6]: src = 'diam200DEM64PBM2_1'
In [7]: session = ra.Session(sid=sid, stype='radical.pilot', src=src)
...:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-8c5c856c5b98> in <module>()
----> 1 session = ra.Session(sid=sid, stype='radical.pilot', src=src)
/home/chai/anaconda2/lib/python2.7/site-packages/radical/analytics/session.pyc in __init__(self, src, stype, sid, _entities, _init)
77 import radical.pilot as rp
78 self._profile, accuracy, hostmap \
---> 79 = rp.utils.get_session_profile(sid=sid, src=self._src)
80 self._description = rp.utils.get_session_description(sid=sid, src=self._src)
81
/home/chai/anaconda2/lib/python2.7/site-packages/radical/pilot/utils/prof_utils.pyc in get_session_profile(sid, src)
79
80 # filter out some frequent, but uninteresting events
---> 81 efilter = {ru.EVENT : ['publish', 'work start', 'work done'],
82 ru.MSG : ['update unit state', 'unit update pushed',
83 'bulked', 'bulk size']
AttributeError: 'module' object has no attribute 'EVENT'
In [8]:
This is very weird... I am proposing a very ugly workaround for now. In each session folder, inside the pilot.0000 folder, there should be a file named update.0.child.prof. In that file, you should find lines like the following:
1509723556.1985,update.0.child:update.0.child.subscriber._state_cb,unit.000000,,update_request,AGENT_EXECUTING
and
1509772730.7001,update.0.child:update.0.child.subscriber._state_cb,unit.000010,,update_request,UMGR_STAGING_OUTPUT_PENDING
In the first line, the leading number is the time (in epoch seconds) at which the first unit started; in the second, it is the time at which the last unit finished. If you subtract the first from the second, you should get your total time.
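The subtraction described above can be sketched in a few lines of Python. This is only an illustration: it assumes comma-separated rows whose first field is the epoch timestamp and whose last two fields are the event name and the state, as in the two rows quoted above. The function name unit_span is hypothetical and not part of radical.analytics or radical.pilot.

```python
def unit_span(lines,
              start_state='AGENT_EXECUTING',
              stop_state='UMGR_STAGING_OUTPUT_PENDING'):
    """Return (first_start, last_stop, duration) in seconds.

    Assumes rows like the two examples quoted in this thread: the first
    field is an epoch timestamp, the second-to-last field is the event
    name, and the last field is the state.
    """
    starts, stops = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and the header line
        fields = line.split(',')
        if len(fields) < 3 or fields[-2] != 'update_request':
            continue  # only state update rows are of interest
        try:
            stamp = float(fields[0])
        except ValueError:
            continue  # first field is not a timestamp
        if fields[-1] == start_state:
            starts.append(stamp)
        elif fields[-1] == stop_state:
            stops.append(stamp)
    if not starts or not stops:
        raise ValueError('no matching update_request rows found')
    first, last = min(starts), max(stops)
    return first, last, last - first


# The two rows quoted in the comment above.
rows = [
    '1509723556.1985,update.0.child:update.0.child.subscriber._state_cb,'
    'unit.000000,,update_request,AGENT_EXECUTING',
    '1509772730.7001,update.0.child:update.0.child.subscriber._state_cb,'
    'unit.000010,,update_request,UMGR_STAGING_OUTPUT_PENDING',
]
first, last, duration = unit_span(rows)
print('units ran for %.4f seconds' % duration)
```

To apply it to a real session, the open file can be passed directly, for example unit_span(open('pilot.0000/update.0.child.prof')), since the function only iterates over lines.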
Okay I found out the times manually. Can you please send / upload the timings of your experiments as well
ru.EVENT is defined in the rc/v0.46.3 branch in radical.utils: https://github.com/radical-cybertools/radical.utils/blob/rc/v0.46.3/src/radical/utils/profile.py#L23 - please make sure you use that branch.
Thank you, @andre-merzky!
But if you look at my radical-stack output above, my radical.utils is already on the rc/v0.46.3 branch:
radical.utils : 0.47-v0.46-73-gd580ab1@rc-v0.46.3
Yes, I saw that, but had assumed that this possibly changed meanwhile. Point remains that this should be defined in this version:
$ [rc/v0.46.3] $ radical-stack | grep utils
radical.utils : 0.47-v0.46-73-gd580ab1@rc-v0.46.3
$ [rc/v0.46.3] $ python -c 'import radical.utils as ru; print ru.EVENT'
1
So if that is still a problem, please try to run the above two commands, and from there we'll figure out what's up. Thanks for your patience!
Best, Andre.
The output of the above 2 commands are :
(rp_2) chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM64PBM2_1$ radical-stack | grep utils
radical.utils : 0.47-v0.46-73-gd580ab1@rc-v0.46.3
(rp_2) chai@xcalibur:~/Documents/git/rp_fix/src/RADICAL_Pilot/diam200DEM64PBM2_1$ python -c 'import radical.utils as ru; print ru.EVENT'
1
I don't know if this helps, since it is the same output as you expected.
It does - it confirms that the install worked ok, and the version report is correct. So the problem likely lies in how that virtualenv / that module is used.
The next step would be to check in /home/chai/anaconda2/lib/python2.7/site-packages/radical/pilot/utils/prof_utils.py (with the full path) whether import radical.utils as ru is indeed in that file's header. If that is missing, we need to update radical.pilot. If it is present, we would need to debug why that import gives a different result than your check on the command line.
@iparask, can you please help out tracking this down? Thanks!
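A quick way to see which installation an interpreter actually imports is to print the file a module was loaded from, together with whether it defines the attribute in question. The traceback above lists files under /home/chai/anaconda2/, whereas the stack reports the virtualenv .../rp_2, so comparing the two may be informative. The sketch below is generic: describe_module is a hypothetical helper, and it is demonstrated with a standard-library module so that it runs anywhere.

```python
import importlib


def describe_module(name, attr):
    """Return (file the module was loaded from, whether it defines attr)."""
    module = importlib.import_module(name)
    origin = getattr(module, '__file__', None)  # None for some built-ins
    return origin, hasattr(module, attr)


# In the affected environment one would pass 'radical.utils' and 'EVENT';
# a standard-library module is used here so the sketch runs anywhere.
origin, has_attr = describe_module('json', 'dumps')
print('%s %s' % (origin, has_attr))
```

Running the same call from the plain python command line and from the ipython session that raised the AttributeError would show whether both resolve to the same file.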
When I run the python script from the wiki in each test case I get the following error:
sid and src have correct session names