GoogleCodeExporter opened this issue 9 years ago (status: Open)
Profiling yappi.get_func_stats reveals this:
v367:
Wed Jan 7 19:44:57 2015 v367/yappi-512.pstat
540077 function calls in 7.762 seconds
Ordered by: internal time
List reduced from 35 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
151358 3.377 0.000 3.426 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
7984 1.318 0.000 4.041 0.001 /home/nsoffer/src/yappi/yappi.py:246(__init__)
13921 0.903 0.000 0.905 0.000 /home/nsoffer/src/yappi/yappi.py:488(__getitem__)
7928 0.631 0.000 1.384 0.000 /home/nsoffer/src/yappi/yappi.py:269(__add__)
1 0.364 0.364 7.761 7.761 /home/nsoffer/src/yappi/yappi.py:508(get)
7956 0.305 0.000 6.464 0.001 /home/nsoffer/src/yappi/yappi.py:539(_enumerator)
7956 0.257 0.000 0.257 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
7983 0.253 0.000 1.709 0.000 /home/nsoffer/src/yappi/yappi.py:401(append)
7956 0.133 0.000 0.137 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
36049 0.072 0.000 0.072 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
103736 0.067 0.000 2.723 0.000 {setattr}
151358 0.048 0.000 0.048 0.000 {method 'index' of 'tuple' objects}
1 0.025 0.025 6.490 6.490 {_yappi.enum_func_stats}
7956 0.004 0.000 0.004 0.000 {method 'rfind' of 'str' objects}
13922 0.002 0.000 0.002 0.000 {isinstance}
13929 0.002 0.000 0.002 0.000 {function append at 0x1940848}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:377(__init__)
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:693(sort)
2 0.000 0.000 0.000 0.000 {function sort at 0x19411b8}
27 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:385(<lambda>)
v366:
Wed Jan 7 19:38:01 2015 v366/yappi-512.pstat
2910 function calls in 0.067 seconds
Ordered by: internal time
List reduced from 34 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
709 0.065 0.000 0.065 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
655 0.000 0.000 0.065 0.000 {setattr}
57 0.000 0.000 0.066 0.001 /home/nsoffer/src/yappi/yappi.py:246(__init__)
709 0.000 0.000 0.000 0.000 {method 'index' of 'tuple' objects}
56 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:398(append)
1 0.000 0.000 0.067 0.067 /home/nsoffer/src/yappi/yappi.py:506(get)
28 0.000 0.000 0.002 0.000 /home/nsoffer/src/yappi/yappi.py:534(_enumerator)
374 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
48 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:486(__getitem__)
1 0.000 0.000 0.002 0.002 {_yappi.enum_func_stats}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
28 0.000 0.000 0.000 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
28 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:374(__init__)
2 0.000 0.000 0.000 0.000 {function sort at 0x27562a8}
56 0.000 0.000 0.000 0.000 {function append at 0x2755938}
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:682(sort)
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:381(sort)
27 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:382(<lambda>)
49 0.000 0.000 0.000 0.000 {isinstance}
Note the 7956 _enumerator calls vs. 28.
Attached is a stress script profiling yappi.get_func_stats.
Original comment by nir...@gmail.com
on 7 Jan 2015 at 5:52
Attachments:
I found an error in the stress script - I forgot to clear the stats between the
test runs with different thread counts.
These are the results with the fixed script.
v367:
threads: 32
running: 10.012666
stopping yappi: 0.000013
getting stats: 0.140000
saving pstat: 0.012483
threads: 64
running: 10.030343
stopping yappi: 0.000013
getting stats: 0.064788
saving pstat: 0.027382
threads: 128
running: 10.036739
stopping yappi: 0.000019
getting stats: 0.270757
saving pstat: 0.114645
threads: 256
running: 10.078558
stopping yappi: 0.000036
getting stats: 1.449708
saving pstat: 0.086878
threads: 512
running: 10.203625
stopping yappi: 0.000051
getting stats: 6.936577
saving pstat: 0.116690
v366:
threads: 32
running: 10.015445
stopping yappi: 0.000014
getting stats: 0.003525
saving pstat: 0.009566
threads: 64
running: 10.033361
stopping yappi: 0.000013
getting stats: 0.003743
saving pstat: 0.007299
threads: 128
running: 10.044901
stopping yappi: 0.000032
getting stats: 0.003609
saving pstat: 0.028504
threads: 256
running: 10.184238
stopping yappi: 0.000032
getting stats: 0.132134
saving pstat: 0.126637
threads: 512
running: 10.195147
stopping yappi: 0.000050
getting stats: 0.003170
saving pstat: 0.118647
And updated profiles of yappi.get_func_stats:
v367:
Wed Jan 7 20:21:26 2015 v367/yappi-512.pstat
279917 function calls in 6.937 seconds
Ordered by: internal time
List reduced from 35 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
78398 2.656 0.000 2.682 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
7201 1.789 0.000 1.789 0.000 /home/nsoffer/src/yappi/yappi.py:488(__getitem__)
4144 0.692 0.000 2.923 0.001 /home/nsoffer/src/yappi/yappi.py:246(__init__)
4116 0.665 0.000 5.069 0.001 /home/nsoffer/src/yappi/yappi.py:539(_enumerator)
4088 0.608 0.000 0.969 0.000 /home/nsoffer/src/yappi/yappi.py:269(__add__)
4143 0.186 0.000 1.342 0.000 /home/nsoffer/src/yappi/yappi.py:401(append)
18769 0.186 0.000 0.186 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
1 0.063 0.063 6.937 6.937 /home/nsoffer/src/yappi/yappi.py:508(get)
53816 0.035 0.000 2.231 0.000 {setattr}
78398 0.026 0.000 0.026 0.000 {method 'index' of 'tuple' objects}
1 0.013 0.013 5.082 5.082 {_yappi.enum_func_stats}
4116 0.009 0.000 0.009 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
4116 0.003 0.000 0.005 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
4116 0.002 0.000 0.002 0.000 {method 'rfind' of 'str' objects}
7209 0.001 0.000 0.001 0.000 {function append at 0x7f77a2fcd848}
7202 0.001 0.000 0.001 0.000 {isinstance}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:377(__init__)
2 0.000 0.000 0.000 0.000 {function sort at 0x7f77a2fce1b8}
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:384(sort)
27 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:385(<lambda>)
v366:
Wed Jan 7 20:23:17 2015 v366/yappi-512.pstat
2871 function calls in 0.003 seconds
Ordered by: internal time
List reduced from 34 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
698 0.001 0.000 0.001 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
644 0.000 0.000 0.002 0.000 {setattr}
56 0.000 0.000 0.002 0.000 /home/nsoffer/src/yappi/yappi.py:246(__init__)
698 0.000 0.000 0.000 0.000 {method 'index' of 'tuple' objects}
55 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:398(append)
1 0.000 0.000 0.003 0.003 /home/nsoffer/src/yappi/yappi.py:506(get)
28 0.000 0.000 0.002 0.000 /home/nsoffer/src/yappi/yappi.py:534(_enumerator)
373 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
47 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:486(__getitem__)
1 0.000 0.000 0.002 0.002 {_yappi.enum_func_stats}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
28 0.000 0.000 0.000 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
28 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:374(__init__)
2 0.000 0.000 0.000 0.000 {function sort at 0x11882a8}
55 0.000 0.000 0.000 0.000 {function append at 0x1183938}
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:682(sort)
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:381(sort)
27 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:382(<lambda>)
1 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:639(add)
Note the 4116 calls to _enumerator vs. 28.
Original comment by nir...@gmail.com
on 7 Jan 2015 at 6:32
Attachments:
Looking at the code, I think I understand the issue. Before version 367, we
used to store one _pit item per function, and recorded the function stats and
call level in that pit. Different threads calling the same function were
recorded in the same pit item.
In version 367, we store one pit item per function per thread calling that
function, so we can get statistics per function and thread. When using many
threads, we have many more pit items, and enumerating them is much more
expensive. We also use much more memory for profiling, which limits how long we
can profile on a system with limited memory, or hurts performance by swapping
real application pages out of memory.
This design is particularly bad for vdsm, because not only do we have hundreds
of threads, we also create a new thread for each request when using the jsonrpc
transport. So profiling with the new yappi version keeps a pit item for each
temporary, short-lived thread.
Since this feature is not needed by all users and causes a severe performance
regression, I think yappi should enable it only on demand. By default, pits
should be stored in a global table. If this feature is enabled, pits would be
stored per context.
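The shift from one pit per function to one pit per (function, context) can be illustrated with a minimal sketch (hypothetical names, pure Python; not yappi's actual internals):

```python
from collections import defaultdict

# Hypothetical illustration of the two keying schemes discussed above;
# the names do not match yappi's C internals.

def simulate(num_threads, funcs, per_context):
    table = defaultdict(int)  # one "pit" entry per distinct key
    for ctx_id in range(num_threads):
        for func in funcs:
            # pre-v367: the key is the function alone, shared by all threads
            # v367:     the key is (function, context), one entry per thread
            key = (func, ctx_id) if per_context else func
            table[key] += 1
    return len(table)  # number of pit items get_func_stats() must enumerate

funcs = ["sha512", "read", "write"]
print(simulate(512, funcs, per_context=False))  # 3 entries to enumerate
print(simulate(512, funcs, per_context=True))   # 1536 entries to enumerate
```

With 512 threads the per-context table holds 512x as many entries, which is consistent with the 7956-vs-28 _enumerator call counts observed above.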
Original comment by nir...@gmail.com
on 7 Jan 2015 at 7:14
Hi Nir,
Fabio Zadrozny has done some optimizations regarding post-processing time of
the profiler output. See this commit: d09a8a3. Can you please also test with
the latest code? get_func_stats() calls YStats.append(), which also has O(n)
behaviour; with this patch we get O(1) behaviour.
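The O(n)-vs-O(1) append behaviour mentioned here can be modelled with a small sketch (a simplified stand-in, not the actual YStats code):

```python
# Simplified model of the O(n) vs O(1) append discussed above;
# this is not the actual YStats implementation.

class SlowStats:
    """O(n) append: linear scan for an existing entry with the same key."""
    def __init__(self):
        self.entries = []  # list of (key, count)

    def append(self, key):
        for i, (k, n) in enumerate(self.entries):
            if k == key:
                self.entries[i] = (k, n + 1)
                return
        self.entries.append((key, 1))

class FastStats:
    """O(1) append: a dict maps each key to its position in the list."""
    def __init__(self):
        self.entries = []
        self._index = {}  # key -> position in self.entries

    def append(self, key):
        i = self._index.get(key)
        if i is None:
            self._index[key] = len(self.entries)
            self.entries.append((key, 1))
        else:
            k, n = self.entries[i]
            self.entries[i] = (k, n + 1)

for cls in (SlowStats, FastStats):
    s = cls()
    for key in ["a", "b", "a", "c", "a"]:
        s.append(key)
    print(cls.__name__, s.entries)  # both: [('a', 3), ('b', 1), ('c', 1)]
```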
So, can you please test with the latest version and see if that works ok? After
that, if the performance is still not enough, we can discuss options; however,
per-thread stat filtering is a great feature and I personally do not want to
remove it...
Original comment by sum...@gmail.com
on 9 Jan 2015 at 7:47
I tested again including commit 371 (Fabio's optimization), and I don't see a
significant difference in timing.
The main issue is having many more pit items, which translates to slow
enumeration code in pure Python. I'm not suggesting removing this feature, only
making it optional.
Note that in this test I changed the stress script to use sha512 instead
of json encoding/decoding, since this better simulates the type of load
we see in an overloaded vdsm process.
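The attached script itself is not reproduced in the thread, but the sha512 workload it drives might look roughly like this (stdlib only; where the real script calls yappi, the sketch only notes it in comments, so it runs standalone):

```python
import hashlib
import threading
import time

def worker(stop_event, data=b"x" * 4096):
    # CPU-bound loop hashing a fixed buffer, approximating the kind of
    # load an overloaded vdsm process sees.
    while not stop_event.is_set():
        hashlib.sha512(data).digest()

def run(num_threads, duration=0.2):
    # The real stress script would call yappi.start() here.
    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(stop,))
               for _ in range(num_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    # The real script would then call yappi.stop(), time
    # yappi.get_func_stats(), and save the result as a pstat file.
    return elapsed

print("running: %.6f" % run(4))
```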
v366
====
threads: 32
running: 10.014859
stopping yappi: 0.000015
getting stats: 0.003952
saving pstat: 0.001021
threads: 64
running: 10.018209
stopping yappi: 0.000016
getting stats: 0.009164
saving pstat: 0.002347
threads: 128
running: 10.025956
stopping yappi: 0.000023
getting stats: 0.088673
saving pstat: 0.007464
threads: 256
running: 10.038204
stopping yappi: 0.000031
getting stats: 0.130150
saving pstat: 0.067876
threads: 512
running: 10.068508
stopping yappi: 0.000055
getting stats: 0.132270
saving pstat: 0.052638
Fri Jan 9 10:34:22 2015 v366/yappi-512.pstat
3636 function calls in 0.132 seconds
Ordered by: internal time
List reduced from 34 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
879 0.083 0.000 0.083 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
71 0.030 0.000 0.110 0.002 /home/nsoffer/src/yappi/yappi.py:246(__init__)
37 0.009 0.000 0.009 0.000 /home/nsoffer/src/yappi/yappi.py:486(__getitem__)
34 0.003 0.000 0.034 0.001 /home/nsoffer/src/yappi/yappi.py:534(_enumerator)
69 0.003 0.000 0.004 0.000 /home/nsoffer/src/yappi/yappi.py:398(append)
1 0.001 0.001 0.001 0.001 /home/nsoffer/src/yappi/yappi.py:639(add)
815 0.001 0.000 0.080 0.000 {setattr}
548 0.001 0.000 0.001 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
879 0.001 0.000 0.001 0.000 {method 'index' of 'tuple' objects}
1 0.000 0.000 0.131 0.131 /home/nsoffer/src/yappi/yappi.py:506(get)
v367
====
threads: 32
running: 10.021096
stopping yappi: 0.000015
getting stats: 0.015900
saving pstat: 0.001181
threads: 64
running: 10.017866
stopping yappi: 0.000017
getting stats: 0.155157
saving pstat: 0.008110
threads: 128
running: 10.026521
stopping yappi: 0.000021
getting stats: 3.193556
saving pstat: 0.022553
threads: 256
running: 10.043622
stopping yappi: 0.000032
getting stats: 9.977705
saving pstat: 0.080130
threads: 512
running: 10.096404
stopping yappi: 0.000056
getting stats: 59.533383
saving pstat: 0.102317
Fri Jan 9 10:37:16 2015 v367/yappi-512.pstat
165787 function calls in 59.533 seconds
Ordered by: internal time
List reduced from 35 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
49477 31.873 0.001 31.908 0.001 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
2627 12.108 0.005 34.919 0.013 /home/nsoffer/src/yappi/yappi.py:246(__init__)
2590 5.083 0.002 56.855 0.022 /home/nsoffer/src/yappi/yappi.py:539(_enumerator)
2556 4.138 0.002 11.794 0.005 /home/nsoffer/src/yappi/yappi.py:269(__add__)
2081 2.338 0.001 2.338 0.001 /home/nsoffer/src/yappi/yappi.py:488(__getitem__)
8217 2.041 0.000 2.041 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
2590 0.687 0.000 0.691 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
2590 0.545 0.000 0.545 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
2625 0.515 0.000 14.350 0.005 /home/nsoffer/src/yappi/yappi.py:401(append)
1 0.096 0.096 59.533 59.533 /home/nsoffer/src/yappi/yappi.py:508(get)
v371
====
threads: 32
running: 10.018111
stopping yappi: 0.000016
getting stats: 0.031227
saving pstat: 0.021802
threads: 64
running: 10.017729
stopping yappi: 0.000020
getting stats: 0.047200
saving pstat: 0.012259
threads: 128
running: 10.026679
stopping yappi: 0.000023
getting stats: 2.801588
saving pstat: 0.038671
threads: 256
running: 10.045928
stopping yappi: 0.000032
getting stats: 6.875774
saving pstat: 0.052142
threads: 512
running: 10.078698
stopping yappi: 0.000166
getting stats: 50.336383
saving pstat: 0.116318
Fri Jan 9 10:32:07 2015 v371/yappi-512.pstat
286465 function calls in 50.336 seconds
Ordered by: internal time
List reduced from 46 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
82181 26.909 0.000 26.949 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
4671 11.754 0.003 31.389 0.007 /home/nsoffer/src/yappi/yappi.py:246(__init__)
4669 2.402 0.001 14.454 0.003 /home/nsoffer/src/yappi/yappi.py:474(append)
2044 2.186 0.001 6.086 0.003 /home/nsoffer/src/yappi/yappi.py:336(__add__)
2590 1.465 0.001 20.931 0.008 /home/nsoffer/src/yappi/yappi.py:571(_enumerator)
1 1.339 1.339 50.336 50.336 /home/nsoffer/src/yappi/yappi.py:540(get)
2556 1.307 0.001 3.973 0.002 /home/nsoffer/src/yappi/yappi.py:272(__add__)
4669 0.790 0.000 12.048 0.003 /home/nsoffer/src/yappi/yappi.py:428(append)
4600 0.705 0.000 0.705 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
2081 0.652 0.000 0.654 0.000 /home/nsoffer/src/yappi/yappi.py:480(__getitem__)
Original comment by nir...@gmail.com
on 9 Jan 2015 at 8:54
Attachments:
I only found time today to look into this. After thinking about it for a while
and experimenting with stress.py a bit more, I agree with you that
applications that have too many short-lived threads will be affected severely
by this patch. Things become even worse when the called functions have many
children, as those are also calculated per _pit.
Anyway, great investigation as usual!
I am currently thinking about the following solution:
- Hold _pit's in both per-thread and global tables.
- When the user calls yappi.get_func_stats(filter={"ctx_id": tstat.id}), we
will know the user is interested in per-thread profile data. If so, let the
user enumerate the per-thread table; if not, use the global table.
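A toy model of this two-table lookup (hypothetical names and data, not yappi's implementation) could look like:

```python
# Toy model of the proposal above: stats are kept in both a global
# table and per-thread tables; get_func_stats() enumerates whichever
# one the filter asks for. All names and numbers here are made up.

global_table = {"foo": 10, "bar": 4}   # func -> call count, all threads merged
per_thread = {
    1: {"foo": 7, "bar": 4},           # ctx_id -> func -> call count
    2: {"foo": 3},
}

def get_func_stats(filter=None):
    ctx_id = (filter or {}).get("ctx_id")
    if ctx_id is None:
        return global_table            # cheap: one entry per function
    return per_thread.get(ctx_id, {})  # per-thread view, only on demand

print(get_func_stats())                      # {'foo': 10, 'bar': 4}
print(get_func_stats(filter={"ctx_id": 2}))  # {'foo': 3}
```

The trade-off matches the discussion: the global table keeps enumeration cheap for the common case, at the cost of maintaining both tables in memory.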
This will use a little more memory, but I don't like the idea of forcing users
to decide on that type of detail. As an example, I personally do not want
this:
yappi.start(..., per_thread_info=True)
I think the extra memory consumption is worth it in order to eliminate
this kind of parameter.
Of course, needless to say, I am open to contradicting suggestions/ideas.
Thanks,
Original comment by sum...@gmail.com
on 6 Feb 2015 at 5:44
I think that the solution above is best:
yappi.start(..., per_thread_info=True)
When run in this mode, keep per-thread stats. Otherwise, keep global stats.
People who need this kind of info can easily enable this mode, and other users
are not affected.
When this mode is disabled, filtering by ctx_id should raise an
UnsupportedFilter error, hinting at per_thread_info=True.
I'm also happy with having per_thread_info=True by default, so that we would
disable it ourselves. But I think the right way to introduce such a feature is
to have it disabled by default.
I did not measure the additional memory requirements of the per-thread info. It
is possible that the cost is insignificant and disabling it is not worth the
effort.
Original comment by nir...@gmail.com
on 8 Feb 2015 at 7:37
I think filtering on ctx_id should work on all occasions; we should not add a
new parameter to start(). Because: what happens if the user first runs
start(per_thread_info=True), then stop(), and then start(per_thread_info=False)?
We can probably make this work, but we would have to deal with additional
complexity without any actual benefit. If the user wants to see per-thread
info, he/she should be able to get it, though of course enumerating those stats
will be different. This approach seems simpler/cleaner to implement, IMHO.
Original comment by sum...@gmail.com
on 8 Feb 2015 at 9:33
If you want to support multiple runs (start...stop...start...stop), then
profiling parameters should not be set in start() - it does not make any sense.
For example:
start(builtins=True)
stop()
start(builtins=False)
...
stop()
To support this use pattern, we need a profiling session:
profile = yappi.Profile(per_thread_info=True, builtins=True)
profile.start()
profile.pause()
profile.resume()
...
profile.stop()
This user interface is easy for both the user and the implementor, and the
semantics are very clear.
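A sketch of such a session object, written against a stub backend standing in for yappi.start()/yappi.stop() (the Profile class here is a proposal, not an existing yappi API), might look like:

```python
# Sketch of the proposed session API. The backend callables are stubs
# standing in for yappi.start()/yappi.stop(); the Profile class itself
# is hypothetical, not part of yappi.

class Profile:
    def __init__(self, per_thread_info=False, builtins=False,
                 backend_start=None, backend_stop=None):
        # Settings are fixed for the lifetime of the session, so two
        # parts of one session can never have conflicting options.
        self.per_thread_info = per_thread_info
        self.builtins = builtins
        self._start = backend_start or (lambda **kw: None)
        self._stop = backend_stop or (lambda: None)
        self._state = "new"

    def start(self):
        if self._state != "new":
            raise RuntimeError("session already started")
        self._start(builtins=self.builtins)
        self._state = "running"

    def pause(self):
        if self._state != "running":
            raise RuntimeError("not running")
        self._stop()
        self._state = "paused"

    def resume(self):
        if self._state != "paused":
            raise RuntimeError("not paused")
        self._start(builtins=self.builtins)
        self._state = "running"

    def stop(self):
        if self._state not in ("running", "paused"):
            raise RuntimeError("not started")
        if self._state == "running":
            self._stop()
        self._state = "stopped"

p = Profile(per_thread_info=True, builtins=True)
p.start(); p.pause(); p.resume(); p.stop()
print(p._state)  # stopped
```

The explicit state machine is what makes the semantics clear: resume() and pause() fail after stop(), and settings cannot change mid-session.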
Original comment by nir...@gmail.com
on 9 Feb 2015 at 3:24
Oh...
Your suggestion would change the existing API and break backward compatibility.
And I still cannot see any motivation for this. I don't understand what is
confusing about retrieving per-thread stats with:
yappi.get_func_stats(ctx_id=1)
and what is clearer about:
yappi.Profile(per_thread_info=True...
And PS: merging multiple runs with builtins is somewhat ok, but merging
different runs with different per-thread settings will not make sense and would
add extra complexity for no reason.
I understand it could be implemented the way you suggested, or in a thousand
other ways, but is there any real benefit?
Original comment by sum...@gmail.com
on 25 Feb 2015 at 10:43
BTW, yappi already supports multiple runs by calling start()/stop() subsequently.
Original comment by sum...@gmail.com
on 25 Feb 2015 at 10:44
The current yappi api is not clear, and leads to conflicting settings that do
not make sense in the same profiling "session":
start() is used both for starting profiling and for resuming it after stop().
stop() is used both for stopping profiling and for pausing it.
Python's builtin profilers use enable() and disable(), which are better, because
people tend to expect different semantics from start() and stop().
It is clear that you can enable and disable a profiler multiple times, but
people do not expect that starting and stopping a profiler keeps collecting
data for the same session.
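For comparison, the stdlib's cProfile has exactly these semantics: enable() and disable() may be called repeatedly, and the data accumulates into one profile across the enabled intervals:

```python
import cProfile
import io
import pstats

def work():
    return sum(i * i for i in range(1000))

prof = cProfile.Profile()
prof.enable()
work()
prof.disable()       # pause collection
# ... unprofiled code would run here ...
prof.enable()        # resume: data keeps accumulating in the same profile
work()
prof.disable()

out = io.StringIO()
pstats.Stats(prof, stream=out).print_stats("work")
print("work" in out.getvalue())  # True: both calls landed in one profile
```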
This issue is made worse by accepting profiling options in start(). As you
noted, it is possible to make several start() calls, each with conflicting
settings, or settings that do not make sense in the same profiling session.
For example, when you collect profile data over several runs, it is unlikely
that you want one run with builtins and one run without, or one run with
threads and the other without.
The profiling session api does not have any of these issues. Settings are given
once, when you create a profiling session. You cannot change them until you
stop the session, so you cannot have conflicting options such as with-threads
and without-threads.
This api does not break compatibility, since it is a new layer of control. You
can still use the old functions if you like. It can be built on top of the
lower-level yappi.start() and yappi.stop().
The only issue I can see is limiting multiple profiling sessions: it does not
make sense to support creating two profiling sessions at the same time.
Here is another api which does not have this issue and avoids conflicting
session settings:
start(**options) - start a session with the given options, clearing the
previous session's collected data.
pause() - pause the current session.
resume() - resume the current session using the settings given to start().
stop() - stop the current session. resume() and pause() will fail after a
session was stopped.
This of course breaks backward compatibility, because start() would clear
old data.
But let's focus on the topic of disabling per-thread collection with the
current api: using an option in start() is good enough, provided you keep using
the same options in subsequent start() calls, which is the only way that makes
sense anyway.
Original comment by nir...@gmail.com
on 27 Feb 2015 at 9:39
I completely agree that .enable/.disable and a separate Profile object make a
better API. It is a far cleaner/better design. But generally, when designing an
API, it seems better practice to have a single way to do something, so I could
not find the motivation to provide a second interface just because it is
cleaner. Of course, this does not mean backward compatibility is everything. We
can break things if we want. But honestly, I simply do not find the motivation
to do that in our case, because the gain is small. I would rather work on
another feature, such as a "load stats from PSTAT" feature, for example.
Coming to the topic of disabling per-thread collection with the current api: I
will re-think your suggestion of putting that in start(), but I cannot promise
I will do it that way, since both have trade-offs. Let's see.
Thanks for all your comments.
Original comment by sum...@gmail.com
on 28 Feb 2015 at 9:52
Original issue reported on code.google.com by
nir...@gmail.com
on 7 Jan 2015 at 2:30
Attachments: