GoogleCodeExporter opened this issue 9 years ago (status: Open)
Profiling yappi.get_func_stats reveals this:
v367:
Wed Jan 7 19:44:57 2015 v367/yappi-512.pstat
540077 function calls in 7.762 seconds
Ordered by: internal time
List reduced from 35 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
151358 3.377 0.000 3.426 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
7984 1.318 0.000 4.041 0.001 /home/nsoffer/src/yappi/yappi.py:246(__init__)
13921 0.903 0.000 0.905 0.000 /home/nsoffer/src/yappi/yappi.py:488(__getitem__)
7928 0.631 0.000 1.384 0.000 /home/nsoffer/src/yappi/yappi.py:269(__add__)
1 0.364 0.364 7.761 7.761 /home/nsoffer/src/yappi/yappi.py:508(get)
7956 0.305 0.000 6.464 0.001 /home/nsoffer/src/yappi/yappi.py:539(_enumerator)
7956 0.257 0.000 0.257 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
7983 0.253 0.000 1.709 0.000 /home/nsoffer/src/yappi/yappi.py:401(append)
7956 0.133 0.000 0.137 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
36049 0.072 0.000 0.072 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
103736 0.067 0.000 2.723 0.000 {setattr}
151358 0.048 0.000 0.048 0.000 {method 'index' of 'tuple' objects}
1 0.025 0.025 6.490 6.490 {_yappi.enum_func_stats}
7956 0.004 0.000 0.004 0.000 {method 'rfind' of 'str' objects}
13922 0.002 0.000 0.002 0.000 {isinstance}
13929 0.002 0.000 0.002 0.000 {function append at 0x1940848}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:377(__init__)
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:693(sort)
2 0.000 0.000 0.000 0.000 {function sort at 0x19411b8}
27 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:385(<lambda>)
v366:
Wed Jan 7 19:38:01 2015 v366/yappi-512.pstat
2910 function calls in 0.067 seconds
Ordered by: internal time
List reduced from 34 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
709 0.065 0.000 0.065 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
655 0.000 0.000 0.065 0.000 {setattr}
57 0.000 0.000 0.066 0.001 /home/nsoffer/src/yappi/yappi.py:246(__init__)
709 0.000 0.000 0.000 0.000 {method 'index' of 'tuple' objects}
56 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:398(append)
1 0.000 0.000 0.067 0.067 /home/nsoffer/src/yappi/yappi.py:506(get)
28 0.000 0.000 0.002 0.000 /home/nsoffer/src/yappi/yappi.py:534(_enumerator)
374 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
48 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:486(__getitem__)
1 0.000 0.000 0.002 0.002 {_yappi.enum_func_stats}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
28 0.000 0.000 0.000 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
28 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:374(__init__)
2 0.000 0.000 0.000 0.000 {function sort at 0x27562a8}
56 0.000 0.000 0.000 0.000 {function append at 0x2755938}
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:682(sort)
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:381(sort)
27 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:382(<lambda>)
49 0.000 0.000 0.000 0.000 {isinstance}
Note the 7956 _enumerator calls vs. 28.
Attached is a stress script profiling yappi.get_func_stats.
Original comment by nir...@gmail.com
on 7 Jan 2015 at 5:52
Attachments:
I found an error in the stress script - I forgot to clear the stats between the
test runs with different thread counts.
These are the results with the fixed script.
v367:
threads: 32
running: 10.012666
stopping yappi: 0.000013
getting stats: 0.140000
saving pstat: 0.012483
threads: 64
running: 10.030343
stopping yappi: 0.000013
getting stats: 0.064788
saving pstat: 0.027382
threads: 128
running: 10.036739
stopping yappi: 0.000019
getting stats: 0.270757
saving pstat: 0.114645
threads: 256
running: 10.078558
stopping yappi: 0.000036
getting stats: 1.449708
saving pstat: 0.086878
threads: 512
running: 10.203625
stopping yappi: 0.000051
getting stats: 6.936577
saving pstat: 0.116690
v366:
threads: 32
running: 10.015445
stopping yappi: 0.000014
getting stats: 0.003525
saving pstat: 0.009566
threads: 64
running: 10.033361
stopping yappi: 0.000013
getting stats: 0.003743
saving pstat: 0.007299
threads: 128
running: 10.044901
stopping yappi: 0.000032
getting stats: 0.003609
saving pstat: 0.028504
threads: 256
running: 10.184238
stopping yappi: 0.000032
getting stats: 0.132134
saving pstat: 0.126637
threads: 512
running: 10.195147
stopping yappi: 0.000050
getting stats: 0.003170
saving pstat: 0.118647
And updated profiles of yappi.get_func_stats:
v367:
Wed Jan 7 20:21:26 2015 v367/yappi-512.pstat
279917 function calls in 6.937 seconds
Ordered by: internal time
List reduced from 35 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
78398 2.656 0.000 2.682 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
7201 1.789 0.000 1.789 0.000 /home/nsoffer/src/yappi/yappi.py:488(__getitem__)
4144 0.692 0.000 2.923 0.001 /home/nsoffer/src/yappi/yappi.py:246(__init__)
4116 0.665 0.000 5.069 0.001 /home/nsoffer/src/yappi/yappi.py:539(_enumerator)
4088 0.608 0.000 0.969 0.000 /home/nsoffer/src/yappi/yappi.py:269(__add__)
4143 0.186 0.000 1.342 0.000 /home/nsoffer/src/yappi/yappi.py:401(append)
18769 0.186 0.000 0.186 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
1 0.063 0.063 6.937 6.937 /home/nsoffer/src/yappi/yappi.py:508(get)
53816 0.035 0.000 2.231 0.000 {setattr}
78398 0.026 0.000 0.026 0.000 {method 'index' of 'tuple' objects}
1 0.013 0.013 5.082 5.082 {_yappi.enum_func_stats}
4116 0.009 0.000 0.009 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
4116 0.003 0.000 0.005 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
4116 0.002 0.000 0.002 0.000 {method 'rfind' of 'str' objects}
7209 0.001 0.000 0.001 0.000 {function append at 0x7f77a2fcd848}
7202 0.001 0.000 0.001 0.000 {isinstance}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:377(__init__)
2 0.000 0.000 0.000 0.000 {function sort at 0x7f77a2fce1b8}
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:384(sort)
27 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:385(<lambda>)
v366:
Wed Jan 7 20:23:17 2015 v366/yappi-512.pstat
2871 function calls in 0.003 seconds
Ordered by: internal time
List reduced from 34 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
698 0.001 0.000 0.001 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
644 0.000 0.000 0.002 0.000 {setattr}
56 0.000 0.000 0.002 0.000 /home/nsoffer/src/yappi/yappi.py:246(__init__)
698 0.000 0.000 0.000 0.000 {method 'index' of 'tuple' objects}
55 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:398(append)
1 0.000 0.000 0.003 0.003 /home/nsoffer/src/yappi/yappi.py:506(get)
28 0.000 0.000 0.002 0.000 /home/nsoffer/src/yappi/yappi.py:534(_enumerator)
373 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
47 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:486(__getitem__)
1 0.000 0.000 0.002 0.002 {_yappi.enum_func_stats}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
28 0.000 0.000 0.000 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
28 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
28 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:374(__init__)
2 0.000 0.000 0.000 0.000 {function sort at 0x11882a8}
55 0.000 0.000 0.000 0.000 {function append at 0x1183938}
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:682(sort)
2 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:381(sort)
27 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:382(<lambda>)
1 0.000 0.000 0.000 0.000 /home/nsoffer/src/yappi/yappi.py:639(add)
Note the 4116 calls to _enumerator vs. 28.
Original comment by nir...@gmail.com
on 7 Jan 2015 at 6:32
Attachments:
Looking at the code, I think I understand the issue. Before version 367, we
used to store one _pit item per function, and recorded the function stats and
call level in that pit. Different threads calling the same function were
recorded in the same pit item.
In version 367, we store one pit item per function per thread calling that
function, so we can get statistics per function and thread. When using many
threads, we have many more pit items, and enumerating them is much more
expensive. We also use much more memory for profiling, which limits how long we
can profile on a system with limited memory, or hurts performance by swapping
real application pages out of memory.
This design is particularly bad for vdsm, because not only do we have hundreds
of threads, we also create a new thread for each request when using the jsonrpc
transport. So profiling with the new yappi version keeps a pit item for each
temporary, short-lived thread.
Since this feature is not needed by all users and causes a severe performance
regression, I think yappi should enable it only on demand. By default, pits
should be stored in a global table. If this feature is enabled, pits would be
stored per context.
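The shift from one pit per function to one pit per (function, context) can be illustrated with a minimal sketch (hypothetical names, pure Python; not yappi's actual internals):

```python
from collections import defaultdict

# Hypothetical illustration of the two keying schemes discussed above;
# the names do not match yappi's C internals.

def simulate(num_threads, funcs, per_context):
    table = defaultdict(int)  # one "pit" entry per distinct key
    for ctx_id in range(num_threads):
        for func in funcs:
            # pre-v367: the key is the function alone, shared by all threads
            # v367:     the key is (function, context), one entry per thread
            key = (func, ctx_id) if per_context else func
            table[key] += 1
    return len(table)  # number of pit items get_func_stats() must enumerate

funcs = ["sha512", "read", "write"]
print(simulate(512, funcs, per_context=False))  # 3 entries to enumerate
print(simulate(512, funcs, per_context=True))   # 1536 entries to enumerate
```

With 512 threads the per-context table holds 512x as many entries, which is consistent with the 7956-vs-28 _enumerator call counts observed above.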
Original comment by nir...@gmail.com
on 7 Jan 2015 at 7:14
Hi Nir,
Fabio Zadrozny has done some optimizations regarding post-processing time of
the profiler output. See this commit: d09a8a3. Can you please also test with
the latest code? get_func_stats() calls YStats.append(), which also has O(n)
behaviour; with this patch we get O(1) behaviour.
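The O(n)-vs-O(1) append behaviour mentioned here can be modelled with a small sketch (a simplified stand-in, not the actual YStats code):

```python
# Simplified model of the O(n) vs O(1) append discussed above;
# this is not the actual YStats implementation.

class SlowStats:
    """O(n) append: linear scan for an existing entry with the same key."""
    def __init__(self):
        self.entries = []  # list of (key, count)

    def append(self, key):
        for i, (k, n) in enumerate(self.entries):
            if k == key:
                self.entries[i] = (k, n + 1)
                return
        self.entries.append((key, 1))

class FastStats:
    """O(1) append: a dict maps each key to its position in the list."""
    def __init__(self):
        self.entries = []
        self._index = {}  # key -> position in self.entries

    def append(self, key):
        i = self._index.get(key)
        if i is None:
            self._index[key] = len(self.entries)
            self.entries.append((key, 1))
        else:
            k, n = self.entries[i]
            self.entries[i] = (k, n + 1)

for cls in (SlowStats, FastStats):
    s = cls()
    for key in ["a", "b", "a", "c", "a"]:
        s.append(key)
    print(cls.__name__, s.entries)  # both: [('a', 3), ('b', 1), ('c', 1)]
```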
So, can you please test with the latest version and see if that works ok? After
that, if the performance is still not enough, we can discuss options; however,
per-thread stat filtering is a great feature and I personally do not want to
remove it...
Original comment by sum...@gmail.com
on 9 Jan 2015 at 7:47
I tested again including commit 371 (Fabio's optimization), and I don't see a
significant difference in timing.
The main issue is having many more pit items, which translates to slow
enumeration code in pure Python. I'm not suggesting removing this feature, only
making it optional.
Note that in this test I changed the stress script to use sha512 instead
of json encoding/decoding, since this better simulates the type of load
we see in an overloaded vdsm process.
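The attached script itself is not reproduced in the thread, but the sha512 workload it drives might look roughly like this (stdlib only; where the real script calls yappi, the sketch only notes it in comments, so it runs standalone):

```python
import hashlib
import threading
import time

def worker(stop_event, data=b"x" * 4096):
    # CPU-bound loop hashing a fixed buffer, approximating the kind of
    # load an overloaded vdsm process sees.
    while not stop_event.is_set():
        hashlib.sha512(data).digest()

def run(num_threads, duration=0.2):
    # The real stress script would call yappi.start() here.
    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(stop,))
               for _ in range(num_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    # The real script would then call yappi.stop(), time
    # yappi.get_func_stats(), and save the result as a pstat file.
    return elapsed

print("running: %.6f" % run(4))
```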
v366
====
threads: 32
running: 10.014859
stopping yappi: 0.000015
getting stats: 0.003952
saving pstat: 0.001021
threads: 64
running: 10.018209
stopping yappi: 0.000016
getting stats: 0.009164
saving pstat: 0.002347
threads: 128
running: 10.025956
stopping yappi: 0.000023
getting stats: 0.088673
saving pstat: 0.007464
threads: 256
running: 10.038204
stopping yappi: 0.000031
getting stats: 0.130150
saving pstat: 0.067876
threads: 512
running: 10.068508
stopping yappi: 0.000055
getting stats: 0.132270
saving pstat: 0.052638
Fri Jan 9 10:34:22 2015 v366/yappi-512.pstat
3636 function calls in 0.132 seconds
Ordered by: internal time
List reduced from 34 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
879 0.083 0.000 0.083 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
71 0.030 0.000 0.110 0.002 /home/nsoffer/src/yappi/yappi.py:246(__init__)
37 0.009 0.000 0.009 0.000 /home/nsoffer/src/yappi/yappi.py:486(__getitem__)
34 0.003 0.000 0.034 0.001 /home/nsoffer/src/yappi/yappi.py:534(_enumerator)
69 0.003 0.000 0.004 0.000 /home/nsoffer/src/yappi/yappi.py:398(append)
1 0.001 0.001 0.001 0.001 /home/nsoffer/src/yappi/yappi.py:639(add)
815 0.001 0.000 0.080 0.000 {setattr}
548 0.001 0.000 0.001 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
879 0.001 0.000 0.001 0.000 {method 'index' of 'tuple' objects}
1 0.000 0.000 0.131 0.131 /home/nsoffer/src/yappi/yappi.py:506(get)
v367
====
threads: 32
running: 10.021096
stopping yappi: 0.000015
getting stats: 0.015900
saving pstat: 0.001181
threads: 64
running: 10.017866
stopping yappi: 0.000017
getting stats: 0.155157
saving pstat: 0.008110
threads: 128
running: 10.026521
stopping yappi: 0.000021
getting stats: 3.193556
saving pstat: 0.022553
threads: 256
running: 10.043622
stopping yappi: 0.000032
getting stats: 9.977705
saving pstat: 0.080130
threads: 512
running: 10.096404
stopping yappi: 0.000056
getting stats: 59.533383
saving pstat: 0.102317
Fri Jan 9 10:37:16 2015 v367/yappi-512.pstat
165787 function calls in 59.533 seconds
Ordered by: internal time
List reduced from 35 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
49477 31.873 0.001 31.908 0.001 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
2627 12.108 0.005 34.919 0.013 /home/nsoffer/src/yappi/yappi.py:246(__init__)
2590 5.083 0.002 56.855 0.022 /home/nsoffer/src/yappi/yappi.py:539(_enumerator)
2556 4.138 0.002 11.794 0.005 /home/nsoffer/src/yappi/yappi.py:269(__add__)
2081 2.338 0.001 2.338 0.001 /home/nsoffer/src/yappi/yappi.py:488(__getitem__)
8217 2.041 0.000 2.041 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
2590 0.687 0.000 0.691 0.000 /usr/lib64/python2.7/posixpath.py:119(basename)
2590 0.545 0.000 0.545 0.000 /home/nsoffer/src/yappi/yappi.py:94(_func_fullname)
2625 0.515 0.000 14.350 0.005 /home/nsoffer/src/yappi/yappi.py:401(append)
1 0.096 0.096 59.533 59.533 /home/nsoffer/src/yappi/yappi.py:508(get)
v371
====
threads: 32
running: 10.018111
stopping yappi: 0.000016
getting stats: 0.031227
saving pstat: 0.021802
threads: 64
running: 10.017729
stopping yappi: 0.000020
getting stats: 0.047200
saving pstat: 0.012259
threads: 128
running: 10.026679
stopping yappi: 0.000023
getting stats: 2.801588
saving pstat: 0.038671
threads: 256
running: 10.045928
stopping yappi: 0.000032
getting stats: 6.875774
saving pstat: 0.052142
threads: 512
running: 10.078698
stopping yappi: 0.000166
getting stats: 50.336383
saving pstat: 0.116318
Fri Jan 9 10:32:07 2015 v371/yappi-512.pstat
286465 function calls in 50.336 seconds
Ordered by: internal time
List reduced from 46 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
82181 26.909 0.000 26.949 0.000 /home/nsoffer/src/yappi/yappi.py:252(__setattr__)
4671 11.754 0.003 31.389 0.007 /home/nsoffer/src/yappi/yappi.py:246(__init__)
4669 2.402 0.001 14.454 0.003 /home/nsoffer/src/yappi/yappi.py:474(append)
2044 2.186 0.001 6.086 0.003 /home/nsoffer/src/yappi/yappi.py:336(__add__)
2590 1.465 0.001 20.931 0.008 /home/nsoffer/src/yappi/yappi.py:571(_enumerator)
1 1.339 1.339 50.336 50.336 /home/nsoffer/src/yappi/yappi.py:540(get)
2556 1.307 0.001 3.973 0.002 /home/nsoffer/src/yappi/yappi.py:272(__add__)
4669 0.790 0.000 12.048 0.003 /home/nsoffer/src/yappi/yappi.py:428(append)
4600 0.705 0.000 0.705 0.000 /home/nsoffer/src/yappi/yappi.py:264(__eq__)
2081 0.652 0.000 0.654 0.000 /home/nsoffer/src/yappi/yappi.py:480(__getitem__)
Original comment by nir...@gmail.com
on 9 Jan 2015 at 8:54
Attachments:
I only found time today to look into this. After thinking about it for a while
and experimenting with stress.py a bit more, I agree with you that
applications that have too many short-lived threads will be affected severely
by this patch. Things become even worse when the called functions have many
children, as those are also calculated per _pit.
Anyway, great investigation as usual!
I am currently thinking about the following solution:
- Hold _pit's in both per-thread and global tables.
- When the user calls yappi.get_func_stats(filter={"ctx_id": tstat.id}), we
will know the user is interested in per-thread profile data. If so, let the
user enumerate the per-thread table; if not, use the global table.
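A toy model of this two-table lookup (hypothetical names and data, not yappi's implementation) could look like:

```python
# Toy model of the proposal above: stats are kept in both a global
# table and per-thread tables; get_func_stats() enumerates whichever
# one the filter asks for. All names and numbers here are made up.

global_table = {"foo": 10, "bar": 4}   # func -> call count, all threads merged
per_thread = {
    1: {"foo": 7, "bar": 4},           # ctx_id -> func -> call count
    2: {"foo": 3},
}

def get_func_stats(filter=None):
    ctx_id = (filter or {}).get("ctx_id")
    if ctx_id is None:
        return global_table            # cheap: one entry per function
    return per_thread.get(ctx_id, {})  # per-thread view, only on demand

print(get_func_stats())                      # {'foo': 10, 'bar': 4}
print(get_func_stats(filter={"ctx_id": 2}))  # {'foo': 3}
```

The trade-off matches the discussion: the global table keeps enumeration cheap for the common case, at the cost of maintaining both tables in memory.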
This will use a little more memory, but I don't like the idea of forcing users
to decide on that type of detail. As an example, I personally do not want
this:
yappi.start(..., per_thread_info=True)
I think the extra memory consumption is worth it in order to eliminate
this kind of parameter.
Of course, needless to say, I am open to contradicting suggestions/ideas.
Thanks,
Original comment by sum...@gmail.com
on 6 Feb 2015 at 5:44
I think that the solution above is best:
yappi.start(..., per_thread_info=True)
When run in this mode, keep per-thread stats. Otherwise, keep global stats.
People who need this kind of info can easily enable this mode, and other users
are not affected.
When this mode is disabled, filtering by ctx_id should raise an
UnsupportedFilter error, hinting at per_thread_info=True.
I'm also happy with having per_thread_info=True by default, so that we would
disable it ourselves. But I think the right way to introduce such a feature is
to have it disabled by default.
I did not measure the additional memory requirements of the per-thread info. It
is possible that the cost is insignificant and disabling it is not worth the
effort.
Original comment by nir...@gmail.com
on 8 Feb 2015 at 7:37
I think filtering on ctx_id should work on all occasions; we should not add a
new parameter to start(). Because: what happens if the user first runs
start(per_thread_info=True), then stop(), and then start(per_thread_info=False)?
We can probably make this work, but we would have to deal with additional
complexity without any actual benefit. If the user wants to see per-thread
info, he/she should be able to get it, though of course enumerating those stats
will be different. This approach seems simpler/cleaner to implement, IMHO.
Original comment by sum...@gmail.com
on 8 Feb 2015 at 9:33
If you want to support multiple runs (start...stop...start...stop), then
profiling parameters should not be set in start() - it does not make any sense.
For example:
start(builtins=True)
stop()
start(builtins=False)
...
stop()
To support this use pattern, we need a profiling session:
profile = yappi.Profile(per_thread_info=True, builtins=True)
profile.start()
profile.pause()
profile.resume()
...
profile.stop()
This user interface is easy for both the user and the implementor, and the
semantics are very clear.
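A sketch of such a session object, written against a stub backend standing in for yappi.start()/yappi.stop() (the Profile class here is a proposal, not an existing yappi API), might look like:

```python
# Sketch of the proposed session API. The backend callables are stubs
# standing in for yappi.start()/yappi.stop(); the Profile class itself
# is hypothetical, not part of yappi.

class Profile:
    def __init__(self, per_thread_info=False, builtins=False,
                 backend_start=None, backend_stop=None):
        # Settings are fixed for the lifetime of the session, so two
        # parts of one session can never have conflicting options.
        self.per_thread_info = per_thread_info
        self.builtins = builtins
        self._start = backend_start or (lambda **kw: None)
        self._stop = backend_stop or (lambda: None)
        self._state = "new"

    def start(self):
        if self._state != "new":
            raise RuntimeError("session already started")
        self._start(builtins=self.builtins)
        self._state = "running"

    def pause(self):
        if self._state != "running":
            raise RuntimeError("not running")
        self._stop()
        self._state = "paused"

    def resume(self):
        if self._state != "paused":
            raise RuntimeError("not paused")
        self._start(builtins=self.builtins)
        self._state = "running"

    def stop(self):
        if self._state not in ("running", "paused"):
            raise RuntimeError("not started")
        if self._state == "running":
            self._stop()
        self._state = "stopped"

p = Profile(per_thread_info=True, builtins=True)
p.start(); p.pause(); p.resume(); p.stop()
print(p._state)  # stopped
```

The explicit state machine is what makes the semantics clear: resume() and pause() fail after stop(), and settings cannot change mid-session.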
Original comment by nir...@gmail.com
on 9 Feb 2015 at 3:24
Oh...
Your suggestion would change the existing API and break backward compatibility.
And I still cannot see any motivation for this. I don't understand what is
confusing about retrieving per-thread stats with:
yappi.get_func_stats(ctx_id=1)
and what is clearer about:
yappi.Profile(per_thread_info=True...
And PS: merging multiple runs with builtins is somewhat ok, but merging
different runs with different per-thread settings will not make sense and would
add extra complexity for no reason.
I understand it could be implemented the way you suggested, or in a thousand
other ways, but is there any real benefit?
Original comment by sum...@gmail.com
on 25 Feb 2015 at 10:43
BTW, yappi already supports multiple runs by calling start()/stop() subsequently.
Original comment by sum...@gmail.com
on 25 Feb 2015 at 10:44
The current yappi api is not clear, and leads to conflicting settings that do
not make sense in the same profiling "session":
start() is used both for starting profiling and for resuming it after stop().
stop() is used both for stopping profiling and for pausing it.
Python's builtin profilers use enable() and disable(), which are better, because
people tend to expect different semantics from start() and stop().
It is clear that you can enable and disable a profiler multiple times, but
people do not expect that starting and stopping a profiler keeps collecting
data for the same session.
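For comparison, the stdlib's cProfile has exactly these semantics: enable() and disable() may be called repeatedly, and the data accumulates into one profile across the enabled intervals:

```python
import cProfile
import io
import pstats

def work():
    return sum(i * i for i in range(1000))

prof = cProfile.Profile()
prof.enable()
work()
prof.disable()       # pause collection
# ... unprofiled code would run here ...
prof.enable()        # resume: data keeps accumulating in the same profile
work()
prof.disable()

out = io.StringIO()
pstats.Stats(prof, stream=out).print_stats("work")
print("work" in out.getvalue())  # True: both calls landed in one profile
```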
This issue is made worse by accepting profiling options in start(). As you
noted, it is possible to make several start() calls, each with conflicting
settings, or settings that do not make sense in the same profiling session.
For example, when you collect profile data over several runs, it is unlikely
that you want one run with builtins and one run without, or one run with
threads and the other without.
The profiling session api does not have any of these issues. Settings are given
once, when you create a profiling session. You cannot change them until you
stop the session, so you cannot have conflicting options such as with-threads
and without-threads.
This api does not break compatibility, since it is a new layer of control. You
can still use the old functions if you like. It can be built on top of the
lower-level yappi.start() and yappi.stop().
The only issue I can see is limiting multiple profiling sessions: it does not
make sense to support creating two profiling sessions at the same time.
Here is another api which does not have this issue and avoids conflicting
session settings:
start(**options) - start a session with the given options, clearing the
previous session's collected data.
pause() - pause the current session.
resume() - resume the current session using the settings given to start().
stop() - stop the current session. resume() and pause() will fail after a
session was stopped.
This of course breaks backward compatibility, because start() would clear
old data.
But let's focus on the topic of disabling per-thread collection with the
current api: using an option in start() is good enough, provided you keep using
the same options in subsequent start() calls, which is the only way that makes
sense anyway.
Original comment by nir...@gmail.com
on 27 Feb 2015 at 9:39
I completely agree that .enable/.disable and a separate Profile object make a
better API. It is a far cleaner/better design. But generally, when designing an
API, it seems better practice to have a single way to do something, so I could
not find the motivation to provide a second interface just because it is
cleaner. Of course, this does not mean backward compatibility is everything. We
can break things if we want. But honestly, I simply do not find the motivation
to do that in our case, because the gain is small. I would rather work on
another feature, such as a "load stats from PSTAT" feature, for example.
Coming to the topic of disabling per-thread collection with the current api: I
will re-think your suggestion of putting that in start(), but I cannot promise
I will do it that way, since both have trade-offs. Let's see.
Thanks for all your comments.
Original comment by sum...@gmail.com
on 28 Feb 2015 at 9:52
Original issue reported on code.google.com by
nir...@gmail.com
on 7 Jan 2015 at 2:30
Attachments: