vmprof / vmprof-python

vmprof - a statistical program profiler
http://vmprof.com
Other
433 stars 54 forks source link

Add a 'flat' profile mode #228

Closed inducer closed 3 years ago

inducer commented 3 years ago

While the default display of vmprof is very successful at finding where code spends its time viewed from the root of the call tree, it does not work well to find the worst offenders when viewed from the leaves, i.e. functions that spend a lot of time in aggregate, by being called very often. This PR refactors vmprofshow to use ArgumentParser subparsers for its command line interface, and makes the existing lines mode use that. It also adds a flat mode that presents flat statistics of time used, a bit more like pstats from cProfile. Here's some example output:

     5.585 - py:__init__:431:/shared/home/andreask_work/pack/emirge/pyopencl/pyopencl/array.py
     5.230 - py:enqueue_knl_axpbyz:4:<generated code>
     5.098 - py:kernel_runner:148:/shared/home/andreask_work/pack/emirge/pyopencl/pyopencl/array.py
     4.566 - py:_lpy_host_resample_by_picking:8:<generated code>
     4.433 - py:wrapper:675:/home/andreask_work/shared/pack/emirge/miniforge3/envs/ceesd/lib/python3.8/site-packages/pytools/__init__.py
     4.388 - py:enqueue_knl_axpb:4:<generated code>
     4.034 - py:call_loopy:483:/shared/home/andreask_work/pack/emirge/meshmode/meshmode/array_context.py
     3.812 - py:get_common_dtype:79:/shared/home/andreask_work/pack/emirge/pyopencl/pyopencl/compyte/array.py
     3.812 - py:enqueue_knl_multiply:4:<generated code>
     2.128 - py:_lpy_host_diff:8:<generated code>
     2.039 - py:product:1055:/home/andreask_work/shared/pack/emirge/miniforge3/envs/ceesd/lib/python3.8/site-packages/pytools/__init__.py
     1.773 - py:_bop:242:/shared/home/andreask_work/pack/emirge/meshmode/meshmode/dof_array.py
     1.729 - py:invoke_resample_by_picking_loopy_kernel:118:<generated code>

A legitimate question would be, why add that when cProfile already has that. I'll simply add the (analogous) cProfile output fo the same code:

    65500    1.328    0.000    1.328    0.000 {built-in method pyopencl._cl.enqueue_nd_range_kernel}
    40300    0.620    0.000    3.918    0.000 /shared/home/andreask_work/pack/emirge/pyopencl/pyopencl/array.py:148(kernel_runner)
    60300    0.569    0.000    0.815    0.000 /shared/home/andreask_work/pack/emirge/pyopencl/pyopencl/array.py:431(__init__)
    25200    0.388    0.000    2.603    0.000 /shared/home/andreask_work/pack/emirge/meshmode/meshmode/array_context.py:483(call_loopy)
321000/295800    0.351    0.000    1.695    0.000 /home/andreask_work/shared/pack/emirge/miniforge3/envs/ceesd/lib/python3.8/site-packages/pytools/__init__.py:675(wrapper)
    39900    0.241    0.000    0.400    0.000 /shared/home/andreask_work/pack/emirge/pyopencl/pyopencl/compyte/array.py:79(get_common_dtype)
  1189900    0.236    0.000    0.270    0.000 {built-in method builtins.isinstance}
   783800    0.208    0.000    0.208    0.000 {built-in method builtins.getattr}
    12000    0.181    0.000    0.469    0.000 <generated code>:8(_lpy_host_resample_by_picking)
    77600    0.180    0.000    0.180    0.000 {built-in method numpy.zeros}
    13700    0.173    0.000    0.500    0.000 <generated code>:4(enqueue_knl_axpbyz)

Note that the two sets of output are nothing alike... and at least for now I'm inclined to trust vmprof more. :)

At any rate, thanks for vmprof, and maybe you'll find this useful.

cc @matthiasdiener

mattip commented 3 years ago

vmprofshow is documented in the README.md page. Could you update that with an example of using the new options?

inducer commented 3 years ago

vmprofshow is documented in the README.md page. Could you update that with an example of using the new options?

9789b49. See https://github.com/inducer/vmprof-python/tree/add-vmprof-show-flat for a rendered view.

I should note that this changes the interface of vmprofshow a bit, as described in the updated documentation, because of the use of subparsers for the different modes. I think that makes sense, because it prevents ineffective options from being provided. One downside is that the user must now specify a mode. (I.e. specifying tree or one of the other modes is now mandatory.)

inducer commented 3 years ago

Thanks for taking a look! All your suggestions LGTM.

mattip commented 3 years ago

Thanks @inducer