Clarify how to run the statistical profiler over just a portion of the code?

anntzer commented 8 years ago

The README gives

import pprofile

def someHotSpotCallable():
    profiler = pprofile.Profile()
    with profiler:
        # Some hot-spot code goes here
        pass  # placeholder so the example parses
    profiler.print_stats()

as an example. However, it seems(?) not possible to directly swap in a StatisticalProfile, as its docstring says "This class does not gather its own samples by itself. Instead, it must be provided with call stacks (as returned by sys._getframe() or sys._current_frames())." I assume the context manager doesn't take care of setting up a second thread and so on? Additional pointers in the doc would be welcome, thanks in advance.

vpelletier commented 8 years ago

Hello.

> I assume the context manager doesn't take care of setting up a second thread and so on?

Correct, there are two "personalities" here:

- how to collect samples
- how to browse collected data

The latter is common to the deterministic and statistical profilers, but the former is where they differ. As I implemented the deterministic profiler first and did not identify this split early on, it mixes both personalities, which makes it easier to use but also makes the statistical variant more confusing.

An advantage of this split beyond code sharing is that a single data browser can be fed by multiple statistics gatherers (e.g. one per worker thread).
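To illustrate the split between collecting samples and storing them, here is a minimal standard-library sketch. It is not pprofile's implementation: the names SampleStore, Sampler and busy_loop are hypothetical, and the only thing taken from the docstring quoted above is the idea of feeding call stacks obtained from sys._current_frames() to an object that does no sampling of its own.

```python
import collections
import sys
import threading
import time


class SampleStore:
    """Holds per-line hit counts; knows nothing about how samples are taken."""

    def __init__(self):
        self.hits = collections.Counter()
        self._lock = threading.Lock()

    def sample(self, frame):
        # Record one hit for every (file, line) found on the given call stack.
        with self._lock:
            while frame is not None:
                self.hits[(frame.f_code.co_filename, frame.f_lineno)] += 1
                frame = frame.f_back


class Sampler(threading.Thread):
    """Periodically feeds one target thread's call stack to a store."""

    def __init__(self, store, target_ident, period=0.001):
        super().__init__(daemon=True)
        self._store = store
        self._target_ident = target_ident
        self._period = period
        self._stop_event = threading.Event()

    def run(self):
        # wait() returns False on timeout, so this loops once per period
        # until the stop event is set.
        while not self._stop_event.wait(self._period):
            frame = sys._current_frames().get(self._target_ident)
            if frame is not None:
                self._store.sample(frame)

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._stop_event.set()
        self.join()


def busy_loop(duration):
    # Stand-in for "some hot-spot code".
    deadline = time.monotonic() + duration
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


store = SampleStore()
with Sampler(store, threading.get_ident()):
    busy_loop(0.2)  # only this portion of the code is sampled
```

Because the store only receives frames, several Sampler instances (for example one per worker thread) could share one store, which is the advantage described above.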

Does 035c0606973b9e97edae453258200cd5727d4561 help?

anntzer commented 8 years ago

Thanks. I assume calling profiler.dump_stats("callgrind.out.foo") will dump the stats in callgrind format? How does the remark "Generated files will use relative paths, so you can extract generated archive in the same path as profiling result, and kcachegrind will load them - and not your system-wide files, which may differ." apply there?

vpelletier commented 8 years ago

You will need to use profiler.callgrind(open("callgrind.out.foo", "w")) to get callgrind format. dump_stats and several other methods are there mostly to provide the same API as cProfile, although they produce the pprofile annotation format (calling annotate internally).

> How does the remark "Generated files will use relative paths, so you can extract generated archive in the same path as profiling result, and kcachegrind will load them - and not your system-wide files, which may differ." apply there?

This applies when using pprofile as a command, as it can generate a zip file containing all the source code involved in the profiling result, which allows examining the result on another machine without having to worry about source code versions or edits. Paths in both the callgrind result and the zip file are made relative so that each result is self-contained (it does not try to read or write files at the original execution path).

The code generating the zip file is not directly accessible when using pprofile as a module, as it will likely not do the right thing, but methods are present to implement a similar feature. See zpprofile.py for an example of how I used this to generate a different archive format (mime/multipart, because other multi-file formats would try to write to the local filesystem) when using pprofile from inside Zope, allowing the source to be downloaded along with the profiling result.

anntzer commented 8 years ago

Thanks for the clarification. While I can see the advantages in separating the pprofile and callgrind writers, I think that putting them together would probably be better from a usability point of view. Or at least, make sure that the docstring of .dump_stats points clearly to .callgrind: I basically first noticed .dump_stats in the output of pydoc, thought that (similarly to the command line case) it would pick an output type based on the filename, and was very confused when this did not happen.

vpelletier commented 7 years ago

In the freshly released 1.10.0, the statistical profiler becomes "less different" from the deterministic one, which should make usage less confusing.

The original way (instantiating profiler and thread separately) may still be useful when one wishes to use non-default thread settings along with ProfileRunnerBase methods.

About the output format, I think it is indeed nice to automatically switch format based on the name given to dump_stats. I pushed this to master, sadly right after releasing 1.10.0.
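For releases that do not yet include this change, name-based switching can be approximated in user code. The sketch below is an assumption-laden illustration, not the code pushed to master: dump_by_name is a hypothetical helper, and the file name prefixes it checks are chosen for the example rather than taken from pprofile. It relies only on the callgrind and annotate methods mentioned earlier in this thread, each called with an open file.

```python
import os


def dump_by_name(profiler, filename):
    """Write callgrind output for callgrind-looking names, annotations otherwise.

    Returns the name of the format that was written.
    """
    # Assumption: these prefixes are illustrative, not pprofile's own rule.
    basename = os.path.basename(filename)
    use_callgrind = basename.startswith(("callgrind.out.", "cachegrind.out."))
    with open(filename, "w") as out:
        if use_callgrind:
            profiler.callgrind(out)
        else:
            profiler.annotate(out)
    return "callgrind" if use_callgrind else "text"
```

With a profiler that has already collected data, dump_by_name(profiler, "callgrind.out.foo") would then write callgrind output, while any other name would get the annotation format.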

anntzer commented 7 years ago

Thanks!

vpelletier / pprofile

Clarify how to run the statistical profiler over just a portion of the code? #12