vpelletier / pprofile

Line-granularity, thread-aware deterministic and statistic pure-python profiler
GNU General Public License v2.0

Limit output to files in folder #3

Closed jan-glx closed 8 years ago

jan-glx commented 9 years ago

I'd love to be able to limit pprofile's output to the files of my own code (I don't mind where libraries spend their time). From a quick look at your code, it seems to be almost implemented.

vpelletier commented 9 years ago

Good idea, thanks.

Here are my current thoughts for command line use (are you using pprofile as a module ?):

--exclude glob_pattern [--exclude glob_pattern [...]]
  Exclude files whose path starts with any pattern.
--include glob_pattern [--include glob_pattern [...]]
  Include files whose path starts with any pattern and would have been otherwise excluded.

I.e., by default everything is included; one then excludes specific paths when they show up in the profiling result, and may re-include some paths (which can be useful when a few system-wide packages need profiling, but the majority do not).
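
The precedence described above can be sketched in Python as follows. This is only an illustration of the proposal, not pprofile's code: `is_reported` is a hypothetical helper, the paths are made up, and plain prefix matching stands in for the proposed glob patterns.

```python
def is_reported(path, exclude=(), include=()):
    """Return True if samples from `path` should appear in the output.

    Hypothetical helper: plain prefixes stand in for the glob patterns.
    """
    # Everything is included by default.
    if not any(path.startswith(prefix) for prefix in exclude):
        return True
    # An excluded path can still be re-included explicitly.
    return any(path.startswith(prefix) for prefix in include)


# Made-up paths, for illustration only.
exclude = ["/usr/lib/python/"]
include = ["/usr/lib/python/site-packages/mylib/"]
for path in (
    "/home/me/project/main.py",
    "/usr/lib/python/json/decoder.py",
    "/usr/lib/python/site-packages/mylib/core.py",
):
    print(path, is_reported(path, exclude, include))
```

With these lists, only the second path is dropped: it is excluded and nothing re-includes it.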

Does it look usable ?

jan-glx commented 9 years ago

Sounds great! Maybe you could exclude everything first if only --include is given. It would also be very handy to be able to specify relative paths (since you said "starts with", just convert all arguments to absolute paths first). Additionally, for what is probably the most common use case, you could add an --exclude_pythonpath switch to exclude everything in the sys.path folders.
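
A minimal sketch of the two suggestions above, assuming prefix matching on absolute paths. The helper names (`normalize_prefixes`, `syspath_prefixes`, `is_under`) are hypothetical and not part of pprofile.

```python
import os
import sys


def normalize_prefixes(arguments):
    """Turn possibly relative command-line paths into absolute prefixes."""
    return [os.path.abspath(argument) for argument in arguments]


def syspath_prefixes():
    """Absolute form of every sys.path entry, as a default exclusion list.

    Empty entries (which stand for the current directory) are skipped here,
    on the assumption that the user's own code should stay visible.
    """
    return [os.path.abspath(entry) for entry in sys.path if entry]


def is_under(path, prefixes):
    """True if `path` is one of `prefixes` or lies below one of them."""
    path = os.path.abspath(path)
    return any(
        path == prefix or path.startswith(prefix.rstrip(os.sep) + os.sep)
        for prefix in prefixes
    )


prefixes = normalize_prefixes(["src"])
print(is_under(os.path.join("src", "module.py"), prefixes))
print(is_under(os.path.join("srcx", "module.py"), prefixes))
```

Comparing against the prefix plus a path separator keeps a sibling directory such as "srcx" from matching the prefix "src".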

vpelletier commented 9 years ago

All good points, thanks !

Thinking more about it, there is one complication with glob patterns: I do not want to rely on the actual file tree (ex: to match files inside an egg, or in-ZODB), so I cannot use the glob module verbatim. I cannot just use fnmatch either, although it has the API I need, as "/a*/c" matching "/a/b/c" would be surprising.
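
The fnmatch behaviour mentioned above can be checked with the standard library. `fnmatchcase` is used here only to avoid platform-specific case and separator normalization.

```python
import fnmatch

# "*" in fnmatch matches any characters, including "/",
# so the pattern crosses directory boundaries.
print(fnmatch.fnmatchcase("/a/b/c", "/a*/c"))

# The match a user would more likely expect from a path glob.
print(fnmatch.fnmatchcase("/ab/c", "/a*/c"))
```

Both calls return True, which is the surprising part for a path-oriented option.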

I'm also somewhat reluctant about regexes, as their usage in a (very likely) filename context would be surprising.

vpelletier commented 8 years ago

@jan-glx Hey, I finally got around to implementing this feature. In the end, I chose regex as the exclusion/inclusion syntax, because all the fnmatch schemes I could think of would either have hard-to-understand effects or be hard to write (requiring silly things like --exclude /* --exclude /*/* etc), and would likely be hard to write for anything other than *nix paths.

Please give current master a try, check the new option group and tell me what you think.

jan-glx commented 8 years ago

Cool! Regex sounds like a powerful solution; would I use it like pprofile myscript.py --include '/^C:\\\\path\\to\\my project\\.*/', or how? I just tested the --exclude-syspath option; it works nicely with the deterministic profiler. But when I use statistical profiling by specifying -s 0.1, I get the following error:

Traceback (most recent call last):
  File "C:\Anaconda3\envs\py27\Scripts\pprofile-script.py", line 9, in <module>
    load_entry_point('pprofile==1.8.1', 'console_scripts', 'pprofile')()
  File "C:\Anaconda3\envs\py27\lib\site-packages\pprofile.py", line 952, in main
    x for x in prof.getFilenameSet()
AttributeError: 'StatisticalThread' object has no attribute 'getFilenameSet'

vpelletier commented 8 years ago

would I use it like pprofile myscript.py --include '/^C:\path\to\my project.*/', or how?

There is no need for leading & trailing slashes as the code already expects regexes. Something like this should work:

pprofile myscript.py --include '^C:\\\\path\\to\\my project\\.*'
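
Because backslashes are special in regexes, such a pattern is easy to get wrong by hand. As a hedged sketch, `re.escape` can build it from the literal directory. The directory and file names below are hypothetical, and the sketch assumes the pattern is searched against the file name as Python reports it; shell quoting of the resulting string is a separate concern.

```python
import re

# Hypothetical project directory, as Python would report it on Windows.
project_dir = "C:\\path\\to\\my project"

# re.escape protects backslashes and other special characters,
# so the directory (plus trailing separator) is matched literally.
pattern = "^" + re.escape(project_dir + "\\")
print(pattern)

compiled = re.compile(pattern)
# A file inside the project matches; a library file does not.
print(bool(compiled.search("C:\\path\\to\\my project\\myscript.py")))
print(bool(compiled.search("C:\\Anaconda3\\lib\\site-packages\\numpy\\core.py")))
```

The exact text of the printed pattern depends on the Python version, since `re.escape` escaped more characters in older releases, but the matching behaviour is the same.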

But when I use statistical profiling by specifying -s 0.1, I get the following error:

Ouch, nice catch, thanks. Fixed in master & added to the automated tests.

vpelletier commented 8 years ago

Two more notes about exclude/include:

1) Regexes really apply to whatever Python thinks is the source file name. So if "myscript.py" is somewhere in the "c:\path\to\my project\" subtree, the above regex will still exclude samples from "myscript.py" when it is run as a script: Python only knows the executed file by the relative name given on the command line, not by its full path:

$ cat printfile.py
print '__file__=', repr(__file__)
$ cat importprintfile.py
import printfile
$ python printfile.py 
__file__= 'printfile.py'
$ python importprintfile.py 
__file__= '/tmp/printfile.py'

2) --exclude-syspath only excludes sys.path as it is while pprofile.py itself is still executing. So if the profiled script is part of an installed egg (for example), not much will actually be profiled. I'll have to refine this option further before I can release pprofile with it.
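
Point 1 above can be illustrated with the two file names from the transcript. The "^/tmp/" pattern is a hypothetical --include value, and the sketch assumes the regex is searched against the file name exactly as Python reports it.

```python
import re

# Hypothetical --include pattern, anchored on an absolute directory.
include = re.compile("^/tmp/")

# Name reported when the file is run directly as a script: no match.
print(bool(include.search("printfile.py")))

# Name reported when the same file is imported: match.
print(bool(include.search("/tmp/printfile.py")))
```

An anchored absolute-path pattern therefore misses the main script unless the script is itself started with an absolute path or a second pattern covers its relative name.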

vpelletier commented 8 years ago

Released in pprofile 1.9.