sumerc / yappi

Yet Another Python Profiler, but this time multithreading, asyncio and gevent aware.
MIT License
1.44k stars 72 forks source link

Feature request: strip prefix from filenames #86

Open robin-wayve opened 3 years ago

robin-wayve commented 3 years ago

I tried using yappi on some code that is executed through Bazel and it was difficult to read the stats because the paths all begin with an identical, deep prefix. Something like this:

/home/user/.cache/bazel/_bazel_user/a1901621c7221d386cf4d8fa1b80a433/sandbox/sandboxfs/5/code/bazel-out/k8-opt/bin/path/in/project/target.runfiles/<the bit I actually care about starts here>

A --prefix or --strip-prefix option that would allow me to drop the predictable and repeated parts of file paths would be a help to be able to focus on the more interesting part at the end.

sumerc commented 3 years ago

Good idea!

Yappi is already showing the rightmost part of the string but I assume this is not enough for your case, right?

One idea might be to strip off part of the string that contains a path in sys.path. An example:

/home/XXXX/.pyenv/versions/3.9.6-debug/lib/python3.9/site-packages/mock/__init__.py

becomes

mock/__init__.py

because /home/XXXX/.pyenv/versions/3.9.6-debug/lib/python3.9/site-packages/ is in sys.path. WDYT, does it solve then noise problem on your end?

robin-wayve commented 3 years ago

I probably should have mentioned: the context in which I found this difficult was feeding callgrind results to gprof2dot and then visualizing with graphviz (which just shows the entire path). I worked around this by manually editing my resulting dot file to strip the prefix.

I think stripping sys.path would not be desirable: check out the resulting visualisation I got in https://github.com/PyCQA/astroid/issues/1115 -- it's useful to see which calls were inside Python vs libraries.

sumerc commented 3 years ago

I see.

I agree that a bit of flexibility to filter some prefixes on the output would be useful. Will implement this in the next version! I still think --omit-sys-path might be a good improvement for some cases as well which will not be on by default(of course). Maybe I will implement both.

Again: Thanks for the idea!

robin-wayve commented 3 years ago

Yep, makes sense. Thanks for an awesome tool!

zanieb commented 2 years ago

I've got a rough patch for this fwiw

YappiYFuncStat = yappi.YFuncStat

class YFuncStat(YappiYFuncStat):
    def __setattr__(self, name, value):
        if isinstance(value, str) and value.startswith("/"):
            value = trim_sys_paths(value)
        super().__setattr__(name, value)

yappi.YFuncStat = YFuncStat

def trim_sys_paths(path_string):
    trim_length = 0
    for path in sys.path:
        if path_string.startswith(path) and len(path) > trim_length:
            trim_length = len(path)

    return path_string[trim_length:].lstrip("/")
sumerc commented 2 years ago

Thanks @madkinsz, I would happily merge this as a PR if you can:

zanieb commented 2 years ago

I'd be happy to contribute, but the methods here for detecting a path and trimming values seems sloppy. I imagine you wouldn't want this implemented on __setattr__ as well.