sumerc / yappi

Yet Another Python Profiler, but this time multithreading, asyncio and gevent aware.
MIT License

Feature request: Filter stats by package #35

Closed MatthewWilkes closed 4 years ago

MatthewWilkes commented 4 years ago

Hi there,

It would be lovely to filter stats by package. Because you're using PyObject_RichCompareBool as part of the filter it's already possible to achieve this with a helper object. Would you be interested in integrating this as a feature with a stable API?

My current implementation is:

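import dataclasses
import importlib
import os

import yappi

@dataclasses.dataclass
class PackageModule:
    package: str

    def __post_init__(self):
        # Importing the package is how we find out where it lives on disk.
        mod = importlib.import_module(self.package)
        self.fn = mod.__file__
        # For a package, match everything under its directory.
        if self.fn.endswith("__init__.py"):
            self.fn = os.path.dirname(self.fn)

    def __eq__(self, other):
        # Called when the filter compares a stat's module path with this object.
        return other.startswith(self.fn)

yappi.get_func_stats(filter={"modname": PackageModule("apd.aggregation")}).print_all()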

There are caveats to this, mainly that the module must be importable and that importing it must be acceptable (it could have import-time side effects), but I think it improves the usability a fair bit.
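
One way to soften the import caveat, sketched here under assumptions: importlib.util.find_spec can locate a top-level package without executing it, although for a dotted name such as apd.aggregation it still imports the parent packages. The helper name package_path_prefix is hypothetical and not part of yappi.

```python
import importlib.util
import os


def package_path_prefix(package):
    """Return the path prefix shared by all source files of `package`.

    Hypothetical helper, not part of yappi. find_spec() does not execute the
    named module itself, but it does import parent packages of dotted names.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        # Not found, or a namespace package with no single location.
        raise ImportError(f"cannot locate a source file for {package!r}")
    origin = spec.origin
    # For a package, use its directory so that submodules match as well.
    if os.path.basename(origin) == "__init__.py":
        return os.path.dirname(origin)
    return origin
```

The returned string could take the place of self.fn in the PackageModule helper above.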

What do you think? If you're interested I'm happy to put together a PR.

Matt

sumerc commented 4 years ago

Hi,

Well, I know for sure that the current filtering implementation is too simple and does not cover many cases. The reason is that when I do not fully see the use cases for a specific feature, I do a minimal implementation and give it some time before deciding.

In that sense, I am all for a better filtering API (something like Django queries?), but your current case seems a bit too specific to modname. So, please correct me if I am wrong, but if we had some kind of regex support for modname, your problem would already be solved. Or maybe I am missing something?

Examples:

yappi.get_func_stats(modname_istartswith='django.db').print_all()
yappi.get_func_stats(modname_icontains='django.db').print_all()

MatthewWilkes commented 4 years ago

Well, modname, at least for me, is the full path to the .py file. I'm testing this under Windows (I'm writing about Python profiling and using Windows to force myself not to write POSIX-specific things) and I see filenames. If I could do package/module identifier like django.db that'd be a great step up, and starts with would be sufficient.
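
As a rough illustration of going from the file paths seen here to a dotted identifier such as django.db, the sketch below scans sys.modules for a module whose __file__ matches a given path. The function name dotted_name_for_file is hypothetical, and it only works for modules that have already been imported.

```python
import os
import sys


def dotted_name_for_file(path):
    """Return the dotted name of the imported module defined in `path`.

    Hypothetical helper: returns None if no imported module matches.
    """
    # normcase makes the comparison case-insensitive on Windows.
    target = os.path.normcase(os.path.abspath(path))
    # Copy the items first, because sys.modules can change while we iterate.
    for name, module in list(sys.modules.items()):
        filename = getattr(module, "__file__", None)
        if filename and os.path.normcase(os.path.abspath(filename)) == target:
            return name
    return None
```

With names in that form, a plain startswith("django.db") check would be enough to select a package.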

sumerc commented 4 years ago

Nice to know.

Ok. After thinking this through again and again, it turns out that it would be better to add new functionality rather than modify the current behavior, as that would probably break existing code.

What I am thinking, very roughly, is something like a filter_func param that will be called for each stat, and we will simply filter based on its result.

Pros:

Cons:

Example:

yappi.get_func_stats(filter={"name"}, filter_callback=my_filter_callback)

sumerc commented 4 years ago

Closing this, as the same behavior can be accomplished with the following code (instead of filter_callback). I could not see any benefit in having another filter param when we can get a YFuncStat object and filter on its properties.

stats = yappi.get_func_stats()
package = PackageModule("apd.aggregation")  # build the helper once, outside the loop
for stat in stats:
    if stat.module == package:
        ...  # do something

Here is a more detailed answer.
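
The two approaches discussed in this thread, a plain loop over the returned stats and a per-stat callback, come down to applying the same predicate to each entry. The sketch below shows that equivalence with stand-in objects rather than real yappi stats; filter_stats is a hypothetical helper, and the name and module attributes are only illustrative.

```python
from types import SimpleNamespace

# Stand-ins for profiler entries; real entries would come from the profiler.
stats = [
    SimpleNamespace(name="a", module="doc2.py"),
    SimpleNamespace(name="b", module="doc2.py"),
    SimpleNamespace(name="a", module="package_a/__init__.py"),
]


def in_package_a(stat):
    """Predicate shared by both approaches."""
    return "package_a" in stat.module


# Approach 1: loop over the stats and test each entry yourself.
looped = []
for stat in stats:
    if in_package_a(stat):
        looped.append(stat)


# Approach 2: hand the predicate to a helper that does the looping.
def filter_stats(entries, filter_callback):  # hypothetical helper
    return [entry for entry in entries if filter_callback(entry)]


called_back = filter_stats(stats, in_package_a)
```

Both lists end up holding the same entries; the callback form mainly saves the caller from writing the loop.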

sumerc commented 4 years ago

We now have a new API parameter for this: filter_callback in get_func_stats().

Here is an example from the docs:

import package_a
import yappi
import sys

def a():
    pass

def b():
    pass

yappi.start()
a()
b()
package_a.a()
yappi.stop()

# filter by module object
current_module = sys.modules[__name__]
stats = yappi.get_func_stats(
    filter_callback=lambda x: yappi.module_matches(x, [current_module])
)  # x is a yappi.YFuncStat object
stats.sort("name", "desc").print_all()
'''
Clock type: CPU
Ordered by: name, desc

name                                  ncall  tsub      ttot      tavg
doc2.py:10 b                          1      0.000001  0.000001  0.000001
doc2.py:6 a                           1      0.000001  0.000001  0.000001
'''

# filter by function object
yappi.get_func_stats(
    filter_callback=lambda x: yappi.func_matches(x, [a, b])
).print_all()
'''
name                                  ncall  tsub      ttot      tavg
doc2.py:6 a                           1      0.000001  0.000001  0.000001
doc2.py:10 b                          1      0.000001  0.000001  0.000001
'''

# filter by module name
yappi.get_func_stats(
    filter_callback=lambda x: 'package_a' in x.module
).print_all()
'''
name                                  ncall  tsub      ttot      tavg
package_a/__init__.py:1 a             1      0.000001  0.000001  0.000001
'''

# filter by function name
yappi.get_func_stats(
    filter_callback=lambda x: 'a' in x.name
).print_all()
'''
name                                  ncall  tsub      ttot      tavg
doc2.py:6 a                           1      0.000001  0.000001  0.000001
package_a/__init__.py:1 a             1      0.000001  0.000001  0.000001
'''

MatthewWilkes commented 4 years ago

Thanks, I'll update the example code and docs I wrote.
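
For the original request of filtering by package, a callback passed as filter_callback could compare each entry's module path against a package directory. The sketch below is one possible shape for that, assuming the module attribute holds a file path as reported earlier in the thread; make_path_prefix_filter is a hypothetical name, not a yappi function.

```python
import os
from types import SimpleNamespace


def make_path_prefix_filter(directory):
    """Return a callback accepting entries whose .module lies under `directory`.

    Hypothetical helper. Assumes `.module` holds a file path.
    """
    # normcase makes the comparison case-insensitive on Windows.
    root = os.path.normcase(os.path.abspath(directory))
    # Require a separator after the prefix so "aggregation2" does not match.
    root_with_sep = root.rstrip(os.sep) + os.sep

    def callback(stat):
        path = os.path.normcase(os.path.abspath(stat.module))
        return path == root or path.startswith(root_with_sep)

    return callback


# Stand-in entries; real ones would be the objects handed to the callback.
package_dir = os.path.join("src", "apd", "aggregation")
inside = SimpleNamespace(module=os.path.join(package_dir, "cli.py"))
sibling = SimpleNamespace(module=os.path.join("src", "apd", "aggregation2", "cli.py"))
is_in_package = make_path_prefix_filter(package_dir)
```

The resulting callable could then be passed as yappi.get_func_stats(filter_callback=is_in_package). Compared with a bare startswith check, the trailing separator avoids matching sibling directories that merely share a name prefix.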