sumerc / yappi

Yet Another Python Profiler, but this time multithreading, asyncio and gevent aware.
MIT License
1.45k stars 73 forks source link

Documentation: Clarify filter function behaviors #43

Closed vEpiphyte closed 4 years ago

vEpiphyte commented 4 years ago

Filters appear to behave as negative filters, removing data from the output. Is this correct/intended?

This is a test harness I was using to determine filter behavior (called genr.py):

import time
import random
import asyncio

import yappi

basename = 'genr'
debug = ''
if __debug__:
    debug = '_debug'
statsname = f'./{basename}{debug}.pstats'

N = 4
SLEEP = 0.5

times = [random.random() for i in range(N)]

async def genr():
    for t in times:
        await asyncio.sleep(t)
        yield t

async def sleeploop(i):
    time.sleep(i)
    # await asyncio.sleep(i)

async def run():
    values = []

    async for t in genr():
        values.append(t)

    await sleeploop(SLEEP)

    return values

async def main():
    print('start')

    yappi.set_clock_type('wall')
    tick = time.time()
    yappi.clear_stats()
    yappi.start(builtins=False)

    values = await run()

    yappi.stop()
    tock = time.time()
    filter = {}
    ##########################
    # DEFINE FILTER HERE 
    # filter['module'] = 'genr.py'
    ##########################
    stats = yappi.get_func_stats(filter=filter)  # type: yappi.YFuncStats
    for stat in stats:
        print("%s %s %0.6f" % (stat.name, stat.module, stat.ttot))
    # stats.print_all()
    # stas.print_all()
    # obj = yappi.convert2pstats(stats)  # type: profile.Stats
    # obj.dump_stats(statsname)

    print(f'took: {tock - tick}')

    assert values == times
    print(times)
    print(f'Total random time: {sum(times)}')
    print(f'Total run time: {sum(times) + SLEEP}')
    print('fin')

if __name__ == '__main__':
    asyncio.run(main())

If this is intended behavior, would you accept a PR updating the filter docs to mention this ?

vEpiphyte commented 4 years ago

Actually, I'm not even certain if that example is setting the module correctly at all.

sumerc commented 4 years ago

Ok. Let me clarify a bit.

I think I can see where the confusion is coming from: the filter param is actually optional.

As an example: you can always do following:

def foo():
    pass

def bar():
    pass

import yappi
yappi.start()
foo()
bar()
stats = yappi.get_func_stats()
for t in stats:
    print(t)
'''
{0: 'bar', 1: 'x.py', 2: 5, 3: 1, 4: 1, 5: False, 6: 3e-06, 7: 3e-06, 8: 1, 9: <yappi.YChildFuncStats object at 0x10d67d1d0>, 10: 0, 11: '_MainThread', 12: 0, 13: 3e-06, 14: 'x.py:5 bar'}
{0: 'foo', 1: 'x.py', 2: 1, 3: 1, 4: 1, 5: False, 6: 3e-06, 7: 3e-06, 8: 0, 9: <yappi.YChildFuncStats object at 0x10d67d210>, 10: 0, 11: '_MainThread', 12: 0, 13: 3e-06, 14: 'x.py:1 foo'}
'''

Here you can see the element keyed with 1 is the module's name which is: x.py. get_func_stats return YFuncStats object which contains above dict in a more user friendly way. See here.

Let's see:

for t in stats:
    print(t.__dict__)
'''
{'name': 'foo', 'module': 'x.py', 'lineno': 1, 'ncall': 1, 'nactualcall': 1, 'builtin': False, 'ttot': 3e-06, 'tsub': 3e-06, 'index': 0, 'children': <yappi.YChildFuncStats object at 0x105a181d0>, 'ctx_id': 0, 'ctx_name': '_MainThread', 'tag': 0, 'tavg': 3e-06, 'full_name': 'x.py:1 foo'}
{'name': 'bar', 'module': 'x.py', 'lineno': 5, 'ncall': 1, 'nactualcall': 1, 'builtin': False, 'ttot': 2e-06, 'tsub': 2e-06, 'index': 1, 'children': <yappi.YChildFuncStats object at 0x105a18190>, 'ctx_id': 0, 'ctx_name': '_MainThread', 'tag': 0, 'tavg': 2e-06, 'full_name': 'x.py:5 bar'}
'''

Now, it is better right?

With this kind of introspection, I think any kind of filtering is possible easily? WDYT?

The reason we have filter param in get_func_stats is that: it gives very fast filtering on C side for some params(especially for tag and ctx_id) and I completely agree it causes confusion. (I do not want to play with it at least right now for backward compat.)

While discussing it, I feel that this also needs to be documented. I said docs are poor, right? :)

Coming to your original question again:

Filters appear to behave as negative filters, removing data from the output.

I am not sure if I understand you correctly but say you filter on module=django, that means that all the output that does not contain django as a module name will be skipped.

vEpiphyte commented 4 years ago

@sumerc This is definitely helpful. I'll play around with this more.

filter on module=django, that means that all the output that does not contain django as a module name will be skipped.

That is correct by what I meant as a "negative" filter.

sumerc commented 4 years ago

We have changed this whole filtering with the branch improve-filtering. All docs updated. Please see and comment if you see any more improvement on that.