sumerc / yappi

Yet Another Python Profiler, but this time multithreading, asyncio and gevent aware.
MIT License
1.41k stars 73 forks source link

base class names incorrect #125

Open andre-merzky opened 1 year ago

andre-merzky commented 1 year ago

Dear Yappies, thanks for this fantastic tool - I wish I would have found it earlier :-)

This ticket is about an issue with names derived for bass classes in a simple class hierarchy. It seems that base class methods are given the name of the first child class which calls it, and that subsequent usages by other child classes then get accounted incorrectly.

Here is a small reproducer:

`test_yappi.py` ```py #!/usr/bin/env python3 from test_yappi import Base class A(Base): def __init__(self): self.sleep(1) class B(Base): def __init__(self): self.sleep(1) def main(): _ = A() for _ in range(10): _ = B() if __name__ == '__main__': import yappi yappi.start(builtins=False) main() yappi.get_thread_stats().print_all() stats = yappi.convert2pstats(yappi.get_func_stats()) stats.print_stats() stats.dump_stats('pstats.prof') ```
`test_yappi.py` ```py import time class Base(object): def sleep(self, tout): time.sleep(tout) ```

test_yappi/__init__.py exists but is empty.

With this code I would expect to find A.sleep and B.sleep, or to find Base.sleep as profiled methods, but in fact I find only A.sleep:

$ strings pstats.prof | grep sleep
A.sleep)

and the stats result then reflect that:

$ ./test_yappi.py 

name           id     tid              ttot      scnt        
_MainThread    0      140335337744192  0.001044  1         
         23 function calls in 0.000 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.001    0.001 ./test_yappi.py:13(main)
       11    0.000    0.000    0.001    0.000 [...]/test_yappi/base.py:6(A.sleep)
       10    0.000    0.000    0.001    0.000 ./test_yappi.py:10(B.__init__)
        1    0.000    0.000    0.000    0.000 ./test_yappi.py:6(A.__init__)

(BTW: I would have expected different values for cumtime, about 11 seconds for main - what happened?)

When attempting to evaluate results it is basically impossible to separate what contributions the sleep calls for A and B have, respectively. In this specific case it helps a bit to plot the results, but in any non-trivial use case that becomes unwieldy very fast:

output

Is this known behavior? Is there a way to avoid the mis-labeling?

Thanks, Andre.

sumerc commented 1 year ago

Is this known behavior? Is there a way to avoid the mis-labeling?

The underlying hashtable works by hashing the CodeObject* and as both A and B share the same function (basic inheritance), the stats go into same slot. In this case, it goes to A.sleep, but depending on the order it could have also gone into B.sleep.

The solution might seem easy but the tricky part is to implement this without affecting performance (getting functionname everytime a func. is called). So, stay tuned. I will hopefully find a solution to this.

I would have expected different values for cumtime, about 11 seconds for main - what happened?)

By default, yappi measures CPU time. time.sleep does not consume any CPU. You can profile wall time by:

yappi.set_clock_type("wall")
yappi.start()
...
yappi.stop()
andre-merzky commented 1 year ago

That all makes sense, thanks for clarifying this for me, much appreciated.

So, stay tuned. I will hopefully find a solution to this.

Since you are aware of this, please feel free to close this ticket.