sumerc / yappi

Yet Another Python Profiler, but this time multithreading, asyncio and gevent aware.
MIT License
1.45k stars 73 forks source link

Add profiling support for Gevent #53

Closed sumerc closed 3 years ago

sumerc commented 4 years ago

This PR aims to add support profiling Gevent greenlets with respect to multithreading.

mguijarr commented 4 years ago

@sumerc so, the added tests are similar to those for asyncio : covering both single and multi threaded.

advance512 commented 4 years ago

Is there any help that can be offered here? This would be extremely useful for us.

sumerc commented 4 years ago

Is there any help that can be offered here?

Help would be really appreciated! I could not find time to work on this. But it will take some time to test and tweak. We can implement something working in few days, but manual testing/tweaking/automating tests will take time. We should also throw whatever we can to it. Basically, I can implement a PoC, and it would really be helpful your guys experience on gevent to help me break it? Then we could tweak it more.

Let me think few more days on this and I will throw something to play with, hopefully. @mguijarr has already done great work on helping with the tests and explaining the internals of gevent in the relevant issue: https://github.com/sumerc/yappi/issues/47

sumerc commented 4 years ago

Ok guys.

Below is the first problem we need to solve:

import gevent
import yappi

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0

def foo():
    burn_cpu(0.1)
    gevent.sleep(1.0)

def bar():
    burn_cpu(0.1)
    gevent.sleep(1.0)

#yappi.set_clock_type("wall")
yappi.start()
g1 = gevent.spawn(foo)
g2 = gevent.spawn(foo)
gevent.wait([g1, g2])
yappi.get_func_stats(
    filter_callback=lambda x: yappi.func_matches(x, [foo, bar])
).print_all()

Now if the clock type is CPU(which is by default) the output is as following:

Clock type: CPU
Ordered by: totaltime, desc

name                                  ncall  tsub      ttot      tavg      
gevent_yappi.py:14 foo                1      0.000038  0.200483  0.200483
gevent_yappi.py:19 bar                1      0.000034  0.100193  0.100193

CPU time of foo becomes 0.2 as gevent switches to another greenlet (bar func) and accumulates that CPU time to foo.

I will be working on a patch to fix this, then we will move onto wall time.

@mguijarr , @advance512 How can I make foo and bar run in different threads as well in this simple example? monkey_patch and monkey_patch without threads? I would like to try all kinds of possible gevent running modes?

advance512 commented 4 years ago

Why do you want to run foo and bar on different threads? That isn't a common use-case for gevent.

What you can do is use gevent's native threadpool: http://www.gevent.org/api/gevent.threadpool.html

Or, create two gevent Hub() objects, each on a different thread, and have each spawn either foo or bar.

sumerc commented 4 years ago

Why do you want to run foo and bar on different threads? That isn't a common use-case for gevent.

I would like to make tests where same function is being called concurrently from different greenlets(same thread+different threads), or some other tests as well related to multiple threads. That is why I asked how to do it.

What you can do is use gevent's native threadpool: http://www.gevent.org/api/gevent.threadpool.html ...

Thanks for the information. Will look into those!

sumerc commented 4 years ago

I have been trying to allocate time to this issue. The problem is: this issue is not easy and requires continuous/focused work which I currently don't have and it is really pissing me off :)

Here are the steps that needs to support Gevent from Yappi:

1/ Choose a minimum gevent version that we would like to support. 2/ Add a new API set_ctx_backend. THat both supports gevent and asyncio. default will be asyncio. 3/ Implement a new set_ctx_id callback that returns the current greenlet id on every function call enter/exit. 4/ On C side, if you detect a greenlet switch we will be correcting the calculations. SInce multiple greenlets can run in a single thread, all CPU time values gets accumulated over each other. I have also seen issues with wall-time too. THis is where we need to debug to understand what is happening when greenlet switch occurs. What values need to be corrected in minimum to give correct results. (to same thread or to different thread). by putting hooks into call_enter/call_leave events. This is the hardest part to do correct. 5/ Write lots of tests. (think about edge cases) 6/ Update docs. 7/ Release.

If anybody would like to work on this I can help/assist in the implementation with any of above steps/unclear points. Otherwise I will be setting this to Blocked and wait few more months unfortunately. :(

Suhail-MOHD commented 4 years ago

@sumerc Our team at nutanix would like to help!

Our service is built upon gevent and its monkey patch feature. Lately we have been searching for a profiling tool and have only found statistical profilers to work well with gevents. We were excited to find GreenletProfiler, but after further investigation we realised it cannot report CPU usage reliably for the same reason you've mentioned above.

We would like to contribute to this issue and help fix it.

As a next step, I would like to provide a fix proposal with a POC alongside it. May I proceed and pick this up?

sumerc commented 4 years ago

Hi,

I would be more than happy to accept a fix proposal and a POC!

There are already some tests written by mguijarr, for both single/multithreaded cases. I hope they might help you for edge cases.

Please proceed :)

advance512 commented 4 years ago

Exciting! Can't wait to see what @Suhail-MOHD cooks up.

achauvin-wish commented 4 years ago

We were excited to find GreenletProfiler, but after further investigation we realised it cannot report CPU usage reliably for the same reason you've mentioned above.

I was going to run some tests and saw your reply here so I wanted to ask about the results you obtained. Does the GreenletProfiler not perform accurately on python 2.7?

Suhail-MOHD commented 4 years ago

@achauvin-wish I would like to amend my previous statement after having spent some time understanding both GreenletProfiler and yappi source. GreenletProfiler did not return accurate results for the service my team works on . Our service uses multiple native threads, one thread running greenlets and other threads running operations that cannot work well if run in the greenlet thread. I can write a test that demonstrates how GreenletProfiler does not work in such a system.

For python2.7, I believe GreenletProfiler may work in a system with one native thread running multiple greenlets but I have not tested it out.

sumerc commented 4 years ago

For reference: the issue is ongoing with another PR: https://github.com/sumerc/yappi/pull/57

sumerc commented 3 years ago

57 has been merged. Thanks to @Suhail-MOHD, Yappi has initial support for gevent profiling!

A new release will hopefully coming in few days!