Add async profiling support for gevent

mguijarr commented 4 years ago

First of all, thanks a lot for this project, yappi is really the best Python profiler !

In my projects I use gevent extensively. I was full of hope when I found this blog article:

https://emptysqua.re/blog/greenletprofiler/

Someone made a greenlet profiler a few years ago on top of yappi...

But the project is Python 2, and it was coming with a modified, bundled version of yappi.

Now that yappi supports coroutines (with asyncio, unfortunately), could you please give me some guidance how to redo what was done with greenletprofiler ?

I would be happy to contribute to yappi with gevent support, but I need some help - my C++ skills are rusty and I don't know the code.

I saw in this discussion https://github.com/sumerc/yappi/issues/21 that it was something you thought about once... But I am afraid set_ctx_backend() was finally not implemented ?

Thanks a lot

sumerc commented 4 years ago

First of all, thanks a lot for this project, Yappi is really the best Python profiler !

Thanks a lot! Motivating words :)

I saw in this discussion #21 that it was something you thought about once... But I am afraid set_ctx_backend() was finally not implemented ?

I have been thinking about this a lot. We have more than one async framework available in the Python ecosystem and it would be great to have a support for more than one for sure. I don't have too much internal knowledge of gevent but I think it is possible to have same async-friendly profiling features with gevent, too (maybe with few modifications)

So, basically I am OK, but I need time to learn how gevent works under the hood to discover the necessary parts to implement set_ctx_backend API. I will be looking into that. In parallel what you can help me with is:

Understand if greenlet_profiler is still working? Has cheduler semantics of gevent changed (like is there an option that makes greenlet_profiler useless?)
Write(or port) gevent related tests (like in greeenlet_profiler) in a Python agnostic way (for 2.7 upto 3.9 or which versions gevent supports)

Then we can hopefully work together to make these tests pass in Yappi itself?

How that sounds?

Someone made a greenlet profiler a few years ago on top of yappi...

That guy is @ajdavis . Awesome programmer :) Hi, @ajdavis, we were discussing something like above, I am pretty sure you don't have any time but any comment/idea on above? Thanks :)

mguijarr commented 4 years ago

Thanks for the quick answer! I am going to try to answer your questions asap

ajdavis commented 4 years ago

Hello! My interests have moved very far from this project and I can't say what would be required to upgrade greenlet_profiler to the latest Python, Yappi, and Gevent. I also recall having some concerns, personally, about greenlet_profiler's accuracy, but I don't remember the details anymore. Good luck.

I think it's a worthy goal to have a good greenlet profiler. Perhaps someone else has built a greenlet profiler that works today, or that's a better starting point than greenlet_profiler?

sumerc commented 4 years ago

Thank you for your precious input!

Let's see if there is any other movement in that area, too and nice to hear from you again :)

mguijarr commented 4 years ago

I made a little survey there is no Python 3 greenlet profiler. Not yet :wink:

@ajdavis I also checked forks of your greenlet profiler, with no luck - it does not work (test do not pass), in fact some adaptations have been made but it misses your changes to yappi so it is not helpful.

I am going to try to answer @sumerc questions in the previous message as soon as possible.

Thanks a lot guys ! Have a nice week-end in those difficult times.

mguijarr commented 4 years ago

I installed Python 2.7 and Python 3.7 in two different environments, to do some tests. In both environments I installed latest gevent 1.4.

Python 2.7, original GreenletProfiler code

Here is a gevent adaptation of the original test suite of GreenletProfiler:

from functools import partial
import unittest

import gevent

import GreenletProfiler

def find_func(ystats, name):
    items = [
        yfuncstat for yfuncstat in ystats
        if yfuncstat.name == name]

    assert len(items) == 1
    return items[0]

def assert_children(yfuncstats, names, msg):
    names = set(names)
    callees = set([
        ychildstat.name for ychildstat in yfuncstats.children
    ])

    final_msg = '%s: expected %s, got %s' % (
        msg, ', '.join(names), ', '.join(callees))

    assert names == callees, final_msg

def spin(n):
    for _ in range(n * 10000):
        pass

class GreenletTest(unittest.TestCase):
    spin_cost = None

    @classmethod
    def setUpClass(cls):
        # Measure the CPU cost of spin() as a baseline.
        GreenletProfiler.set_clock_type('cpu')
        GreenletProfiler.start()
        for _ in range(10):
            spin(1)
        GreenletProfiler.stop()
        f_stats = GreenletProfiler.get_func_stats()
        spin_stat = find_func(f_stats, 'spin')
        GreenletTest.spin_cost = spin_stat.ttot / 10.0
        GreenletProfiler.clear_stats()

    def tearDown(self):
        GreenletProfiler.stop()
        GreenletProfiler.clear_stats()

    def assertNear(self, x, y, margin=0.3):
        if abs(x - y) / float(x) > margin:
            raise AssertionError(
                "%s is not within %d%% of %s" % (x, margin * 100, y))

    def test_three_levels(self):
        def a():
            gevent.sleep(0) # a bit equivalent to greenlet.switch()
            b()
            spin(1)

        def b():
            spin(5)
            gevent.sleep(0)
            c()

        def c():
            spin(7)

        GreenletProfiler.set_clock_type('cpu')
        GreenletProfiler.start(builtins=True)
        g = gevent.spawn(a)
        gevent.sleep(0) # give hand to the hub to run another greenlet
        spin(2)
        gevent.sleep(0)
        spin(3)
        gevent.sleep(0)
        assert g.ready # greenlet is done
        GreenletProfiler.stop()

        ystats = GreenletProfiler.get_func_stats()

        # Check the stats for spin().
        spin_stat = find_func(ystats, 'spin')
        self.assertEqual(5, spin_stat.ncall)
        self.assertAlmostEqual(18 * self.spin_cost, spin_stat.ttot,
                               places=2, msg="spin()'s total time is wrong")

        assert_children(spin_stat, ['range'], 'spin() has wrong callees')

        # Check the stats for a().
        a_stat = find_func(ystats, 'a')
        self.assertEqual(1, a_stat.ncall, 'a() not called once')
        assert_children(
            a_stat,
            ['spin', 'b', "sleep"],
            'a() has wrong callees')

        self.assertAlmostEqual(13 * self.spin_cost, a_stat.ttot,
                               places=2, msg="a()'s total time is wrong")

        self.assertAlmostEqual(13 * self.spin_cost, a_stat.tavg,
                               places=2, msg="a()'s average time is wrong")

        self.assertAlmostEqual(a_stat.tsub, 0,
                               places=2, msg="a()'s subtotal is wrong")

        # Check the stats for b().
        b_stat = find_func(ystats, 'b')
        self.assertEqual(1, b_stat.ncall, 'b() not called once')
        assert_children(
            b_stat,
            ['spin', 'c', "sleep"],
            'b() has wrong callees')

        self.assertAlmostEqual(12 * self.spin_cost, b_stat.ttot,
                               places=2, msg="b()'s total time is wrong")

        self.assertAlmostEqual(12 * self.spin_cost, b_stat.tavg,
                               places=2, msg="b()'s average time is wrong")

        self.assertAlmostEqual(b_stat.tsub, 0,
                               places=2, msg="b()'s subtotal is wrong")

        # Check the stats for c().
        c_stat = find_func(ystats, 'c')
        self.assertEqual(1, c_stat.ncall, 'c() not called once')
        assert_children(c_stat, ['spin'], 'c() has wrong callees')
        self.assertAlmostEqual(7 * self.spin_cost, c_stat.ttot,
                               places=2, msg="c()'s total time is wrong")

        self.assertAlmostEqual(7 * self.spin_cost, c_stat.tavg,
                               places=2, msg="c()'s average time is wrong")

        self.assertAlmostEqual(c_stat.tsub, 0,
                               places=2, msg="c()'s subtotal is wrong")

    def test_recursion(self):
        def r(n):
            spin(1)
            gevent.sleep(0)
            if n > 1:
                r(n - 1)

            gevent.sleep(0)

        def s(n):
            spin(1)
            gevent.sleep(0)
            if n > 1:
                s(n - 1)

            gevent.sleep(0)

        GreenletProfiler.set_clock_type('cpu')
        GreenletProfiler.start(builtins=True)
        g0 = gevent.spawn(partial(r, 10))  # Run r 10 times.
        gevent.sleep(0)
        g1 = gevent.spawn(partial(s, 2))  # Run s 2 times.
        gevent.sleep(0)
        greenlets = [g0, g1]

        # Run all greenlets to completion.
        gevent.joinall(greenlets, raise_error=True)

        GreenletProfiler.stop()
        ystats = GreenletProfiler.get_func_stats()

        # Check the stats for spin().
        spin_stat = find_func(ystats, 'spin')
        self.assertEqual(12, spin_stat.ncall)

        # r() ran spin(1) 10 times, s() ran spin(1) 2 times.
        self.assertNear(12, spin_stat.ttot / self.spin_cost)
        assert_children(spin_stat, ['range'], 'spin() has wrong callees')

        # Check the stats for r().
        r_stat = find_func(ystats, 'r')
        self.assertEqual(10, r_stat.ncall)
        assert_children(
            r_stat,
            ['spin', 'r', "sleep"],
            'r() has wrong callees')

        self.assertNear(10, r_stat.ttot / self.spin_cost)
        self.assertNear(1, r_stat.tavg / self.spin_cost)
        self.assertAlmostEqual(0, r_stat.tsub, places=3)

        # Check the stats for s().
        s_stat = find_func(ystats, 's')
        self.assertEqual(2, s_stat.ncall)
        assert_children(
            s_stat,
            ['spin', 's', "sleep"],
            's() has wrong callees')

        self.assertNear(2, s_stat.ttot / self.spin_cost)
        self.assertNear(1, s_stat.tavg / self.spin_cost)
        self.assertAlmostEqual(0, s_stat.tsub, places=3)

if __name__ == '__main__':
    unittest.main()

Basically I just replaced explicit calls to greenlet.switch() by gevent.sleep(0), to have the same behaviour as the original test with greenlets.

Indeed, gevent is scheduling coroutines execution on I/O - so, when sleeping the code returns to the gevent Hub (the main greenlet), and switches to another one.

Tests pass:

..
----------------------------------------------------------------------
Ran 2 tests in 0.033s

OK

So, at first glance I can confirm the latest gevent works with the original GreenletProfiler.

Python 3.7, one of the forks of GreenletProfiler, latest yappi

Then I tested the same code as above with Python 3 and one of the forks of GreenletProfiler, adapted to Python 3, with latest Yappi.

Of course it does not work (this is the problem :wink):

FF
======================================================================
FAIL: test_recursion (__main__.GreenletTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_greenlet_profiler.py", line 180, in test_recursion
    assert_children(spin_stat, ['range'], 'spin() has wrong callees')
  File "test_greenlet_profiler.py", line 27, in assert_children
    assert names == callees, final_msg
AssertionError: spin() has wrong callees: expected range, got 

======================================================================
FAIL: test_three_levels (__main__.GreenletTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_greenlet_profiler.py", line 94, in test_three_levels
    assert_children(spin_stat, ['range'], 'spin() has wrong callees')
  File "test_greenlet_profiler.py", line 27, in assert_children
    assert names == callees, final_msg
AssertionError: spin() has wrong callees: expected range, got 

----------------------------------------------------------------------
Ran 2 tests in 0.019s

The patches of @ajdavis are not there...

The good news is that the test is compatible with both Python 2/Python 3 with no change.

Note about gevent, monkey patching and threads (maybe useful if you do not know gevent)

The good thing about gevent is that it proposes to patch the Python standard library, to have coroutines (greenlets) scheduled automatically when I/O operations happen. This is a big difference compared to explicit switches like with asyncio.

But, I do not think this behaviour makes a big difference for the profiler ?

It is possible to only enable the monkey-patching partially. Usually people patch nothing (see 1. below), all (see 2. below) or everything but the threading module (see 3. below):

1. No monkey patching at all

threads are normal Python threads
there is one gevent Hub per thread
I/O is blocking by default (like normal), no implicit scheduling of coroutines
- calls to gevent.sleep(), gevent.run(), gevent.joinall() etc executes the loop and schedule coroutines
- no coroutine scheduling happens while doing I/O : socket.recv(), time.sleep(), subprocess pipes etc are the classic ones : no implicit scheduling while doing I/O

2. If gevent monkey-patching is full

Python threads are in fact coroutines (gevent greenlets)
there is only 1 gevent Hub
all I/O calls become gevent-friendly : operations are set to non-blocking, the gevent loop takes care of completion of the I/O tasks and schedules coroutines (greenlets) implicitely : it switches to greenlets while I/O is happening

3. Monkey-patching of everything except threads

there is one different gevent Hub (= main greenlet) per thread
threads are Python threads (OS threads) except that gevent coroutines are scheduled for each Hub
I/O calls are gevent-friendly (see above)

I do not know if all those cases make a difference for yappi. But I thought it would be more clear to tell about that. In fact, more tests should be written for the different cases when the first code starts to work, maybe, in order to ensure it's all fine...

Hope this helps !

mguijarr commented 4 years ago

This is a test to run with yappi without GreenletProfiler ; I took the code from GreenletProfiler.py and adapted it to gevent (see previous comment) and to run without the need to have GreenletProfiler, only yappi:

from functools import partial
from contextlib import contextmanager
import unittest

import gevent

from yappi import set_clock_type, get_func_stats, set_context_id_callback, set_context_name_callback, start, stop, clear_stats

def start_profiling(builtins, threads):
   """Starts profiling all threads and all greenlets.

   This function can be called from any thread at any time.
   Resumes profiling if stop() was called previously.
   """
   set_context_id_callback(lambda: gevent and id(gevent.getcurrent()) or 0)
   set_context_name_callback(lambda: gevent and gevent.getcurrent().__class__.__name__ or '')
   start(builtins, threads)

def stop_profiling():
    """Stops the currently running yappi instance.

    The same profiling session can be resumed later by calling start().
    """
    stop()
    set_context_id_callback(None)

@contextmanager
def profiling(builtins=False, threads=True):
    set_clock_type("cpu")
    start_profiling(builtins, threads)
    yield
    stop_profiling()

def find_func(ystats, name):
    items = [
        yfuncstat for yfuncstat in ystats
        if yfuncstat.name == name]

    assert len(items) == 1
    return items[0]

def assert_children(yfuncstats, names, msg):
    names = set(names)
    callees = set([
        ychildstat.name for ychildstat in yfuncstats.children
    ])

    final_msg = '%s: expected %s, got %s' % (
        msg, ', '.join(names), ', '.join(callees))

    assert names == callees, final_msg

def spin(n):
    for _ in range(n * 10000):
        pass

class GreenletTest(unittest.TestCase):
    spin_cost = None

    @classmethod
    def setUpClass(cls):
        # Measure the CPU cost of spin() as a baseline.
        with profiling():
            for _ in range(10):
                spin(1)
        f_stats = get_func_stats()
        spin_stat = find_func(f_stats, 'spin')
        GreenletTest.spin_cost = spin_stat.ttot / 10.0
        clear_stats()

    def assertNear(self, x, y, margin=0.3):
        if abs(x - y) / float(x) > margin:
            raise AssertionError(
                "%s is not within %d%% of %s" % (x, margin * 100, y))

    def test_three_levels(self):
        def a():
            gevent.sleep(0) # a bit equivalent to greenlet.switch()
            b()
            spin(1)

        def b():
            spin(5)
            gevent.sleep(0)
            c()

        def c():
            spin(7)

        with profiling(builtins=True):
            g = gevent.spawn(a)
            gevent.sleep(0) # give hand to the hub to run another greenlet
            spin(2)
            gevent.sleep(0)
            spin(3)
            gevent.sleep(0)
        assert g.ready # greenlet is done

        ystats = get_func_stats()

        # Check the stats for spin().
        spin_stat = find_func(ystats, 'spin')
        self.assertEqual(5, spin_stat.ncall)
        self.assertAlmostEqual(18 * self.spin_cost, spin_stat.ttot,
                               places=2, msg="spin()'s total time is wrong")

        assert_children(spin_stat, ['range'], 'spin() has wrong callees')

        # Check the stats for a().
        a_stat = find_func(ystats, 'a')
        self.assertEqual(1, a_stat.ncall, 'a() not called once')
        assert_children(
            a_stat,
            ['spin', 'b', "sleep"],
            'a() has wrong callees')

        self.assertAlmostEqual(13 * self.spin_cost, a_stat.ttot,
                               places=2, msg="a()'s total time is wrong")

        self.assertAlmostEqual(13 * self.spin_cost, a_stat.tavg,
                               places=2, msg="a()'s average time is wrong")

        self.assertAlmostEqual(a_stat.tsub, 0,
                               places=2, msg="a()'s subtotal is wrong")

        # Check the stats for b().
        b_stat = find_func(ystats, 'b')
        self.assertEqual(1, b_stat.ncall, 'b() not called once')
        assert_children(
            b_stat,
            ['spin', 'c', "sleep"],
            'b() has wrong callees')

        self.assertAlmostEqual(12 * self.spin_cost, b_stat.ttot,
                               places=2, msg="b()'s total time is wrong")

        self.assertAlmostEqual(12 * self.spin_cost, b_stat.tavg,
                               places=2, msg="b()'s average time is wrong")

        self.assertAlmostEqual(b_stat.tsub, 0,
                               places=2, msg="b()'s subtotal is wrong")

        # Check the stats for c().
        c_stat = find_func(ystats, 'c')
        self.assertEqual(1, c_stat.ncall, 'c() not called once')
        assert_children(c_stat, ['spin'], 'c() has wrong callees')
        self.assertAlmostEqual(7 * self.spin_cost, c_stat.ttot,
                               places=2, msg="c()'s total time is wrong")

        self.assertAlmostEqual(7 * self.spin_cost, c_stat.tavg,
                               places=2, msg="c()'s average time is wrong")

        self.assertAlmostEqual(c_stat.tsub, 0,
                               places=2, msg="c()'s subtotal is wrong")

    def test_recursion(self):
        def r(n):
            spin(1)
            gevent.sleep(0)
            if n > 1:
                r(n - 1)

            gevent.sleep(0)

        def s(n):
            spin(1)
            gevent.sleep(0)
            if n > 1:
                s(n - 1)

            gevent.sleep(0)

        with profiling():
            g0 = gevent.spawn(partial(r, 10))  # Run r 10 times.
            gevent.sleep(0)
            g1 = gevent.spawn(partial(s, 2))  # Run s 2 times.
            gevent.sleep(0)
            greenlets = [g0, g1]

            # Run all greenlets to completion.
            gevent.joinall(greenlets, raise_error=True)

        ystats = get_func_stats()

        # Check the stats for spin().
        spin_stat = find_func(ystats, 'spin')
        self.assertEqual(12, spin_stat.ncall)

        # r() ran spin(1) 10 times, s() ran spin(1) 2 times.
        self.assertNear(12, spin_stat.ttot / self.spin_cost)
        assert_children(spin_stat, ['range'], 'spin() has wrong callees')

        # Check the stats for r().
        r_stat = find_func(ystats, 'r')
        self.assertEqual(10, r_stat.ncall)
        assert_children(
            r_stat,
            ['spin', 'r', "sleep"],
            'r() has wrong callees')

        self.assertNear(10, r_stat.ttot / self.spin_cost)
        self.assertNear(1, r_stat.tavg / self.spin_cost)
        self.assertAlmostEqual(0, r_stat.tsub, places=3)

        # Check the stats for s().
        s_stat = find_func(ystats, 's')
        self.assertEqual(2, s_stat.ncall)
        assert_children(
            s_stat,
            ['spin', 's', "sleep"],
            's() has wrong callees')

        self.assertNear(2, s_stat.ttot / self.spin_cost)
        self.assertNear(1, s_stat.tavg / self.spin_cost)
        self.assertAlmostEqual(0, s_stat.tsub, places=3)

if __name__ == '__main__':
    unittest.main()

sumerc commented 4 years ago

Hi Matias,

Thanks for the valuable information on gevent. I have also done some research on my side to see what needs to be done to work this out. Let me also share my findings, too.

First off, let's start from the notion of coroutine-aware profiling as this is the key to what we would like to achieve here. Yappi is a deterministic profiler that catches function call/exit events to profile a Python application. When it comes to coroutines, however, this does not work well as each coroutine yield from or await(new syntax) becomes a function exit. This means that Yappi was thinking a function has just exited whenever it sees an await which is problematic since it loses the wall-time info and increments the callcount(which it should not). This issue is described in detail with examples here: https://github.com/sumerc/yappi/blob/coroutine-profiling/doc/coroutine-profiling.md I really suggest you read this. So, in summary for asyncio, the problem was wall-time. CPU-time is OK since we exit the function everytime and we don't add the CPU time to the coroutine when the coroutine is not executing.

Now, coming to gevent. It is exactly the opposite. Now in gevent, when a greenlet leaves execution, it does not exit the function, instead the Hub gives the control to another greenlet, right? Now, this means that now Wall-time is OK, but CPU-time is not. In fact, this is exactly what greenlet_profiler is trying to solve years ago. It solved the problem by detecting a greenlet switch and subtracting the elapsed CPU time from the callstacks of all other threads. It utilizes the API set_ctx_id_callback. However, now with implementing the same in asyncio, I know for sure that set_ctx_id_cbk API is not the correct solution since multiple threads can run multiple greenlets at the same time which causes problems. I think this is why @ajdavis said he was getting incorrect results at the time. We need another layer on top of threads to detect a greenlet switch occurred by getting a callback on greenlet_id. I am still thinking about how can we implement this new API and/or incorporate with the existing functionality.

I know above is a lot to grasp but it is not that hard. In summary, I assume we have CPU-time profiling issues with gevent. Please do some tests on your side to verify this: wall-time profiling is fine and CPU is not. In the meantime, we will need lots of tests to verify our profiling results make sense and we also need to support multithreaded gevent, meaning lots of threads running lots of greenlets concurrently and that should not affect our results. You can help me especially on that: there is a test_asyncio file in tests folder and we need exactly the equivalent of that file for greenlet, too. It contains both Multithread and SingleThread tests and I assume it is concise on what it is testing for. We should be writing same kind of tests for gevent.

And once, I find a solution to this new API required, I think you can also help me on C side, too if you like.

Thaks, pls feel free to ask any unclear points.

sumerc commented 4 years ago

When I find time, I will be trying to provide some small examples to clarify more.

mguijarr commented 4 years ago

Well, I think I have already things to do - I will try to adapt the test you have for asyncio to gevent when I get some time. Thanks for all this, already !

sumerc commented 3 years ago

Closing the issue as gevent support is added with https://github.com/sumerc/yappi/pull/53

sumerc / yappi