Closed mguijarr closed 3 years ago
First of all, thanks a lot for this project, Yappi is really the best Python profiler !
Thanks a lot! Motivating words :)
I saw in this discussion #21 that it was something you thought about once... But I am afraid set_ctx_backend() was finally not implemented ?
I have been thinking about this a lot. We have more than one async
framework available in the Python ecosystem and it would be great to have a support for more than one for sure. I don't have too much internal knowledge of gevent
but I think it is possible to have same async-friendly
profiling features with gevent, too (maybe with few modifications)
So, basically I am OK, but I need time to learn how gevent works under the hood to discover the necessary parts to implement set_ctx_backend
API. I will be looking into that. In parallel what you can help me with is:
greenlet_profiler
is still working? Has cheduler semantics of gevent changed (like is there an option that makes greenlet_profiler
useless?)greeenlet_profiler
) in a Python agnostic way (for 2.7 upto 3.9 or which versions gevent supports)Then we can hopefully work together to make these tests pass in Yappi itself?
How that sounds?
Someone made a greenlet profiler a few years ago on top of yappi...
That guy is @ajdavis . Awesome programmer :) Hi, @ajdavis, we were discussing something like above, I am pretty sure you don't have any time but any comment/idea on above? Thanks :)
Thanks for the quick answer! I am going to try to answer your questions asap
Hello! My interests have moved very far from this project and I can't say what would be required to upgrade greenlet_profiler to the latest Python, Yappi, and Gevent. I also recall having some concerns, personally, about greenlet_profiler's accuracy, but I don't remember the details anymore. Good luck.
I think it's a worthy goal to have a good greenlet profiler. Perhaps someone else has built a greenlet profiler that works today, or that's a better starting point than greenlet_profiler?
Thank you for your precious input!
Let's see if there is any other movement in that area, too and nice to hear from you again :)
I made a little survey there is no Python 3 greenlet profiler. Not yet :wink:
@ajdavis I also checked forks of your greenlet profiler, with no luck - it does not work (test do not pass), in fact some adaptations have been made but it misses your changes to yappi so it is not helpful.
I am going to try to answer @sumerc questions in the previous message as soon as possible.
Thanks a lot guys ! Have a nice week-end in those difficult times.
I installed Python 2.7 and Python 3.7 in two different environments, to do some tests. In both environments I installed latest gevent 1.4.
Here is a gevent adaptation of the original test suite of GreenletProfiler:
from functools import partial
import unittest
import gevent
import GreenletProfiler
def find_func(ystats, name):
items = [
yfuncstat for yfuncstat in ystats
if yfuncstat.name == name]
assert len(items) == 1
return items[0]
def assert_children(yfuncstats, names, msg):
names = set(names)
callees = set([
ychildstat.name for ychildstat in yfuncstats.children
])
final_msg = '%s: expected %s, got %s' % (
msg, ', '.join(names), ', '.join(callees))
assert names == callees, final_msg
def spin(n):
for _ in range(n * 10000):
pass
class GreenletTest(unittest.TestCase):
spin_cost = None
@classmethod
def setUpClass(cls):
# Measure the CPU cost of spin() as a baseline.
GreenletProfiler.set_clock_type('cpu')
GreenletProfiler.start()
for _ in range(10):
spin(1)
GreenletProfiler.stop()
f_stats = GreenletProfiler.get_func_stats()
spin_stat = find_func(f_stats, 'spin')
GreenletTest.spin_cost = spin_stat.ttot / 10.0
GreenletProfiler.clear_stats()
def tearDown(self):
GreenletProfiler.stop()
GreenletProfiler.clear_stats()
def assertNear(self, x, y, margin=0.3):
if abs(x - y) / float(x) > margin:
raise AssertionError(
"%s is not within %d%% of %s" % (x, margin * 100, y))
def test_three_levels(self):
def a():
gevent.sleep(0) # a bit equivalent to greenlet.switch()
b()
spin(1)
def b():
spin(5)
gevent.sleep(0)
c()
def c():
spin(7)
GreenletProfiler.set_clock_type('cpu')
GreenletProfiler.start(builtins=True)
g = gevent.spawn(a)
gevent.sleep(0) # give hand to the hub to run another greenlet
spin(2)
gevent.sleep(0)
spin(3)
gevent.sleep(0)
assert g.ready # greenlet is done
GreenletProfiler.stop()
ystats = GreenletProfiler.get_func_stats()
# Check the stats for spin().
spin_stat = find_func(ystats, 'spin')
self.assertEqual(5, spin_stat.ncall)
self.assertAlmostEqual(18 * self.spin_cost, spin_stat.ttot,
places=2, msg="spin()'s total time is wrong")
assert_children(spin_stat, ['range'], 'spin() has wrong callees')
# Check the stats for a().
a_stat = find_func(ystats, 'a')
self.assertEqual(1, a_stat.ncall, 'a() not called once')
assert_children(
a_stat,
['spin', 'b', "sleep"],
'a() has wrong callees')
self.assertAlmostEqual(13 * self.spin_cost, a_stat.ttot,
places=2, msg="a()'s total time is wrong")
self.assertAlmostEqual(13 * self.spin_cost, a_stat.tavg,
places=2, msg="a()'s average time is wrong")
self.assertAlmostEqual(a_stat.tsub, 0,
places=2, msg="a()'s subtotal is wrong")
# Check the stats for b().
b_stat = find_func(ystats, 'b')
self.assertEqual(1, b_stat.ncall, 'b() not called once')
assert_children(
b_stat,
['spin', 'c', "sleep"],
'b() has wrong callees')
self.assertAlmostEqual(12 * self.spin_cost, b_stat.ttot,
places=2, msg="b()'s total time is wrong")
self.assertAlmostEqual(12 * self.spin_cost, b_stat.tavg,
places=2, msg="b()'s average time is wrong")
self.assertAlmostEqual(b_stat.tsub, 0,
places=2, msg="b()'s subtotal is wrong")
# Check the stats for c().
c_stat = find_func(ystats, 'c')
self.assertEqual(1, c_stat.ncall, 'c() not called once')
assert_children(c_stat, ['spin'], 'c() has wrong callees')
self.assertAlmostEqual(7 * self.spin_cost, c_stat.ttot,
places=2, msg="c()'s total time is wrong")
self.assertAlmostEqual(7 * self.spin_cost, c_stat.tavg,
places=2, msg="c()'s average time is wrong")
self.assertAlmostEqual(c_stat.tsub, 0,
places=2, msg="c()'s subtotal is wrong")
def test_recursion(self):
def r(n):
spin(1)
gevent.sleep(0)
if n > 1:
r(n - 1)
gevent.sleep(0)
def s(n):
spin(1)
gevent.sleep(0)
if n > 1:
s(n - 1)
gevent.sleep(0)
GreenletProfiler.set_clock_type('cpu')
GreenletProfiler.start(builtins=True)
g0 = gevent.spawn(partial(r, 10)) # Run r 10 times.
gevent.sleep(0)
g1 = gevent.spawn(partial(s, 2)) # Run s 2 times.
gevent.sleep(0)
greenlets = [g0, g1]
# Run all greenlets to completion.
gevent.joinall(greenlets, raise_error=True)
GreenletProfiler.stop()
ystats = GreenletProfiler.get_func_stats()
# Check the stats for spin().
spin_stat = find_func(ystats, 'spin')
self.assertEqual(12, spin_stat.ncall)
# r() ran spin(1) 10 times, s() ran spin(1) 2 times.
self.assertNear(12, spin_stat.ttot / self.spin_cost)
assert_children(spin_stat, ['range'], 'spin() has wrong callees')
# Check the stats for r().
r_stat = find_func(ystats, 'r')
self.assertEqual(10, r_stat.ncall)
assert_children(
r_stat,
['spin', 'r', "sleep"],
'r() has wrong callees')
self.assertNear(10, r_stat.ttot / self.spin_cost)
self.assertNear(1, r_stat.tavg / self.spin_cost)
self.assertAlmostEqual(0, r_stat.tsub, places=3)
# Check the stats for s().
s_stat = find_func(ystats, 's')
self.assertEqual(2, s_stat.ncall)
assert_children(
s_stat,
['spin', 's', "sleep"],
's() has wrong callees')
self.assertNear(2, s_stat.ttot / self.spin_cost)
self.assertNear(1, s_stat.tavg / self.spin_cost)
self.assertAlmostEqual(0, s_stat.tsub, places=3)
if __name__ == '__main__':
unittest.main()
Basically I just replaced explicit calls to greenlet.switch()
by gevent.sleep(0)
, to have the same behaviour as the original test with greenlets.
Indeed, gevent is scheduling coroutines execution on I/O - so, when sleeping the code returns
to the gevent Hub
(the main greenlet), and switches to another one.
Tests pass:
..
----------------------------------------------------------------------
Ran 2 tests in 0.033s
OK
So, at first glance I can confirm the latest gevent works with the original GreenletProfiler.
Then I tested the same code as above with Python 3 and one of the forks of GreenletProfiler, adapted to Python 3, with latest Yappi.
Of course it does not work (this is the problem :wink):
FF
======================================================================
FAIL: test_recursion (__main__.GreenletTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_greenlet_profiler.py", line 180, in test_recursion
assert_children(spin_stat, ['range'], 'spin() has wrong callees')
File "test_greenlet_profiler.py", line 27, in assert_children
assert names == callees, final_msg
AssertionError: spin() has wrong callees: expected range, got
======================================================================
FAIL: test_three_levels (__main__.GreenletTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_greenlet_profiler.py", line 94, in test_three_levels
assert_children(spin_stat, ['range'], 'spin() has wrong callees')
File "test_greenlet_profiler.py", line 27, in assert_children
assert names == callees, final_msg
AssertionError: spin() has wrong callees: expected range, got
----------------------------------------------------------------------
Ran 2 tests in 0.019s
The patches of @ajdavis are not there...
The good news is that the test is compatible with both Python 2/Python 3 with no change.
The good thing about gevent is that it proposes to patch the Python standard library,
to have coroutines (greenlets) scheduled automatically when I/O operations happen. This is a big difference compared to explicit switches like with asyncio
.
But, I do not think this behaviour makes a big difference for the profiler ?
It is possible to only enable the monkey-patching partially. Usually people patch nothing (see 1. below), all (see 2. below) or everything but the threading module (see 3. below):
Hub
per threadgevent.sleep()
, gevent.run()
, gevent.joinall()
etc executes the loop and schedule coroutinessocket.recv()
, time.sleep()
, subprocess
pipes etc are the classic ones : no implicit scheduling while doing I/OHub
Hub
(= main greenlet) per threadI do not know if all those cases make a difference for yappi. But I thought it would be more clear to tell about that. In fact, more tests should be written for the different cases when the first code starts to work, maybe, in order to ensure it's all fine...
Hope this helps !
This is a test to run with yappi without GreenletProfiler ; I took the code from GreenletProfiler.py
and adapted it to gevent (see previous comment) and to run without the need to have GreenletProfiler
, only yappi
:
from functools import partial
from contextlib import contextmanager
import unittest
import gevent
from yappi import set_clock_type, get_func_stats, set_context_id_callback, set_context_name_callback, start, stop, clear_stats
def start_profiling(builtins, threads):
"""Starts profiling all threads and all greenlets.
This function can be called from any thread at any time.
Resumes profiling if stop() was called previously.
"""
set_context_id_callback(lambda: gevent and id(gevent.getcurrent()) or 0)
set_context_name_callback(lambda: gevent and gevent.getcurrent().__class__.__name__ or '')
start(builtins, threads)
def stop_profiling():
"""Stops the currently running yappi instance.
The same profiling session can be resumed later by calling start().
"""
stop()
set_context_id_callback(None)
@contextmanager
def profiling(builtins=False, threads=True):
set_clock_type("cpu")
start_profiling(builtins, threads)
yield
stop_profiling()
def find_func(ystats, name):
items = [
yfuncstat for yfuncstat in ystats
if yfuncstat.name == name]
assert len(items) == 1
return items[0]
def assert_children(yfuncstats, names, msg):
names = set(names)
callees = set([
ychildstat.name for ychildstat in yfuncstats.children
])
final_msg = '%s: expected %s, got %s' % (
msg, ', '.join(names), ', '.join(callees))
assert names == callees, final_msg
def spin(n):
for _ in range(n * 10000):
pass
class GreenletTest(unittest.TestCase):
spin_cost = None
@classmethod
def setUpClass(cls):
# Measure the CPU cost of spin() as a baseline.
with profiling():
for _ in range(10):
spin(1)
f_stats = get_func_stats()
spin_stat = find_func(f_stats, 'spin')
GreenletTest.spin_cost = spin_stat.ttot / 10.0
clear_stats()
def assertNear(self, x, y, margin=0.3):
if abs(x - y) / float(x) > margin:
raise AssertionError(
"%s is not within %d%% of %s" % (x, margin * 100, y))
def test_three_levels(self):
def a():
gevent.sleep(0) # a bit equivalent to greenlet.switch()
b()
spin(1)
def b():
spin(5)
gevent.sleep(0)
c()
def c():
spin(7)
with profiling(builtins=True):
g = gevent.spawn(a)
gevent.sleep(0) # give hand to the hub to run another greenlet
spin(2)
gevent.sleep(0)
spin(3)
gevent.sleep(0)
assert g.ready # greenlet is done
ystats = get_func_stats()
# Check the stats for spin().
spin_stat = find_func(ystats, 'spin')
self.assertEqual(5, spin_stat.ncall)
self.assertAlmostEqual(18 * self.spin_cost, spin_stat.ttot,
places=2, msg="spin()'s total time is wrong")
assert_children(spin_stat, ['range'], 'spin() has wrong callees')
# Check the stats for a().
a_stat = find_func(ystats, 'a')
self.assertEqual(1, a_stat.ncall, 'a() not called once')
assert_children(
a_stat,
['spin', 'b', "sleep"],
'a() has wrong callees')
self.assertAlmostEqual(13 * self.spin_cost, a_stat.ttot,
places=2, msg="a()'s total time is wrong")
self.assertAlmostEqual(13 * self.spin_cost, a_stat.tavg,
places=2, msg="a()'s average time is wrong")
self.assertAlmostEqual(a_stat.tsub, 0,
places=2, msg="a()'s subtotal is wrong")
# Check the stats for b().
b_stat = find_func(ystats, 'b')
self.assertEqual(1, b_stat.ncall, 'b() not called once')
assert_children(
b_stat,
['spin', 'c', "sleep"],
'b() has wrong callees')
self.assertAlmostEqual(12 * self.spin_cost, b_stat.ttot,
places=2, msg="b()'s total time is wrong")
self.assertAlmostEqual(12 * self.spin_cost, b_stat.tavg,
places=2, msg="b()'s average time is wrong")
self.assertAlmostEqual(b_stat.tsub, 0,
places=2, msg="b()'s subtotal is wrong")
# Check the stats for c().
c_stat = find_func(ystats, 'c')
self.assertEqual(1, c_stat.ncall, 'c() not called once')
assert_children(c_stat, ['spin'], 'c() has wrong callees')
self.assertAlmostEqual(7 * self.spin_cost, c_stat.ttot,
places=2, msg="c()'s total time is wrong")
self.assertAlmostEqual(7 * self.spin_cost, c_stat.tavg,
places=2, msg="c()'s average time is wrong")
self.assertAlmostEqual(c_stat.tsub, 0,
places=2, msg="c()'s subtotal is wrong")
def test_recursion(self):
def r(n):
spin(1)
gevent.sleep(0)
if n > 1:
r(n - 1)
gevent.sleep(0)
def s(n):
spin(1)
gevent.sleep(0)
if n > 1:
s(n - 1)
gevent.sleep(0)
with profiling():
g0 = gevent.spawn(partial(r, 10)) # Run r 10 times.
gevent.sleep(0)
g1 = gevent.spawn(partial(s, 2)) # Run s 2 times.
gevent.sleep(0)
greenlets = [g0, g1]
# Run all greenlets to completion.
gevent.joinall(greenlets, raise_error=True)
ystats = get_func_stats()
# Check the stats for spin().
spin_stat = find_func(ystats, 'spin')
self.assertEqual(12, spin_stat.ncall)
# r() ran spin(1) 10 times, s() ran spin(1) 2 times.
self.assertNear(12, spin_stat.ttot / self.spin_cost)
assert_children(spin_stat, ['range'], 'spin() has wrong callees')
# Check the stats for r().
r_stat = find_func(ystats, 'r')
self.assertEqual(10, r_stat.ncall)
assert_children(
r_stat,
['spin', 'r', "sleep"],
'r() has wrong callees')
self.assertNear(10, r_stat.ttot / self.spin_cost)
self.assertNear(1, r_stat.tavg / self.spin_cost)
self.assertAlmostEqual(0, r_stat.tsub, places=3)
# Check the stats for s().
s_stat = find_func(ystats, 's')
self.assertEqual(2, s_stat.ncall)
assert_children(
s_stat,
['spin', 's', "sleep"],
's() has wrong callees')
self.assertNear(2, s_stat.ttot / self.spin_cost)
self.assertNear(1, s_stat.tavg / self.spin_cost)
self.assertAlmostEqual(0, s_stat.tsub, places=3)
if __name__ == '__main__':
unittest.main()
Hi Matias,
Thanks for the valuable information on gevent
. I have also done some research on my side to see what needs to be done to work this out. Let me also share my findings, too.
First off, let's start from the notion of coroutine-aware
profiling as this is the key to what we would like to achieve here. Yappi is a deterministic profiler that catches function call/exit events to profile a Python application. When it comes to coroutines, however, this does not work well as each coroutine yield from
or await
(new syntax) becomes a function exit. This means that Yappi was thinking a function has just exited whenever it sees an await
which is problematic since it loses the wall-time info and increments the callcount(which it should not). This issue is described in detail with examples here: https://github.com/sumerc/yappi/blob/coroutine-profiling/doc/coroutine-profiling.md I really suggest you read this. So, in summary for asyncio
, the problem was wall-time. CPU-time is OK since we exit the function everytime and we don't add the CPU time to the coroutine when the coroutine is not executing.
Now, coming to gevent
. It is exactly the opposite. Now in gevent, when a greenlet leaves execution, it does not exit the function, instead the Hub
gives the control to another greenlet, right? Now, this means that now Wall-time is OK, but CPU-time is not. In fact, this is exactly what greenlet_profiler is trying to solve years ago. It solved the problem by detecting a greenlet switch and subtracting the elapsed CPU time from the callstacks of all other threads. It utilizes the API set_ctx_id_callback
. However, now with implementing the same in asyncio, I know for sure that set_ctx_id_cbk
API is not the correct solution since multiple threads can run multiple greenlets at the same time which causes problems. I think this is why @ajdavis said he was getting incorrect results at the time. We need another layer on top of threads to detect a greenlet switch occurred by getting a callback on greenlet_id
. I am still thinking about how can we implement this new API and/or incorporate with the existing functionality.
I know above is a lot to grasp but it is not that hard. In summary, I assume we have CPU-time profiling issues with gevent
. Please do some tests on your side to verify this: wall-time profiling is fine and CPU is not. In the meantime, we will need lots of tests to verify our profiling results make sense and we also need to support multithreaded gevent, meaning lots of threads running lots of greenlets concurrently and that should not affect our results. You can help me especially on that: there is a test_asyncio
file in tests
folder and we need exactly the equivalent of that file for greenlet, too. It contains both Multithread and SingleThread tests and I assume it is concise on what it is testing for. We should be writing same kind of tests for gevent.
And once, I find a solution to this new API required, I think you can also help me on C side, too if you like.
Thaks, pls feel free to ask any unclear points.
When I find time, I will be trying to provide some small examples to clarify more.
Well, I think I have already things to do - I will try to adapt the test you have for asyncio to gevent when I get some time. Thanks for all this, already !
Closing the issue as gevent
support is added with https://github.com/sumerc/yappi/pull/53
First of all, thanks a lot for this project, yappi is really the best Python profiler !
In my projects I use gevent extensively. I was full of hope when I found this blog article:
https://emptysqua.re/blog/greenletprofiler/
Someone made a greenlet profiler a few years ago on top of yappi...
But the project is Python 2, and it was coming with a modified, bundled version of yappi.
Now that yappi supports coroutines (with asyncio, unfortunately), could you please give me some guidance how to redo what was done with greenletprofiler ?
I would be happy to contribute to yappi with gevent support, but I need some help - my C++ skills are rusty and I don't know the code.
I saw in this discussion https://github.com/sumerc/yappi/issues/21 that it was something you thought about once... But I am afraid
set_ctx_backend()
was finally not implemented ?Thanks a lot