Suhail-MOHD commented 4 years ago

Proposal to make yappi grok greenlets

This PR is a proposal to make yappi understand greenlets. The change allows yappi to map each greenlet to a unique execution context. Further, it also adjusts CPU stats measured to account for interleaving greenlet execution on one thread.

Changes required

The following changes are required by the fix:

ensure _profile_thread is called only from the target thread to be profiled.

_profile_thread invokes the _current_context_id operation which retrieves the execution context of a thread.

Using _current_context_id to retrieve the context information from another thread is not guaranteed to work and is hard to get right when yappi is configured with set_context_id_callback because the callback is not passed the information about what thread it has to fetch information for.

It is also to hard to achieve the above when using greenlets because there is no obvious way to retrieve the greenlet currently executing on another thread. i.e there is no simple way to retrieve the greenlet currently executing on threadA from threadB.

To get around this, this change guarantees that _profile_thread and _ensure_thread_profiled is only called from the target thread by removing their invocations from the 'start' sequence when 'multithreading' is enabled and delaying it. This is done by attaching a separate profiler func callback (profile_event and profile_event_c) to the threads. These functions first run _ensure_thread_profiled in the target thread context and change the profiler func callback to _yapp_callback. This way the work done by the start sequence is reduced and _ensure_thread_profiled is delayed to the first time a callback is invoked on the target thread.
add set_ctx_backend and allow specifying threading backend as greenlets as suggested by @sumerc
modify _current_context_id to identify the greenlet currently being executed when backend is greenlets. This is done the same way as it is for threads, except here we use the dictionary of the greenlet rather than that of the thread.
account for interleaving greenlet executions on the same native thread

greenlet is a co-operative multitasking framework that introduces user controlled threads called greenlets. Each native python thread can run a number of greenlets. Greenlets on the same thread co-operatively switch between one another using greenlet.switch. it is important to note that a greenlet can only execute on the thread it was spawned on. As a result, greenlets spawned on one thread can only switch between each other.

Greenlets do not play well with yappi. Yappi uses thread local stats to track the CPU usage of each frame. It tracks the usage at the entry of each function and at the exit of each function and subtracts the values to get the total clock cycles spent in that frame. This does not work well with greenlets. Let's take an example of two greenlets A and B and view the following sequence of events.
- greenletA is currently running
- greenletA enters function foo
- greenletA co-operatively switches to greenletB
- greenletB consumes 1 second of CPU
- greenletB switches over back to greenletA
- greenletA exits function foo
The CPU usage of function foo will be 1 second more than it should be because greenletB's execution is interleaved.

To account for this we have to track the context switches and adjust stats. The solution implemented in this change is a derivative of https://github.com/ajdavis/GreenletProfiler with some modifications to account for multiple native threads

To track context switches:
- declare a new thread local variable tl_prev_ctx. This needs to be thread local since multiple threads can exist. Greenlet context switches should be tracked at a thread level rather than at a global level.
- tl_prev_ctx is initialised in _ensure_thread_profiled. It is initialised to the greenlet currently running on that thread
- use tl_prev_ctx in _yapp_callback to determine if the context changed since the last time it was set.
To adjust stats, if a switch is noticed from prev->current, then:
- pause prev. Store the time it was paused under the paused_at field of the context associated with prev
- resume the current greenlet. use the paused_at field and the current value of tickcount to determine how long the greenlet was paused for. Since a greenlet can only execute on one thread, subtracting these values is an accurate measure.

NOTE: Another way to track context switches is by using greenlet.settrace (see https://greenlet.readthedocs.io/en/latest/). However installing this trace function for each native thread and uninstalling it did not seem easy to implement.

What is remaining?

This is not a complete fix, here's what is remaining:

make this work for wall clock time or document limitations of wall clock time measurement
make declaration of tl_prev_ctx platform independent. __thread is not portable to windows AFAIK
fix build system to install greenlet before compilation of yappi C extension. greenlet.h is required by yappi.
enumerate and add more test cases
there are a few TODOs added within the code. They are to be complete before closing. Some of them require discussion with @sumerc for a better understanding.

Tests performed

The tests provided under https://github.com/sumerc/yappi/pull/53 have been modified to test correct CPU usage measurement
- The script provided by @sumerc under https://github.com/sumerc/yappi/pull/53 was run. Here are the results:
```
import gevent
import yappi
```

def burn_cpu(sec): t0 = yappi.get_clocktime() elapsed = 0 while (elapsed < sec): for in range(1000): pass elapsed = yappi.get_clock_time() - t0

def foo(): burn_cpu(0.1) gevent.sleep(1.0)

def bar(): burn_cpu(0.1) gevent.sleep(1.0)

yappi.set_clock_type("wall")

yappi.set_ctx_backend("greenlet") yappi.start() g1 = gevent.spawn(foo) g2 = gevent.spawn(bar) gevent.wait([g1, g2])

yappi.stop() yappi.get_func_stats( filter_callback=lambda x: yappi.func_matches(x, [foo, bar]) ).print_all()

Clock type: CPU Ordered by: totaltime, desc

name ncall tsub ttot tavg rand_test.py:19 bar 1 0.000032 0.100202 0.100202 rand_test.py:14 foo 1 0.000040 0.100171 0.100171


- automated tests were executed with both python2.7 and python3.8

Suhail-MOHD commented 4 years ago

@sumerc I have written down the proposal along with a change that reflects it. I am sending it along at this stage to get early feedback. There are still a few details like build issues that I am yet to fix, I have listed them down at the end of the last message. I hope what is done is enough to convey the idea and get feedback.

sumerc commented 4 years ago

Thanks for the awesome work!

I will hopefully look at this today.

sumerc commented 4 years ago

First: Thanks for your work on this!

I have inspected the code a bit. There are still some unclear points to me but overall design seems fine. I would like to ask couple of questions, though:

1/ I have seen a Greenlet dependency has been added. Is it possible to remove it? Previously I was thinking like this: instead of getting the current greenlet id from C extension, I was thinking to call Python side to get the id and use it. In Python side, we can do it by late imports. This is for: for most of the users of Yappi, greenlet support will not be used and adding a hard dep. might not be a good idea. If greenlet is installed by the user however, Yappi will just work fine. Of course: we can show some warning if the version is not supported by us? WDYT?

2/ This one is important for me: why we don't utilize get_tag_cbk? Is there a reason we only use set_ctx_id_cbk for detecting greenlet switches? What we can do instead is we set tag callback to greenlet.current_id() and detect the context switch at that level. This also eliminates the need for thread local data as _ctx structure will already be thread local. I have implemented the same approach for asyncio. Each couroutine executing will set a unique tag and you can dump stats per-tag at the end.

3/ I also could not quite understand why we need _bootstrap_profile function. But my initial feeling tells me that it might be related with 2. That is why I postpone looking it in detail until I get an answer for 2.

I have also answered questions in the code and write few comments.

Suhail-MOHD commented 4 years ago

Hi @sumerc, thanks for the quick review!

1/ I have seen a Greenlet dependency has been added. Is it possible to remove it? Previously I was thinking like this: instead of getting the current greenlet id from C extension, I was thinking to call Python side to get the id and use it. In Python side, we can do it by late imports. This is for: for most of the users of Yappi, greenlet support will not be used and adding a hard dep. might not be a good idea. If greenlet is installed by the user however, Yappi will just work fine. Of course: we can show some warning if the version is not supported by us? WDYT?

Yes, I agree with you that a hard greenlet dependency is not a good idea. Users that don't profile greenlets must not be forced to install it. When using the delayed import model, could you clarify how it would work when a user who doesn't have greenlet installed specifies backend as greenlet. Will it work like the following example?

import yappi

yappi.set_ctx_backend("greenlet") --> this will raise YappiGreenletsNotInstalled exception (or a better name :)

If this is how you envision it to work, then I believe same can be achieved while using the C api. I will try to demonstrate this in my next comment.

Also, we cannot use id(greenlet_object) because ids in python can be reused once that object is garbage collected. I did use this initially but hit errors where a greenlet object was garbage collected and another greenlet on a different native thread reused the same id again. This resulted in _log_err(15). I then moved it to the C extension and reused the same mechanism you used for threads (_yappi_tid and ycurthreadindex).

2/ This one is important for me: why we don't utilize get_tag_cbk? Is there a reason we only use set_ctx_id_cbk for detecting greenlet switches? What we can do instead is we set tag callback to greenlet.current_id() and detect the context switch at that level. This also eliminates the need for thread local data as _ctx structure will already be thread local. I have implemented the same approach for asyncio. Each couroutine executing will set a unique tag and you can dump stats per-tag at the end.

Using the tag concept to achieve this didn't cross my mind. To me, the context concept mapped naturally to greenlets because each greenlet has it's own separate stack much like native threads do. I will spend some more time understanding the tag concept and how you have used it to solve for coroutines before answering your question.

3/ I also could not quite understand why we need _bootstrap_profile function. But my initial feeling tells me that it might be related with 2. That is why I postpone looking it in detail until I get an answer for 2.

Yes this is related to 2. It was done to remove the invocation of _profile_thread from _start and delay it until we are in the the target thread. This way _current_context_id is only called from the thread whose context id we want.

Suhail-MOHD commented 4 years ago

Update regarding the greenlet dependency. I am trying to use a python function instead of the C api to retrieve the context id as suggested by you, @sumerc . It is easier that way and removes the hard dependency on greenlets. I could not find way to achieve this with the C API.

Since we can't use id(greenlet) as mentioned in an earlier comment, I thought we could use something like this:

import itertools
from greenlet import getcurrent

counter = itertools.count().next

def get_greenlet_id():
    curr_greenlet = getcurrent()
    try:
        return curr_greenlet._yappi_tid
    except AttributeError:
        id = counter()
        curr_greenlet._yappi_tid = id
        return id

sumerc commented 4 years ago

Since we can't use id(greenlet) as mentioned in an earlier comment, I thought we could use something like this:

Yes. Exactly. I am not sure if this function needs to be thread-safe but yes. The idea is correct.

Please do not forget to use _call_funcobjargs when calling from C to Python, otherwise you might end up with hard to catch errors since a context switch can happen on Python side and update some Yappi globals and return back.

sumerc commented 4 years ago

I was thinking about using tag_cbk instead of ctx_cbk and found 2 more advantages supporting this design of choice (if possible of course):

the get_thread_stats() API retrieve per-thread information. if we mess up with ctx_id retrieval, this API will return per-greenlet information which might be confusing for users. What I suggest is to add a new API: get_greenlet_stats by using tag ids if it makes any sense to look at per-greenlet statistics. Not sure if you have a use case for that?
start(multithread) flag again is called multithread but if we use ctx_id it should be called multigreenlet. If there is a use case for profiling only a single greenlet (which I suspect) we can add a new parameter for that as well instead of this param.

The core of the problem is: yappi uses some convention in its APIs related to threading for years and I think they should be doing what they are already doing. Now adding a new functionality, IMHO we should add new params/APIs instead of changing the meaning of existing ones. WDYT?

I am still investigating any potential issues in parallel, too.

sumerc commented 4 years ago

Let me give some more information about the internals of Yappi:

Every function profiled has an entry in pits table and that is a list of _pit * entries.

_pit will be generated for every PyCodeObject or PyCFunctionObject. _pit will also be generated for every new context(thread). With latest asyncio support every pit also has one unique field, too: tags. So, together they form the uniqueness of a pit:

`PyCodeObject or PyCFunctionObject pointer + thread(context) + tag`

So, if we set a new tag_id for every greenlet, that means we can generate a new pit per thread/per greenlet/per function. One important issue with this uniqueness is recursion. For total time calculation: every recursive function should accumulate the timing info whenever a function exits the last one in the stack. This means we need to hold rec_level per function somehow. Now, in Yappi, rec_level variable is held in _ctx structure per pit. This means for a specific thread, every pit has a single recursion level variable. Now if we make sure that: every pit is unique per-threadand per-greenlet by setting a new tag id per-greenlet, I think rec_level variable will be fine. As a new pit will be generated per-greenlet, the rec_level variable will be unique to that pit and this means Yappi will identify the recursion level without problem.

So, I think using tag_cbk should work with greenlets in theory. If any unclear points with above, please inform.

sumerc commented 4 years ago

One last note about above:

https://github.com/sumerc/yappi/pull/57#issuecomment-671295029 Even though it seems using tag_cbk might have some advantages, it is still perfectly fine to use ctx_id callback if it is working just fine for all the edge cases (we could disable get_threads API is backend is gevent). So, I am counting on you to make this decision @Suhail-MOHD, I just wanted to point out there is another way as well, too...

Suhail-MOHD commented 4 years ago

@sumerc I understand what you are suggesting but I am not sure tag_cbk is the right way to go after my investigation. There are some major differences between asyncio tasks and greenlets which make using tag_cbk not so straight forward with greenlets. Please see my explanation below. It explains why there is no easy way to fit greenlet into tag_cbk the way you have done for coroutines.

The difference between greenlets and asyncio is that each greenlet has its own independent python callstack, whereas all tasks in a thread in asyncio work on the same callstack. The next sections elaborates further on this.

In asyncio, multiple tasks running on one thread context switch between each other but share the same python callstack - PyThreadState->frame. When one task yields the execution context, the PyFrameObjects in the stack pertaining to that task get popped off and the frames corresponding to the coroutines of the next task are pushed onto the stack again.

For asyncio, it makes sense to have all coroutines under one context because they all work on the same python callstack and in yappi, one context has one callstack. callstack tracing is not impacted by coroutines. There are other changes you made though for other reasons:

to differentiate between a yielding coroutine vs an exiting function/coroutine and hold coroutine state (timing information) to be used once the coroutine is resumed. I believe you have solved for this by leveraging the fact that 'PyFrameObject->f_stacktop' is non NULL to detect that a coroutine frame is yielding.
allow differentiating between different asyncio tasks on the same thread. I believe you are trying to solve this using contextvars and tag_cbk based on the implemention of this test - tests/test_asyncio_context_vars.py. This will allow for segregating _pits under the same thread by tag

Greenlets are quite different from python coroutines in that each greenlet has it's own separate python stack. All greenlets running on one thread use the same PyThreadState object but the stack held under PyThreadState->frame is different for each. When the greenlet library switches from greenlet g1 to g2, it stores the current 'PyThreadState->frame' object under g1 first. It then retrieves the stack stored under g2 (stored when g2 was last paused) and sets PyThreadState->frame to this stack. Here is the source for this - greenlet switch. This essentially tricks the python interpreter into executing multiple callstacks on the same thread. There are other details to be taken care of like switching the native stack and other stack related parameters, but that is not relevant to this discussion.

Yappi today associates one callstack with one context. Each context tracks the callstack under _ctx->cs and pushes and pops frames onto this stack when _yapp_callback is invoked in response to function entry or exit in the target program being profiled. If we associate multiple greenlets with the same callstack (i.e one context), then _push_frame and _pop_frame will misbehave and the callstack would be useless. When greenletA exits a frame, yappi could end up popping an entirely different frame from the stack which may have been pushed when running under some other greenlet called greenletB. I hope this aspect makes sense, please ask me to elaborate if I haven't explained it well enough.

To ensure that we trace callstacks properly, it is important that one greenlet maps back to one callstack in yappi. Since one callstack maps to one context today, I thought it would be easier to to map a greenlet to one context. So I propose that we go forwards with ctx_id and fix the API problems that you mentioned. One direction to fixing the API problems is to raise errors when people use thread related APIs when backend is greenlet. We can introduce new APIs for greenlets while retaining internal structure of storing the data. If that sounds alright to you I can work on the finer details to this.

If using ctx_id is still not acceptable and we have to use tag_cbk, the other option is to allow yappi to track one callstack for each unique context, tag pair but I believe that pollutes the definition of context and tag. Based on your current implementation, a context seems to be anything that has a unique callstack and the callstack tracing is performed per context. tags are a way of further segregating _pits under one callstack to account for different actors working on the same callstack. Mapping multiple greenlets to one context id would break this definition because one context has to handle multiple callstacks.

Please let me know how to proceed.

sumerc commented 4 years ago

So I propose that we go forwards with ctx_id and fix the API problems that you mentioned. One direction to fixing the API problems is to raise errors when people use thread related APIs when backend is greenlet. We can introduce new APIs for greenlets while retaining internal structure of storing the data. If that sounds alright to you I can work on the finer details to this.

Ok. Fair enough.

Thanks for your investigation on this. I just wanted to see the potential for tag_cbk since: it maps better with the current API interface. But: if using ctx_cbk makes implementation conceptually simpler, I would definitely prefer that.

The next job is to take a quick peek on the API and see if there are anything that does not make sense with greenlets. (e.g: start(multithreaded) and get_thread_stats()). Like you indicated, we should define some warnings or exceptions on where these do not work. I would leave the decision of having a separate API something like: get_greenlet_stats() to you.

And do extensive testing as much as possible, of course :)

Suhail-MOHD commented 4 years ago

Thanks @sumerc, I will resume development for this and work on the following now:

addressing review comments
using python callback function to retrieve greenlet id
fixing the API (thread vs greenlet)
testing

sumerc commented 4 years ago

I have re-viewed your code with more attention and written some more comments as we made sure using ctx_id_cbk is the way forward.

Suhail-MOHD commented 4 years ago

HI @sumerc , sorry for the long silence, I was slowed down last week by some other tasks. I have uploaded what I have completed so far.

Here's what is remaining:

introduce new APIs for greenlets
specify test dependency on gevents so that it is available while running tests. Is there any particular way you would like me to do this?
add more tests. I have added some now, will spend some more time thinking about it before closing on this. I would appreciate any suggestions you have as well!

Suhail-MOHD commented 4 years ago

I have another question, is invoking yappi.get_func_stats during the execution of the profiler supported? I see that we do pause and resume the profiler here, but I'm wondering whether the callstacks stored under '_ctx' will be re-usable once we resume since we have missed certain events while paused.

Suhail-MOHD commented 4 years ago

I have changed the API as well. please take a look whenever you have time, no hurry.

Also, I am not very experienced with github PRs. If I am not following proper etiquette in uploading commits or if you would like me to upload in another manner to facilitate reviewing the large change, please do let me know.

sumerc commented 4 years ago

I have another question, is invoking yappi.get_func_stats during the execution of the profiler supported? I see that we do pause and resume the profiler here, but I'm wondering whether the callstacks stored under '_ctx' will be re-usable once we resume since we have missed certain events while paused.

Sorry, I only could find some time to look into this.

Yes it is supported.

_pause and _resume are only internal functions and is only ran when we go into C side of Yappi. It is currently mandatory as while retrieving the stats, the stats hashtable needs not to be changed (while iterating over it). In an ideal world the only events you will miss will be call events of internal profiler functions but as we are calling Python from C, there might be more and there is also a risk of a context switch happening once you go into Python side, but probability is low.

So, the answer to your question is: Yes. there is a possibility we might lose some data there. I am thinking of a refactor to remove these _pause/_resume functions and instead of enumerating stats C->Python (_pitenumstat), just add them to a list and return it. This, I assume will eliminate the need of _pause/_resume. But have not tested it.

sumerc commented 4 years ago

specify test dependency on gevents so that it is available while running tests. Is there any particular way you would like me to do this?

I think for CI, we should definitely have a hard dependency on gevent. The code is shared and I don't want to break gevent side of things when I am working on asyncio or something else. So, open to suggestions on how we add a hard dependency only for CI/CD. Use tox, use simple requirements_dev.txt. I am open to everything :)

UPDATE: I saw you used extras_require for this and it is perfectly fine!

add more tests. I have added some now, will spend some more time thinking about it before closing on this. I would appreciate any suggestions you have as well!

Adding more/more tests is always welcome! I cannot count how many times I found a bug while playing with a test case. I will also be thinking about it, too.

sumerc commented 4 years ago

I only have few comments with the API changes.

Now I will be thinking more about to add few more tests maybe like you do and I think we are close to finish :)

In overall: it is great work! Thanks for all your efforts on this :+1:

Suhail-MOHD commented 4 years ago

Thanks @sumerc! I am working on adding the tests right now, will update soon.

diogobaeder commented 4 years ago

Hi folks,

Just wanted to leave here some words of encouragement: thanks a lot for the great work! I'm really excited to see that you're getting close to finishing giving yappi proper gevent/greenlet support - I was already trying to make yappi work for a flask+gevent based application that I maintain, with this work it will become piece of cake!

Awesome job!

Cheers, Diogo

leyuwei commented 4 years ago

Hi, I am pretty excited by what is going on in this PR, because recently one of my studies (which uses the Gevent to simulate behaviors of multiple network nodes) needs to profile the memory and CPU usage. However, after I tried merging the latest yappi with the current commit in this PR, get_thread_stats() does not seem to work properly, whereas the get_func_stats() works fine. Since I am new to yappi and not quite familiar with the progress, I am wondering whether get_thread_stats() could work together with greenlets, and how get_mem_usage() could be used to profile the changing of memory usage over time.

A simplified version of my codes is as follows.

def calc(a, b):
    gevent.sleep(2)
    return a+b

g = gevent.spawn(calc, 1, 2)
g.join()
yappi.stop()
yappi.get_thread_stats().print_all()
yappi.get_func_stats().print_all()
yappi.get_mem_usage()

and an Exception raised at the get_thread_stats():

  File "C:\Program Files\Python37\lib\site-packages\yappi-1.2.4-py3.7-win-amd64.egg\yappi.py", line 1233, in get_thread_stats
    return YThreadStats().get()
  File "C:\Program Files\Python37\lib\site-packages\yappi-1.2.4-py3.7-win-amd64.egg\yappi.py", line 1080, in get
    (self._BACKEND.lower(), backend.lower())
yappi.YappiError: Cannot retrieve stats for 'native_thread' when backend is set as 'greenlet'

Suhail-MOHD commented 4 years ago

@diogobaeder Thanks for the words of encouragement :) !

@leyuwei thanks for trying out the patch. A new API get_greenlet_stats was introduced in this change. It is to be used in place of get_thread_stats when profiling greenlets, please try it out and let me know!

Also, in case you haven't already done so, before you start yappi, you must invoke yappi.set_context_backend("greenlet") to instruct yappi to profile greenlets. Please use the code snippet at the end of this comment as a reference - https://github.com/sumerc/yappi/pull/57#issue-462946105 .

leyuwei commented 4 years ago

@Suhail-MOHD Thanks for the reply and useful recommendations therein! I have already set context backend to greenlet and tested the code in your provided link, and they all work well. After making some modifications to my project, I can now get the full greenlet_stats and func_stats. Since the last question remains unsolved, I still want to ask whether yappi had the ability to record the memory usage of each greenlet over time? Thank you. ^v^

sumerc commented 4 years ago

Since the last question remains unsolved, I still want to ask whether yappi had the ability to record the memory usage of each greenlet over time?

No. Unfortunately, Yappi do not measure memory as a metric for now.

sumerc commented 4 years ago

Hi, @Suhail-MOHD how do you feel about merging this to master? Are there any more tests coming. TBH, the current tests seem OK to me but I might be lacking info. about edge cases for gevent.

Suhail-MOHD commented 4 years ago

@sumerc Sorry for the delay, I didn't get a continuous block of time to work on this in the past week. I wanted to add a few test-cases for gevents in monkey-patch mode and also a few test-cases to see if greenlet stats (context-stats) are tracked correctly. I will try to get it complete by the end of this Friday, 11th September. Does that date sound alright to you?

sumerc commented 4 years ago

@Suhail-MOHD , No prob. at all. Please take as much time as possible until you feel confident! I just wanted to make sure you are not waiting something from me :) That is all.

Suhail-MOHD commented 4 years ago

I caught a bug in greenlet stats. The following script demonstrates this. It spawns three greenlets, each taking (1, 2, 3) seconds of CPU respectively.

test-script:

import gevent
import yappi

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0

def test():

    gs = []

    # Spawn three greenlets
    # first greenlet takes 1 second of CPU
    # second greenlet takes 2 seconds of CPU
    # third greenlet takes 3 seconds of CPU
    for i in (1, 2, 3):
        g = gevent.Greenlet(burn_cpu, i)
        g.name = "burn_cpu-%d" % (i)
        g.start()
        gs.append(g)

    for g in gs:
        g.get()

def name_cbk():
    try:
        return gevent.getcurrent().name
    except AttributeError:
        return "AnonymousGreenlet"

yappi.set_clock_type("cpu")
yappi.set_context_backend("greenlet")
yappi.set_context_name_callback(name_cbk)

yappi.start()

test()

yappi.stop()

yappi.get_greenlet_stats().print_all()

Output:

(3) suhail@C02WJ1J2HV2R ~/C/yappi (gevent-profiling)> python3 tests/test_greenlet_stats.py

name           id     ttot      scnt
burn_cpu-1     3      6.000385  1
burn_cpu-2     4      5.000312  1
burn_cpu-3     5      3.000251  1
..ousGreenlet  1      0.002979  2

Expected Output:

(3) suhail@C02WJ1J2HV2R ~/C/yappi (gevent-profiling)> python3 tests/test_greenlet_stats.py

name           id     ttot      scnt
burn_cpu-1     3      1.000000  1
burn_cpu-2     4      2.000000  1
burn_cpu-3     5      3.000000  1
..ousGreenlet  1      0.002979  2

This may be a problem with - https://github.com/sumerc/yappi/blob/master/yappi/_yappi.c#L1418 . We use the tickcount of the thread that called get_greenlet_stats. We should be using the last time the context was seen instead, i.e tickcount of the last event (_call_enter or _call_leave) seen on that particular context.

I believe this may be a problem with native thread profiling as well, writing a test to confirm.

Suhail-MOHD commented 4 years ago

Native thread version of above test:

import threading
import yappi

class TestThread(threading.Thread):
    def __init__(self, name, *args, **kwargs):
        super(TestThread, self).__init__(*args, **kwargs)
        self.my_name = name

    @classmethod
    def local_thread_name(cls):
        try:
            return threading.current_thread().my_name
        except AttributeError:
            return "Anonymous"

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0

def test():

    ts = []
    for i in (1, 2, 3):
        name = "burn_cpu-%d" % (i)
        t = TestThread(name, target=burn_cpu, args=(i, ))
        t.start()
        ts.append(t)
    for t in ts:
        t.join()

yappi.set_clock_type("cpu")
yappi.set_context_name_callback(TestThread.local_thread_name)

yappi.start()

test()

yappi.stop()

yappi.get_thread_stats().print_all()

Output:

(3) suhail@C02WJ1J2HV2R ~/C/yappi (gevent-profiling)> python3 tests/test_native_threads.py

name           id     tid              ttot      scnt
burn_cpu-3     3      123145417895936  0.573743  332
burn_cpu-2     2      123145412640768  0.573718  338
burn_cpu-1     1      123145407385600  0.573575  168
Anonymous      0      4321620864       0.001654  7

Expected Output:

(3) suhail@C02WJ1J2HV2R ~/C/yappi (gevent-profiling)> python3 tests/test_native_threads.py

name           id     tid              ttot      scnt
burn_cpu-3     3      123145417895936  3.000000  332
burn_cpu-2     2      123145412640768  2.000000  338
burn_cpu-1     1      123145407385600  1.000000  168
Anonymous      0      4321620864       0.001654  7

Suhail-MOHD commented 4 years ago

I have fixed the last bug mentioned. Please ignore the force-push, silly mistake on my part.

Suhail-MOHD commented 4 years ago

Yappi in greenlets mode cannot catch thread-pool workers spawned by the gevent native threadpool API

Test:

from gevent import monkey
monkey.patch_all()

import gevent
import yappi

def burn_cpu(sec):
    t0 = yappi.get_clock_time()
    elapsed = 0
    while (elapsed < sec):
        for _ in range(1000):
            pass
        elapsed = yappi.get_clock_time() - t0

def b():
    burn_cpu(0.1)

def driver():
    g1 = gevent.get_hub().threadpool.apply_async(b)
    g2 = gevent.get_hub().threadpool.apply_async(b)
    g1.get()
    g2.get()

yappi.set_clock_type("cpu")
yappi.start(builtins=True)
driver()
yappi.stop()
stats = yappi.get_func_stats()
stats.print_all()

The function b does not show up in the stats at all. The problem is that gevent uses the lower-level thread API - start_new_thread which does not respect the profile function set by yappi here. As a result, the profiler is never attached to those threads. Note, this is only the case for threads spawned after yappi has been started. Threads spawned earlier will be profiled properly.

Suhail-MOHD commented 3 years ago

@sumerc, I could not find a way to hook into gevent's native threadpool to attach the profiler. I don't think there is a way to solve this within yappi.

One way I was exploring was to configure a separate threadpool class via - threadpool class configuration. But I don't think there is a way to subclass the original threadpool class and override methods so that the profiler is attached before the thread runs. At least, there is no way to do it without overriding non-public methods.

So,

Should we continue and allow this change in with a known limitation and doc it? I can look into raising a request with the gevent team to get input on how to solve this after this change is in. OR
Get input from the gevent team and fix before closing this issue

I would go for option 1 considering that profiling greenlets in the native threadpool may not be a very common use-case.

Apart from that, I am done with testing and everything else seems alright. Let me know if you need me to update anything else. Should I update the usage guide for this feature as well?

sumerc commented 3 years ago

Hi @Suhail-MOHD , sorry for late response. I could only find some time. You seem to have fixed the issue you mentioned. That is great! :)

I completely agree going with Option1 like you suggested. I assume gevent native threads does not conform to threading.setprofile() API. That is why we cannot capture those.

Anyway, I do not see this issue a blocker at all. It would be nice if you could update the usage guide including the limitation you mentioned above. Please do not spend too much time on doc. update, just roughly write down notes, I will go over them in detail. You already have worked all the way for this:)

My plan:

merge all these changes into master.
update the missing parts of documentation on master.
release a new version.

If this sounds ok, I am merging this?

diogobaeder commented 3 years ago

Hi guys,

So this means that we won't be able to profile specific greenlets? Isn't there a way around this, so that yappi can be friendly on a gevent usage situation?

Thanks, Diogo

Suhail-MOHD commented 3 years ago

@sumerc Great. I'll add the limitation to the docs right now.

@diogobaeder Yes, but we will only miss greenlets spawned on the native threadpool - http://www.gevent.org/api/gevent.threadpool.html . This threadpool is used to run functions that don't context switch at all and hence could end up blocking all other greenlets. Example: if you have a C extension that performs some blocking system calls and gevent has no way of monkey-patching this to fit into its event loop, then you can spawn this function in the native threadpool so that it does not block other greenlets from running. Only a limited set of functions should be run on the threadpool. I am still searching for an answer about how to fix this limitation - https://github.com/gevent/gevent/issues/1670

But, this is not a very common use-case and so we don't want to block the entire support for greenlets for this use-case. The plan is to get this change in and after that work on getting a way around this limitation with the help of the gevents team.

diogobaeder commented 3 years ago

Ah, got it! Sorry for my confusion :-)

Thanks again, guys!

Suhail-MOHD commented 3 years ago

@sumerc , I have added one last and final change to retrieve context name from greenlet class by default and some tests for it. While writing the docs, I realised it made sense to do this by default instead of expecting the users to provide the context name callback.

I have updated docs as well: added a usage guide under docs/ and updated README with a snippet for greenlet profiling.

I think we can merge this now 😄 , Unless you have comments about the last few commits of course.

sumerc commented 3 years ago

@Suhail-MOHD Thanks for your great work on this! :)

I will merge this to master. Tweak docs/play a bit more and release a new version.

Suhail-MOHD commented 3 years ago

Great @sumerc! Thanks for answering all my questions during the course of this change! Please feel free to loop me in again in case there are any bugs / questions you want me to answer.

sumerc / yappi

Make yappi grok greenlets #57

Proposal to make yappi grok greenlets

Changes required

What is remaining?

Tests performed

yappi.set_clock_type("wall")