sumerc / yappi

Yet Another Python Profiler, but this time multithreading, asyncio and gevent aware.
MIT License
1.45k stars 73 forks source link

Need Event loop and more thread info #37

Closed t3chboy closed 4 years ago

t3chboy commented 4 years ago

Hi, This project is really awesome and useful for people looking for thread related information.

  1. I was doing some PoC for a aiohttp project, here i found some unclear data in thread stats as follows:

  2. I have created 2 aiohttp session :

    1. First() : It makes http request with response delayed by 2sec and threadstat shows it was scheduled twice.
    2. Second() : It makes 2 http request with delayed response delayed by 2 and 4sec and threadstat shows it was also scheduled twice. Here this stats sounds fishy as why second() thread got scheduled only twice, it should be more than 2.
  3. Also from stats i m not able to get which thread is assigned to particluar function,it would be great if we can link somehow which thread belongs to which async code block, so it would help to find and improvise long running thread's code/function.

  4. Most important it would be big advantage to have event loop stats also, as it will solve many problems faced in frameworks like aiohttp, where you would definitely like to know which threads are taking long time on eventloop.

Poc code :

import aiohttp
import asyncio
import time
import yappi

async def first():
    url = "http://www.mocky.io/v2/5e11f1513100005a37593fa8?mocky-delay=2000ms"
    async with aiohttp.ClientSession().get(url) as resp:
        print(resp.status)
        print(await resp.text())

async def second():
    url = "http://www.mocky.io/v2/5e11f339310000c956593fa9?mocky-delay=4000ms"
    session = aiohttp.ClientSession()
    async with session.get(url) as resp:
        print(resp.status)
        print(await resp.text())

    url = "http://www.mocky.io/v2/5e11f1513100005a37593fa8?mocky-delay=2000ms"
    async with session.get(url) as resp:
        print(resp.status)
        print(await resp.text())

async def third():
    for a in range(1000):
        print(a)

def newf():
    print("in newf")

async def fire():
    s = time.time()
    await third()
    newf()
    await second()
    await first()
    print(time.time() - s)

yappi.start()
loop = asyncio.get_event_loop()
loop.run_until_complete(fire())
yappi.stop()
yappi.get_thread_stats().print_all()

ThreadStat :

name           id     tid              ttot      scnt        
Thread         1      139813237880576  0.186779  2         
Thread         2      139813220611840  0.186705  2         
_MainThread    0      139813331167040  0.041664  5  
sumerc commented 4 years ago

This project is really awesome and useful for people looking for thread related information.

Thanks. Nice to know :)

I have created 2 aiohttp session : First() : It makes http request with response delayed by 2sec and threadstat shows it was scheduled twice. Second() : It makes 2 http request with delayed response delayed by 2 and 4sec and threadstat shows it was also scheduled twice. Here this stats sounds fishy as why second() thread got scheduled only twice, it should be more than 2.

I am not aware of how aiohttp works under the hood, but I am assuming from the output that each ClientSession creates a new underlying Thread? When you call await in a coroutine, you will not be seeing sched_count to be incremented. That is because, an await from a single event loop is just like yield. It returns control to event loop and event loop actually decides another coroutine to run. The time that sched_count is incremented is when a profiler callback event happens and the thread id is different than the previous one. You can see the relevant code here: https://github.com/sumerc/yappi/blob/1d85fd395407ddae2552bfa8641414c86d3d2ca9/yappi/_yappi.c#L1077 So, what might be happening under the hood is that the functions first and second are simply running in their own thread(the code in with() to be precise) and own event loop, thus you only see 2 as sched_count.

Also from stats i m not able to get which thread is assigned to particluar function,it would be great if we can link somehow which thread belongs to which async code block, so it would help to find and improvise long running thread's code/function.

There is a API for this kind of thing implemented recently. Please see: set_tag_callback. I might have misunderstood the requirement here but you can find an example use in the docs. It is currently being used in some ASGI applications to link a single request running in different threads/coroutines.

See here and here for more detailed info.

And please comment if you are asking for something simpler, maybe?

Most important it would be big advantage to have event loop stats also, as it will solve many problems faced in frameworks like aiohttp, where you would definitely like to know which threads are taking long time on eventloop.

I am not an async. expert here but AFAIK: there is only one thread per eventloop, right? If so, which thread taking more time is probably irrelevant? Which coroutine is taking long time on an eventloop makes more sense, right? If that is your question, then that is JUST what we have tried solving here. And if you follow the links I have provided above especially the one in FastAPI, you can see how they have implemented an ASGI middleware to find which coroutines are taking most of the time.

Hope this helps...

sumerc commented 4 years ago

I will close this issue if no further questions?