vmprof / vmprof-python

vmprof - a statistical program profiler
http://vmprof.com
Other
433 stars 54 forks source link

How to quickly pinpoint blocking (ie non-async) calls slowing down the event loop? #243

Open davidmcnabnz opened 2 years ago

davidmcnabnz commented 2 years ago

Hi there,

I've got a situation where my asyncio event loop slows down due to a number of blocking calls being made by tasks within that loop.

Once the task count builds (under system load), each task is polluting the event loop with a number of blocking calls, even if brief. Once load reaches a certain point, the whole thread's throughput collapses, because the thread is spending close to 100% of its time in these blocking calls.

vmprof has been helpful, especially with PyCharm's tool for visualising the call graph with time annotations. But so far, I haven't figured out how to quickly detect sync calls which block and slow down the event loop, as opposed to async calls which promptly relinquish control.

So far, the best tool I've had for detecting these blocking calls has been to intermittently insert a "probe" task which loops on:

probing = True
probePeriod = 0.01
while probing:  # some other task will set this to False to stop the probing
    tsBefore = time.time()
    tsDeadline = tsBefore + probePeriod
    await asyncio.sleep(probePeriod)
    lateness = time.time() - tsDeadline
    myMovingAverageMonitor.add(lateness)

With this, I can throw work at the event loop and see what aspects of work are causing this 'lateness' to increase. Then, binary-search through the loop's tasks, selectively stubbing out more and more areas of work, until I find the work areas which are causing the greatest obstruction to the event loop. However, this is pretty crude. I'd like to be able to do this with a lot less trial and error.

Any words of wisdom, folks?

Thanks if you can help. David