psf / pyperf

Toolkit to run Python benchmarks
http://pyperf.readthedocs.io/
MIT License

timeit support for coroutines #121

Closed · Tinche closed this issue 2 years ago

Tinche commented 2 years ago

Hello!

I think pyperf is an amazing project and I use its timeit command to benchmark essentially all the libraries I work on (attrs, cattrs, incant...).

I wish I could use it to benchmark async functions though. Right now I benchmark asyncio.run(my_coro), but since asyncio.run is so costly there's a ton of noise in the signal.

I think pyperf could essentially detect that a coroutine was passed in, spawn an event loop, and just await it in a loop.
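
A minimal sketch of the idea, assuming detection via asyncio.iscoroutinefunction; the helper name is made up and this is not pyperf's API:

import asyncio

def run_benchmark_target(func, loops):
    # Hypothetical helper: if func is a coroutine function, spawn one
    # event loop and await it `loops` times; otherwise just call it.
    if asyncio.iscoroutinefunction(func):
        async def run_all():
            for _ in range(loops):
                await func()
        loop = asyncio.new_event_loop()
        try:
            loop.run_until_complete(run_all())
        finally:
            loop.close()
    else:
        for _ in range(loops):
            func()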

corona10 commented 2 years ago

@vstinner Do you have any ideas?

vstinner commented 2 years ago

I don't know how to do that.

cc @methane

methane commented 2 years ago

@Tinche Do you really mean timeit, and not bench_func()?

bench_func() receives a function, so it may be possible to detect whether the function is a coroutine or not. On the other hand, timeit receives an expression, not a function, so it is difficult to detect that a coroutine was passed in.
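
For example, the difference looks like this (just an illustration, nothing pyperf-specific):

import asyncio

async def main():
    pass

# bench_func() receives a callable object, which can be inspected:
print(asyncio.iscoroutinefunction(main))   # True

# timeit only receives source text, e.g. "run(main())"; until that text
# is executed there is nothing to inspect, so no coroutine can be detected.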

methane commented 2 years ago

Would you give us some examples?

Tinche commented 2 years ago

I use timeit all the time in the terminal and I've never used bench_func(), so probably timeit. Maybe it could be a flag or a different command?

Here's an example. I have a project, https://github.com/Tinche/incant/, that does function composition (mostly for dependency injection), and I want to measure how efficient it is. It supports functions and coroutines. Functions I can benchmark easily, coroutines I need to benchmark using asyncio.run, and that has a ton of noise since it does a lot of unrelated work.

Note that these coroutines I'm benchmarking are technically async, but they either do not await anything or they await sleep(0).

I usually prepare the function being tested in a file and then do something like:

pyperf timeit -g -s "from asyncio import run; from test import main" "run(main())"

so since I need to have a separate file anyway, bench_func() could work too. The CLI interface is sooo nice though ;)

That said, maybe there's a way to run a coroutine without involving an event loop? Just iterate over it until it's done or something like that? I'm not proficient in that part of Python.
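
For what it's worth, a coroutine that never waits on a real future can be driven by hand, roughly like this (a sketch only, not something pyperf does):

def drive(coro):
    # Run a coroutine without an event loop by stepping it manually.
    # This only works if it never suspends on a real Future, e.g. it
    # awaits nothing or only awaits asyncio.sleep(0).
    try:
        while True:
            coro.send(None)
    except StopIteration as exc:
        return exc.value

async def main():
    return 42

print(drive(main()))   # 42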

methane commented 2 years ago

Would you try this?

pyperf timeit -g -s "import asyncio; loop=asyncio.get_event_loop(); from test import main" \
  "loop.run_until_complete(main())"

or

pyperf timeit -g -s "import asyncio, test" \
  "asyncio.get_event_loop().run_until_complete(test.main())"

With this, one loop is used repeatedly instead of creating and destroying a loop for each main() execution. Does this reduce your "noise"?

Tinche commented 2 years ago

It does work and helps a little. If it's too hard to do otherwise in pyperf I will accept this as the answer ;)

methane commented 2 years ago

What "little" means? It reduce your noise only little? If so, it means this feature request will have only little benefit.

If you just meant "I don't want to write this timeit", I'm sorry, but it is very difficult. Again, timeit receives statements, not a function, so timeit cannot distinguish async code automatically.

I will consider adding bench_async_func(), or making bench_func() support async functions. And I will consider adding an --async option to timeit later.
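
For reference, one way such support could work internally (a sketch only, not pyperf's actual implementation): time the awaits inside a single run_until_complete() call, so loop setup and teardown stay outside the measured region.

import asyncio
import time

def bench_coro(coro_func, loops):
    # Illustrative helper: measure `loops` awaits of coro_func() inside
    # one event loop run; loop creation/close is not part of the timing.
    async def timed():
        t0 = time.perf_counter()
        for _ in range(loops):
            await coro_func()
        return time.perf_counter() - t0

    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(timed())
    finally:
        loop.close()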

Tinche commented 2 years ago

Well, it reduces the running time by a lot, so it reduces noise by a lot.

I have a generated coroutine that I'm benchmarking. This coroutine awaits several other coroutines inside.

asyncio.run:             Mean +- std dev: 529 us +- 42 us
loop.run_until_complete: Mean +- std dev: 185 us +- 12 us

So the difference was noise introduced by asyncio.run. Hence, a big improvement. Dunno how much more it can be improved by logic inside pyperf.

Tinche commented 2 years ago

Off-topic: heh, for comparison's sake, if I change the test so they are all ordinary functions rather than async def functions, it takes 1 microsecond. I wasn't aware asyncio/the event loop adds so much overhead.

vstinner commented 2 years ago

So the difference was noise introduced by asyncio.run

Each call to asyncio.run() creates a fresh new event loop and then closes it. Moreover, it also shuts down asynchronous generators and the default asyncio executor (thread pool).
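
Roughly, each asyncio.run() call does something like this (a simplified paraphrase of CPython's asyncio/runners.py on 3.9+, with task cancellation omitted):

import asyncio

def run_like_asyncio_run(main):
    # Create a fresh loop, run the coroutine, then shut down async
    # generators and the default executor (thread pool) and close the
    # loop. Cancelling leftover tasks is omitted from this sketch.
    loop = asyncio.new_event_loop()
    try:
        asyncio.set_event_loop(loop)
        return loop.run_until_complete(main)
    finally:
        try:
            loop.run_until_complete(loop.shutdown_asyncgens())
            loop.run_until_complete(loop.shutdown_default_executor())
        finally:
            asyncio.set_event_loop(None)
            loop.close()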

methane commented 2 years ago

I usually prepare the function being tested in a file and then do something like:

Since you already write a script for the test, I don't think the timeit command is so important for you. If #124 is merged, you can just add a few lines to your test code:

if __name__ == '__main__':
    import pyperf
    pyperf.Runner().bench_async_func('main', main)
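
The script can then be run like any other pyperf Runner script, for example (the file name here is only a placeholder):

python bench_main.py -o result.json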

Tinche commented 2 years ago

@methane Thanks a lot! Trying it from your branch, the time is now: main: Mean +- std dev: 123 us +- 7 us. Looks like we got rid of all the overhead.

vstinner commented 2 years ago

Fixed by https://github.com/psf/pyperf/pull/124 thanks to @methane.

vstinner commented 2 years ago

I closed the issue because it seems like the idea of adding an --async option to pyperf timeit was abandoned. But I'm open to this idea if someone wants to write a PR for it!

vstinner commented 2 years ago

I was curious and compared doc/examples/bench_async_func.py between Python 3.6 and 3.10, since the pyperf implementation is different (Python 3.6 doesn't have asyncio.run()):

$ python3 -m pyperf compare_to py36.json py310.json 
Mean +- std dev: [py36] 1.33 ms +- 0.02 ms -> [py310] 1.32 ms +- 0.02 ms: 1.01x faster

Using an asyncio sleep of 1 ms, there is no significant difference: for me, it confirms that the pyperf implementation is correct ;-) The accuracy is good. We don't measure the time spent to create and close the event loop.
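
For reference, the example is essentially of this shape (paraphrased here, assuming the 1 ms asyncio sleep just mentioned; the benchmark name is illustrative):

import asyncio
import pyperf

async def func():
    await asyncio.sleep(1e-3)   # 1 ms asyncio sleep

runner = pyperf.Runner()
runner.bench_async_func('async_sleep_1ms', func)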

vstinner commented 2 years ago

(A) Benchmark of asyncio.run() with bench_func(), on a coroutine func() which does nothing:

import asyncio
import pyperf

async def func():
    pass

def bench():
    asyncio.run(func())

runner = pyperf.Runner()
runner.bench_func('bench', bench)

(B) Benchmark of loop.run_until_complete() with bench_func(), on a coroutine func() which does nothing:

import asyncio
import pyperf

async def func():
    pass

def bench(loop):
    loop.run_until_complete(func())

runner = pyperf.Runner()
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
runner.bench_func('bench', bench, loop)

(C) Benchmark of the new pyperf 2.3.1 bench_async_func() method, on a coroutine func() which does nothing:

import asyncio
import pyperf

async def func():
    pass

runner = pyperf.Runner()
runner.bench_async_func('bench', func)

Results on Python 3.10:

asyncio_run_py310
=================

bench: Mean +- std dev: 139 us +- 5 us

run_until_complete-py310
========================

bench: Mean +- std dev: 16.7 us +- 0.4 us

bench_async_func-py310
======================

bench: Mean +- std dev: 128 ns +- 2 ns

+-----------+-------------------+--------------------------+-------------------------+
| Benchmark | asyncio_run_py310 | run_until_complete-py310 | bench_async_func-py310  |
+===========+===================+==========================+=========================+
| bench     | 139 us            | 16.7 us: 8.33x faster    | 128 ns: 1087.31x faster |
+-----------+-------------------+--------------------------+-------------------------+

The std dev is way better using bench_async_func()!

vstinner commented 2 years ago

I think essentially pyperf could detect a coroutine was passed in, spawn an event loop and just await it in a loop.

I don't think that detecting whether the argument looks like a coroutine is a good idea. It requires importing asyncio, which is a "heavy" module (high startup time). I strongly prefer having a separate API (method) for that.

vstinner commented 2 years ago

This function is now part of the just released pyperf 2.3.1.