squeaky-pl / japronto

Screaming-fast Python 3.5+ HTTP toolkit integrated with pipelining HTTP server based on uvloop and picohttpparser.
MIT License
8.61k stars · 581 forks

Using await slows down requests #10

Open channelcat opened 7 years ago

channelcat commented 7 years ago
from japronto import Application

async def test():
    return

async def hello1(request):
    return request.Response(text='Hello world!')

async def hello2(request):
    await test()
    return request.Response(text='Hello world!')

async def hello3(request):
    await test()
    await test()
    return request.Response(text='Hello world!')

app = Application()
app.router.add_route('/1', hello1)
app.router.add_route('/2', hello2)
app.router.add_route('/3', hello3)
app.run()
ubuntu@ip-172-31-32-165:~$ wrk -c 100 -t 1 -d 4 http://localhost:8080/1
Running 4s test @ http://localhost:8080/1
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   430.75us  147.27us   1.59ms   58.13%
    Req/Sec   221.14k     3.71k  223.98k    97.50%
  879259 requests in 4.00s, 77.14MB read
Requests/sec: 219769.64
Transfer/sec:     19.28MB
ubuntu@ip-172-31-32-165:~$ wrk -c 100 -t 1 -d 4 http://localhost:8080/2
Running 4s test @ http://localhost:8080/2
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.93ms  185.44us   1.46ms   52.76%
    Req/Sec   106.63k   547.71   107.35k    65.00%
  424253 requests in 4.00s, 37.22MB read
Requests/sec: 106040.32
Transfer/sec:      9.30MB
ubuntu@ip-172-31-32-165:~$ wrk -c 100 -t 1 -d 4 http://localhost:8080/3
Running 4s test @ http://localhost:8080/3
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.95ms  189.56us   1.44ms   53.78%
    Req/Sec   104.28k   526.13   105.11k    52.50%
  415229 requests in 4.00s, 36.43MB read
Requests/sec: 103789.66
Transfer/sec:      9.11MB

It seems as though subsequent uses of the event loop don't take the same toll. I was unable to test the effects on pipelining because of issue #9.

squeaky-pl commented 7 years ago

This is expected. Coroutines carry a considerable performance hit because the cost of loop.create_task is very high.
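That cost is easy to reproduce outside the server. The following is my own rough micro-benchmark (not Japronto code; it needs Python 3.7+ for asyncio.run), comparing a plain call against driving the same trivial body through a task scheduled on the loop:

```python
import asyncio
import time

def plain():
    return "Hello world!"

async def coro():
    return "Hello world!"

async def bench(n=10_000):
    # Plain synchronous calls.
    t0 = time.perf_counter()
    for _ in range(n):
        plain()
    direct = time.perf_counter() - t0

    # Same body, but wrapped in a Task and scheduled on the loop each time,
    # which is roughly what a coroutine handler costs per request.
    t0 = time.perf_counter()
    for _ in range(n):
        await asyncio.ensure_future(coro())
    tasked = time.perf_counter() - t0
    return direct, tasked

direct, tasked = asyncio.run(bench())
print(f"plain call: {direct:.4f}s, via task: {tasked:.4f}s")
```

The task path loses by a wide margin, which matches the roughly 2x drop between /1 and /2 in the wrk numbers above.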

Japronto is able to rewrite simple coroutines that contain no await into plain functions, because such coroutines gain nothing from being coroutines. The bytecode rewriting logic could be expanded to cover cases like this, but it's not clear it's worth it. I think educating people not to write code like this is better.

What's happening here is that the router takes the test() coroutine, rewrites it as an ordinary function, and plugs a copy into the routing table.
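The rewrite decision can be approximated in pure Python (a hypothetical sketch; Japronto's real pass works at the C level and is more involved): every `await` expression compiles to a GET_AWAITABLE opcode, so a coroutine whose bytecode contains none of them can never suspend and can safely be driven to completion synchronously:

```python
import asyncio
import dis

async def no_io(request=None):
    # Same shape as hello1: a coroutine that never suspends.
    return "Hello world!"

async def with_io(request=None):
    await asyncio.sleep(0)  # a real suspension point
    return "slept"

def uses_await(func):
    # Every `await` expression compiles to a GET_AWAITABLE opcode,
    # so its absence means the coroutine can never suspend.
    return any(ins.opname == "GET_AWAITABLE"
               for ins in dis.get_instructions(func.__code__))

def run_sync(coro_func, *args):
    # A coroutine with no awaits completes on the first send();
    # its return value is delivered via StopIteration.
    coro = coro_func(*args)
    try:
        coro.send(None)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("coroutine suspended; it needs a real event loop")

print(uses_await(no_io), uses_await(with_io))  # False True
```

A router built on this check could call run_sync for handlers like no_io and fall back to loop.create_task only for handlers like with_io.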

channelcat commented 7 years ago

Ah, that makes sense. However, consider the following common practice:

def require_auth(func):
    async def auth_stuff(*args, **kwargs):
        # do stuff
        return await func(*args, **kwargs)
    return auth_stuff

@require_auth
async def hello1(request):
    return request.Response(json={"things": "stuff"})

At this point I'm not even doing IO or storing response data to be passed around in middleware etc., but Japronto is down to ~73k req/sec without pipelining (I cannot test async with pipelining). AFAIK, none of the other microframeworks Japronto measures itself against take this kind of performance hit.
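This decorator pattern defeats the no-await rewrite described above: even though hello1's own body never does IO, the wrapper always contains a real await, so the handler the router sees can suspend. A self-contained check (my sketch, using a bytecode scan rather than anything from Japronto's API):

```python
import dis

def require_auth(func):
    async def auth_stuff(*args, **kwargs):
        # Imagine auth checks here.
        return await func(*args, **kwargs)
    return auth_stuff

@require_auth
async def hello1(request=None):
    return {"things": "stuff"}

def has_await(func):
    # GET_AWAITABLE is emitted for every `await` expression.
    return any(ins.opname == "GET_AWAITABLE"
               for ins in dis.get_instructions(func.__code__))

print(has_await(hello1))  # → True: the wrapper always awaits
```

So any middleware-style decorator written this way forces every request down the slow task path, regardless of what the inner handler does.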

It feels like Japronto is heavily optimized to benchmark as a web server, but when you reach the inevitable point of writing some Python code around it, there's a large up-front performance hit. The project advertises itself as a web framework, but the benchmarks treat it as a server (pipelining, no async functionality).

What are your thoughts on this?

squeaky-pl commented 7 years ago

The project started as a toy load balancer/reverse proxy, scriptable for other microservices over a low-latency network, so some of the directions and decisions I took stem from that.

1) In case it's a database in the same data center, it should be written without async/await because it will be faster this way; actually, maybe a thread pool will be better here. I will introduce something working like a thread executor later, because the default one in asyncio goes through too many layers of abstraction to be viable.

Japronto will never have middlewares; it will have subviews instead, something like Pyramid's resource traversal on steroids. You would put that code in a subview and decorate it with a thread pool I am going to code in C later.

If you cannot make your DB respond fast enough, the problem lies in your DB.
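A minimal sketch of that thread-pool pattern with today's asyncio (run_in_executor; Python 3.7+). The blocking_query function and the pool size are my own placeholders, not Japronto API:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def blocking_query(user_id):
    # Stand-in for a fast blocking DB driver call (hypothetical).
    time.sleep(0.001)
    return {"id": user_id, "name": "alice"}

async def handler(user_id):
    # The event loop stays free while the blocking call runs on a worker thread.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, blocking_query, user_id)

result = asyncio.run(handler(42))
print(result)
pool.shutdown()
```

The idea being proposed here is the same shape but with the dispatch into the pool done in C, skipping asyncio's default executor layers.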

2) If you really need to use await because there is a call to an external API with considerable latency hidden somewhere, there is no way you can make this fast...

I wanna drive people away from writing async/await code because it's really a last resort for REST-style apps. Still, async/await has its uses: slow uploads, downloads, websockets, etc.

My investigations kind of confirm what Mike Bayer says about asynchronous Python: http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/ I will try to write an article later to elaborate on the issue. I am also seriously considering writing fast blocking-style database drivers in the same fashion I wrote the server.

Async/await is not about speed, it's about handling concurrency.

channelcat commented 7 years ago

At its core async/await addresses IO-bound concurrency, but I believe its integration into Python's syntax says "this is how we want to share non-blocking code in Python" rather than "this is how we optimize all IO in Python." Given that, I want to be able to use the new libraries other Python developers are writing, as long as the performance hit doesn't outweigh the time saved.

I know it can be frustrating to watch the community jump on the event loop train without understanding where it's strong or weak (anyone remember nodejs? lol), but I think their excitement comes from its simplicity. I really hope Japronto will optimize for async/await, even if it means users won't be communicating with local services as fast as possible.

Lastly, the article you linked was written before uvloop and asyncpg. What were your findings when looking into it?

ludovic-gasc commented 7 years ago

Hi everybody,

It's really cool to see that the fire to understand how things really work behind the scenes, and how to improve efficiency in Python, continues to burn in some people ;-)

About the blocking/non-blocking efficiency debate: after 3 years of production, this debate makes less and less sense to me, because the "true" answer is: "It depends".

For me, in the real world where people are more and more mobile, with more and more network issues, I see the async pattern as one of several possibilities to reduce the "cost" of open sockets.

The other very important point is to distinguish the theory/patterns from the implementations: between Mike's article and now, Yuri has released uvloop and asyncpg; Yuri, Inada, Victor, and several other people have improved the efficiency of CPython itself; @channelcat has released Sanic; you released Japronto. I'm not sure the numbers would be the same now compared to 2 years ago, but it's always the same patterns being used.

For me, a reboot of @zzzeek's benchmark with the latest CPython release and all the new knowledge about optimizations would be interesting.

And don't forget something: computers are now too complicated for anybody to understand all the interactions. To say it another way, we are all incompetent to understand everything, so be careful before blaming ;-)