Thanks for the question and I'll get back to this a bit later this weekend!
First of all, you should run the benchmark script from a different machine than the server - it consumes CPU too. It is also a good idea to put the database on a separate machine for the same reason. TL;DR: Python really is that "slow", and the performance you're seeing is probably fair.
My benchmark is as follows:
Server script for the test:
```python
import sys

from sanic import Sanic
from sanic.response import json
from gino.ext.sanic import Gino
import asyncpg

DB_DSN = 'postgresql://localhost/gino'
POOL_SIZE = 10

app = Sanic(configure_logging=False)

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == 'asyncpg':
        print('using asyncpg')
        pool = None

        @app.listener('before_server_start')
        async def before_server_start(_, loop):
            global pool
            pool = await asyncpg.create_pool(DB_DSN, max_size=POOL_SIZE)

        @app.route('/')
        async def test(request):
            await pool.fetchval('SELECT now()')
            return json({'hello': 'world'})

    elif len(sys.argv) > 1 and sys.argv[1] == 'gino':
        print('using gino')
        app.config.DB_DSN = DB_DSN
        app.config.DB_POOL_MAX_SIZE = POOL_SIZE
        db = Gino(app)

        @app.route('/')
        async def test(request):
            await db.scalar('SELECT now()')
            return json({'hello': 'world'})

    else:
        print('no database query')

        @app.route('/')
        async def test(request):
            return json({'hello': 'world'})

    app.run(host='0.0.0.0', port=8000, workers=1)
```
Load test command and results:

```
wrk -t 1 -c 32 http://localhost:8000
```

- Vanilla Sanic: 10k rps
- Sanic with asyncpg: 3k rps
- Sanic with GINO: 2k rps
Doubling the `POOL_SIZE` produces slightly better results, but nothing significant. Tweaking the number of `wrk` threads and connections doesn't make a big difference either. Watching the metrics, the Sanic server sits at almost 100% CPU usage in all three cases, while the PostgreSQL processes remain quite idle. Given that, I believe the bottleneck here is CPU overhead.
I could try to find the most time-consuming operations in asyncpg and GINO and improve them, but I don't think there is a single hot spot that accounts for most of the computing cost. The CPU time per request is 0.1ms for vanilla Sanic, 0.33ms with asyncpg, and 0.5ms with GINO - that is, 0.23ms of overhead from asyncpg and a further 0.17ms from GINO, which is roughly proportional to the amount of Python code executed in each case. Given that, adding more CPUs would increase throughput while keeping the same total number of database connections.
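A back-of-the-envelope check of those numbers (assuming a single worker saturating one core, i.e. a CPU budget of 1000ms per second):

```python
# Per-request CPU time implied by the measured throughput, with one
# worker saturating one core (1000ms of CPU budget per second).
for name, rps in [('vanilla Sanic', 10_000), ('asyncpg', 3_000), ('GINO', 2_000)]:
    print(f'{name}: {1000 / rps:.2f}ms CPU per request')

# vanilla Sanic: 0.10ms, asyncpg: 0.33ms, GINO: 0.50ms
# => asyncpg adds ~0.23ms over vanilla, GINO adds ~0.17ms over asyncpg.
```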
Using only select queries also gives you a few more options to increase throughput. Regarding your specific points:
> I know that I can increase the max connections of pg.
Yes, you can do that. But please be aware that, as you increase the connection pool size, you will soon hit the CPU bottleneck. Adding more DB connections beyond that balance point won't increase throughput any further - it will only increase the total processing time (latency) of each request.
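For reference, the ceiling lives on both sides - the server's `max_connections` in postgresql.conf and the client-side pool size. A minimal sketch (the numbers are illustrative, not recommendations):

```python
import asyncpg

DB_DSN = 'postgresql://localhost/gino'  # same DSN as the benchmark script

async def make_pool():
    # Client-side ceiling; keep the sum across all workers below the
    # server-side limit (max_connections in postgresql.conf, e.g. 200).
    return await asyncpg.create_pool(DB_DSN, max_size=20)

# The equivalent knob for the GINO Sanic extension used above:
# app.config.DB_POOL_MAX_SIZE = 20
```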
> I can also disable DB_USE_CONNECTION_FOR_REQUEST

Yeah, that'll save some time, but the gain is limited, I think - the lazy acquire doesn't block even when the DB pool is exhausted.
> or release the shared connection back to the pool earlier.

Yes, this lets other coroutines use the released connection earlier. If the releasing coroutine still has a lot of I/O work to do afterwards, it can certainly improve the overall throughput.
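A minimal sketch of those two options, assuming the GINO Sanic extension from the benchmark script above (the route names are made up, and I'm assuming the extension keeps the per-request connection under `request['connection']`):

```python
# Option 1: turn off the per-request connection entirely and acquire
# one only around the query itself.
app.config.DB_USE_CONNECTION_FOR_REQUEST = False

@app.route('/short')
async def short(request):
    async with db.acquire(reuse=False) as conn:
        value = await conn.scalar('SELECT now()')
    # ... any further work here runs without holding a connection ...
    return json({'now': str(value)})

# Option 2: keep the per-request connection, but hand it back to the
# pool explicitly as soon as the last query is done.
@app.route('/early-release')
async def early_release(request):
    value = await db.scalar('SELECT now()')
    await request['connection'].release()
    # ... any further I/O happens after the connection is back in the pool ...
    return json({'now': str(value)})
```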
Thanks a lot for your answer!
Yes, I ran the benchmark script (Yandex.Tank) from another server. And I've also seen that the CPU consumption of the Python script is about 100% while Postgres is around 1%.
By the way, what machine did you run your benchmark on? A laptop, AWS, etc.?
This is a general performance question; it may or may not be related to GINO, Sanic, and/or the database.
Besides Fantix's notes on GINO itself, I'd also take a step back and ask whether the optimization is necessary at all, and how to get the best result with minimum effort. Any list I could make here would cover only a small portion of the possibilities, and each option can go a long way - it's too big a topic on its own. :)
From a technical perspective, I'd love to discuss how to improve GINO's performance; from a business perspective, without more specs, focusing on GINO itself may not give us as many benefits as other optimizations.
> what machine did you run your benchmark on? A laptop, AWS, etc.?
It was a dual-core 2.2GHz i7 on my MacBook Air (Early 2016).
Closing due to inactivity
Description
I've created a simple app with Sanic/GINO: several controllers, each doing 1-2 queries to the DB (select only).
I ran a performance test (the smallest droplet on DigitalOcean, DB_POOL_MAX_SIZE=70). Controllers with DB queries hold only about 100 rps; without them, about 1000 rps.
I know that I can increase the max connections of pg. I can also disable DB_USE_CONNECTION_FOR_REQUEST or release the shared connection back to the pool earlier.
What else could be done to increase rps? Could the fact that I only use select queries be helpful?