BimaAdi closed this issue 2 months ago.
Hello! Thank you for your benchmarks. But I've found some things that look strange.
First of all, let's see my results:
insert bulk time

| run | psqlpy (s) | psycopg (s) |
| --- | --- | --- |
| 1 | 0.027 | 0.034 |
| 2 | 0.025 | 0.024 |
| 3 | 0.024 | 0.023 |
| 4 | 0.022 | 0.021 |
| 5 | 0.022 | 0.021 |
| 6 | 0.022 | 0.021 |
| 7 | 0.023 | 0.02 |
| 8 | 0.022 | 0.021 |
| 9 | 0.023 | 0.021 |
| 10 | 0.024 | 0.021 |
| avg | 0.023 | 0.023 |
get bulk time

| run | psqlpy (s) | psycopg (s) |
| --- | --- | --- |
| 1 | 0.001 | 0.001 |
| 2 | 0.001 | 0.001 |
| 3 | 0.001 | 0.001 |
| 4 | 0.001 | 0.001 |
| 5 | 0.001 | 0.001 |
| 6 | 0.001 | 0.001 |
| 7 | 0.001 | 0.001 |
| 8 | 0.001 | 0.001 |
| 9 | 0.001 | 0.001 |
| 10 | 0.001 | 0.001 |
| avg | 0.001 | 0.001 |
They are approximately the same. I'll make sure to explain why.
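For reference, the 10-runs-plus-average procedure behind these tables can be sketched generically. This is a minimal, hypothetical harness: `fake_bulk_op` is a stand-in for the repo's real `insert_bulk()`/`get_bulk()` calls, and no database driver is involved.

```python
import asyncio
from time import time


async def fake_bulk_op() -> None:
    # Stand-in for a real bulk insert/select against PostgreSQL.
    await asyncio.sleep(0.001)


async def bench_avg(runs: int = 10) -> float:
    """Time an async operation `runs` times and return the average,
    like the per-run rows and `avg` row in the tables above."""
    total = 0.0
    for _ in range(runs):
        start = time()
        await fake_bulk_op()
        total += time() - start
    return total / runs


avg = asyncio.run(bench_avg())
print(f"avg: {avg:.3f}s")
```

Swapping `fake_bulk_op` for a real driver call reproduces the measurement loop used in both benchmark scripts.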
So, what did I find strange?
1) When inserting new rows with psqlpy, you don't really use a transaction: https://github.com/BimaAdi/psqlpy-vs-psycopg/blob/main/bench_psqlpy.py#L29 (this method is asynchronous).
2) You are using an old version of PostgreSQL. As far as I can see, it's version 12, and I didn't run any tests on PostgreSQL 12. So the question is: why so old?
The correct (for your test) PSQLPy bench function is:

```python
from time import time

from psqlpy import ConnectionPoolBuilder

# POSTGRESQL_* constants, BenchTypedDict and the create_table/insert_bulk/
# get_bulk/drop_table helpers come from the benchmark repository.


async def bench() -> BenchTypedDict:
    bench_time: BenchTypedDict = {"insert_bulk_time": 0.0, "get_bulk_time": 0.0}
    try:
        db_pool = (
            ConnectionPoolBuilder()
            .user(POSTGRESQL_USER)
            .password(POSTGRESQL_PASSWORD)
            .host(POSTGRESQL_HOST)
            .port(POSTGRESQL_PORT)
            .dbname(POSTGRESQL_DATABASE)
            .build()
        )
        async with db_pool.acquire() as connection:
            async with connection.transaction() as trans:
                await create_table(trans)
                start = time()
                await insert_bulk(trans)
                bench_time["insert_bulk_time"] = time() - start
                start = time()
                await get_bulk(trans)
                bench_time["get_bulk_time"] = time() - start
                await drop_table(trans)
    finally:
        db_pool.close()
    return bench_time
```
The main explanation: PSQLPy primarily aims to give you performance in real high-load applications, because of how Rust (the tokio runtime) manages asynchronous tasks. Let's imagine what is happening in the test you made: you always have 1 connection that is perfectly connected to PostgreSQL, and all the driver must do is query the database and fetch the results. There is practically no window for optimization. On the other hand, when you have a lot of asynchronous requests to the database, it matters how the driver manages the connection pool and how the language runtime schedules event-loop tasks.
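The sequential-vs-concurrent distinction can be illustrated with a toy sketch. `asyncio.sleep` stands in for a database round trip here; neither psqlpy nor psycopg is involved, so this only shows why pool and event-loop management stop mattering in a one-connection, one-query-at-a-time benchmark.

```python
import asyncio
from time import time


async def fake_query(delay: float = 0.02) -> None:
    # Stand-in for one database round trip (network + execution).
    await asyncio.sleep(delay)


async def main() -> tuple[float, float]:
    # Sequential: one warmed-up "connection", one query at a time,
    # which is what the single-connection benchmark measures.
    start = time()
    for _ in range(20):
        await fake_query()
    sequential = time() - start

    # Concurrent: 20 requests in flight at once; now connection-pool
    # handling and task scheduling dominate the result.
    start = time()
    await asyncio.gather(*(fake_query() for _ in range(20)))
    concurrent = time() - start
    return sequential, concurrent


seq, conc = asyncio.run(main())
print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")
```

Under high concurrency the total wall time collapses toward a single round trip, which is the regime where driver differences show up.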
I've made a test with PostgreSQL 12 (from your docker-compose).
insert bulk time

| run | psqlpy (s) | psycopg (s) |
| --- | --- | --- |
| 1 | 0.029 | 0.045 |
| 2 | 0.026 | 0.027 |
| 3 | 0.024 | 0.024 |
| 4 | 0.025 | 0.023 |
| 5 | 0.025 | 0.025 |
| 6 | 0.025 | 0.025 |
| 7 | 0.021 | 0.023 |
| 8 | 0.024 | 0.024 |
| 9 | 0.028 | 0.023 |
| 10 | 0.027 | 0.026 |
| avg | 0.025 | 0.027 |
get bulk time

| run | psqlpy (s) | psycopg (s) |
| --- | --- | --- |
| 1 | 0.0 | 0.001 |
| 2 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 |
| 9 | 0.0 | 0.0 |
| 10 | 0.0 | 0.0 |
| avg | 0.0 | 0.0 |
> When inserting new rows with psqlpy, you don't really use a transaction: https://github.com/BimaAdi/psqlpy-vs-psycopg/blob/main/bench_psqlpy.py#L29 (this method is asynchronous).
I want the transaction applied only during the bulk insert, not to the entire function.
> You are using an old version of PostgreSQL. As far as I can see, it's version 12, and I didn't run any tests on PostgreSQL 12. So the question is: why so old?
It's the only version I had on my machine. I will try it on the latest PostgreSQL (16).
> The main explanation: PSQLPy primarily aims to give you performance in real high-load applications, because of how Rust (the tokio runtime) manages asynchronous tasks. Let's imagine what is happening in the test you made: you always have 1 connection that is perfectly connected to PostgreSQL, and all the driver must do is query the database and fetch the results. There is practically no window for optimization. On the other hand, when you have a lot of asynchronous requests to the database, it matters how the driver manages the connection pool and how the language runtime schedules event-loop tasks.
Great explanation! Next time I will try benchmarking it with a web server like FastAPI under high RPS. Maybe it will give a different result.
> I want the transaction applied only during the bulk insert, not to the entire function.
Okay, I thought you needed a transaction everywhere, because this is standard psycopg behavior:

> Psycopg has a behaviour that may seem surprising compared to psql: by default, any database operation will start a new transaction.

(https://www.psycopg.org/psycopg3/docs/basic/transactions.html). So, practically, I did exactly the same thing in PSQLPy, just explicitly.
> Great explanation! Next time I will try benchmarking it with a web server like FastAPI under high RPS. Maybe it will give a different result.
You can try to use our psqlpy-stress for that: just install bombardier and run the application. We are going to publish our own benchmarks this month. Spoiler: the real performance gain shows up only when PostgreSQL is not on localhost but somewhere else (another server, for example). In my opinion, that is the most important case, because it's really uncommon for the application and the database to be on the same machine; that setup applies only to small, non-high-load applications.
@BimaAdi If you don't mind, I can convert this issue to a discussion, since it's not really an issue, more of a discussion. If we find real downsides of PSQLPy, I'll be happy to improve performance.
@chandr-andr OK, I don't mind if this issue becomes a discussion. I made it an issue because there were 0 threads on the Discussions tab, and I had also read this issue about benchmarks: https://github.com/qaspen-python/psqlpy/issues/43.
Yeah, I understand. The library is young (4.5 months), so there isn't a lot of activity yet.
I made my own benchmark of psqlpy vs psycopg; you can find the comparison and the code in this GitHub repo: https://github.com/BimaAdi/psqlpy-vs-psycopg. I made 2 use cases.
I found that psqlpy is slower than psycopg. Here are the results (all in seconds):
Insert Bulk Time
Get Bulk Time
Machine spec
Although only by a small margin, psycopg consistently beats psqlpy. What's wrong with my benchmark? Did I do something wrong? Or is it maybe due to other factors?