psqlpy-python / psqlpy

Asynchronous Python PostgreSQL driver written in Rust
https://psqlpy-python.github.io/
MIT License
211 stars 3 forks source link

psqlpy slower than psycopg #62

Closed BimaAdi closed 2 months ago

BimaAdi commented 2 months ago

I made my own benchmark between psqlpy vs psycopg you can found the comparison and code in this github repo https://github.com/BimaAdi/psqlpy-vs-psycopg. I made 2 use case

  1. Insert bulk (insert many data in one transaction)
  2. Get many (get many data from database)

I found that psqlpy is slower than psycopg. Here the result (all in second):

Insert Bulk Time

psqlpy psycopg
1 0.271 0.096
2 0.269 0.184
3 0.319 0.216
4 0.38 0.097
5 0.378 0.117
6 0.288 0.139
7 0.287 0.159
8 0.267 0.101
9 0.285 0.107
10 0.448 0.181
avg 0.319 0.14

Get Bulk Time

psqlpy psycopg
1 0.002 0.002
2 0.005 0.003
3 0.004 0.004
4 0.002 0.002
5 0.005 0.002
6 0.002 0.002
7 0.002 0.003
8 0.003 0.002
9 0.003 0.002
10 0.003 0.002
avg 0.003 0.002

Machine spec

Although by only small margin but psycopg consitently beat psqlpy. What's wrong with my benchmark? did I do something wrong? or maybe due to other factor?

chandr-andr commented 2 months ago

Hello! Thank you for you benchmarks. But, I've found some strange moments.

First of all, let's see my results:

insert bulk time
shape: (11, 3)
|     | psqlpy | psycopg |
| --- | ---    | ---     |
| str | f64    | f64     |
|-----|--------|---------|
| 1   | 0.027  | 0.034   |
| 2   | 0.025  | 0.024   |
| 3   | 0.024  | 0.023   |
| 4   | 0.022  | 0.021   |
| 5   | 0.022  | 0.021   |
| 6   | 0.022  | 0.021   |
| 7   | 0.023  | 0.02    |
| 8   | 0.022  | 0.021   |
| 9   | 0.023  | 0.021   |
| 10  | 0.024  | 0.021   |
| avg | 0.023  | 0.023   |
get bulk time
shape: (11, 3)
|     | psqlpy | psycopg |
| --- | ---    | ---     |
| str | f64    | f64     |
|-----|--------|---------|
| 1   | 0.001  | 0.001   |
| 2   | 0.001  | 0.001   |
| 3   | 0.001  | 0.001   |
| 4   | 0.001  | 0.001   |
| 5   | 0.001  | 0.001   |
| 6   | 0.001  | 0.001   |
| 7   | 0.001  | 0.001   |
| 8   | 0.001  | 0.001   |
| 9   | 0.001  | 0.001   |
| 10  | 0.001  | 0.001   |
| avg | 0.001  | 0.001   |

They are approximately the same. I'll make sure to explain why.

So, what I found strange? 1) In inserting new rows with psqlpy, you don't really use a transaction. https://github.com/BimaAdi/psqlpy-vs-psycopg/blob/main/bench_psqlpy.py#L29 This method is asynchronous 2) Using the old version of PostgreSQL. As I see, you are using the 12 version, I didn't perform any tests on the 12th version of PostgreSQL. So the question is why is so old?

The correct (for you test) PSQLPy bench function is:

async def bench() -> BenchTypedDict:
    bench_time: BenchTypedDict = {"insert_bulk_time": 0.0, "get_bulk_time": 0.0}
    try:
        db_pool = (
            ConnectionPoolBuilder()
            .user(POSTGRESQL_USER)
            .password(POSTGRESQL_PASSWORD)
            .host(POSTGRESQL_HOST)
            .port(POSTGRESQL_PORT)
            .dbname(POSTGRESQL_DATABASE)
            .build()
        )

        async with db_pool.acquire() as connection:
            async with connection.transaction() as trans:
                await create_table(trans)

                start = time()
                await insert_bulk(trans)
                end = time() - start
                bench_time["insert_bulk_time"] = end

                start = time()

                await get_bulk(trans)
                end = time() - start
                bench_time["get_bulk_time"] = end

                await drop_table(trans)
    finally:
        db_pool.close()

The main explanation: PSQLPy aims to give you performance in real high-load applications at first, because of how rust (tokio runtime) manages asynchronous tasks. Let's imagine what is happening in the test you made, you always have 1 connection that is perfectly connected to the PostgreSQL, and all that drivers must do is request the database and get results. There is practically no window for optimization. On the other hand, when you have a lot of asynchronous requests to the database, it's important how the driver manages the connection pool and how a programming language manages the event loop tasks.

chandr-andr commented 2 months ago

I've made test with the 12th version of PostgreSQL (from you docker-compose).

shape: (11, 3)
|     | psqlpy | psycopg |
| --- | ---    | ---     |
| str | f64    | f64     |
|-----|--------|---------|
| 1   | 0.029  | 0.045   |
| 2   | 0.026  | 0.027   |
| 3   | 0.024  | 0.024   |
| 4   | 0.025  | 0.023   |
| 5   | 0.025  | 0.025   |
| 6   | 0.025  | 0.025   |
| 7   | 0.021  | 0.023   |
| 8   | 0.024  | 0.024   |
| 9   | 0.028  | 0.023   |
| 10  | 0.027  | 0.026   |
| avg | 0.025  | 0.027   |
get bulk time
shape: (11, 3)
|     | psqlpy | psycopg |
| --- | ---    | ---     |
| str | f64    | f64     |
|-----|--------|---------|
| 1   | 0.0    | 0.001   |
| 2   | 0.0    | 0.0     |
| 3   | 0.0    | 0.0     |
| 4   | 0.0    | 0.0     |
| 5   | 0.0    | 0.0     |
| 6   | 0.0    | 0.0     |
| 7   | 0.0    | 0.0     |
| 8   | 0.0    | 0.0     |
| 9   | 0.0    | 0.0     |
| 10  | 0.0    | 0.0     |
| avg | 0.0    | 0.0     |
BimaAdi commented 2 months ago

In inserting new rows with psqlpy, you don't really use a transaction. https://github.com/BimaAdi/psqlpy-vs-psycopg/blob/main/bench_psqlpy.py#L29 This method is asynchronous

I Want transaction only apllied during insert bulk only not on entire function

Using the old version of PostgreSQL. As I see, you are using the 12 version, I didn't perform any tests on the 12th version of PostgreSQL. So the question is why is so old?

It's the only version I had on my machine I will try on latest postgresql (16)

The main explanation: PSQLPy aims to give you performance in real high-load applications at first, because of how rust (tokio runtime) manages asynchronous tasks. Let's imagine what is happening in the test you made, you always have 1 connection that is perfectly connected to the PostgreSQL, and all that drivers must do is request the database and get results. There is practically no window for optimization. On the other hand, when you have a lot of asynchronous requests to the database, it's important how the driver manages the connection pool and how a programming language manages the event loop tasks.

Great explanation next time I will try benchmark it using web server like fastapi with high rps. Maybe it will get different result

chandr-andr commented 2 months ago

I Want transaction only apllied during insert bulk only not on entire function

Okay, I thought you need transaction everywhere, because this is standart psycopg behavior: Psycopg has a behaviour that may seem surprising compared to psql: by default, any database operation will start a new transaction.. (https://www.psycopg.org/psycopg3/docs/basic/transactions.html). So, practically, I did exactly the same in PSQLPy but explicit.

Great explanation next time I will try benchmark it using web server like fastapi with high rps. Maybe it will get different result

You can try to use our psqlpy-stress to do it, just install bombardier and run application. We are going to show our different benchmarks this month. Spoiler: real performance gain is seen only when PostgreSQL is located not on localhost but somewhere else (another server, for example), and in my opinion, it's the most important, because it's really uncommon situation when application and database are located on the same machine and applicable only for small not high-load applications.

chandr-andr commented 2 months ago

@BimaAdi If you dont mind, I can transfer the issue to discussions cuz it's not really an issue, more like discussion. If we'll find really downsides of PSQLPy, I'll be happy to improve performance.

BimaAdi commented 2 months ago

@chandr-andr Ok, I don't mind is this issue become a discussion. I made this issue because there is 0 discussion on discussion tab and i read this issue https://github.com/qaspen-python/psqlpy/issues/43 about benchmark as well.

chandr-andr commented 2 months ago

Yeap, I understand. Library is young (4.5 months), so there are no a lot of activity yet