chandr-andr opened this issue 6 months ago:

Hello! Thank you for your awesome ORM.
Recently, I've created a new driver for PostgreSQL - https://github.com/qaspen-python/psqlpy. It shows a significant performance upgrade in comparison to asyncpg. It has already been tested in some production services, and I can say it is production-ready.
Do you mind if I create a big PR that adds a new engine based on PSQLPy?
@chandr-andr Thank you for your interest in Piccolo. Your work on psqlpy and qaspen is also great (I didn't go into the details, but I had a quick look at the source code and I like that it has a lot of similarities with Piccolo :)).
I have nothing against an alternative driver, but I have one question regarding the performance: you stated that it gives a significant performance upgrade in comparison to asyncpg.
Apologies in advance if I'm doing something wrong, as I'm not a benchmark expert. I'm using an HTTP benchmark test, since the ORM or driver will most likely be used in a web application. I tried the FastAPI example from the docs and built a similar example with asyncpg, like this:
# The users table only has 2 columns (id and name).
from contextlib import asynccontextmanager
from typing import AsyncGenerator

import asyncpg
import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """Start up the database connection pool and close it on shutdown."""
    db_pool = await asyncpg.create_pool(
        dsn="postgres://postgres:postgres@localhost:5432/psqlpydb",
        max_size=10,
    )
    app.state.db_pool = db_pool
    yield
    await db_pool.close()


app = FastAPI(lifespan=lifespan)


@app.get("/asyncpg")
async def pg_pool_example(request: Request):
    query_result = await request.app.state.db_pool.fetch(
        "SELECT * FROM users",
    )
    return JSONResponse(content=[dict(item) for item in query_result])


if __name__ == "__main__":
    uvicorn.run("app:app")
The HTTP benchmark results are pretty similar, and asyncpg actually does a bit better. Here are the results:
rkl@mint21:~$ bombardier -c 500 -d 10s -l http://localhost:8000/asyncpg
Bombarding http://localhost:8000/asyncpg for 10s using 500 connection(s)
[============================================================================================]10s
Done!
Statistics Avg Stdev Max
Reqs/sec 707.47 109.03 1021.59
Latency 678.21ms 70.37ms 1.54s
Latency Distribution
50% 687.24ms
75% 698.83ms
90% 709.14ms
95% 718.07ms
99% 0.88s
HTTP codes:
1xx - 0, 2xx - 7556, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 218.29KB/s
rkl@mint21:~$ bombardier -c 500 -d 10s -l http://localhost:8000/psqlpy
Bombarding http://localhost:8000/psqlpy for 10s using 500 connection(s)
[============================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 639.18 90.12 975.61
Latency 745.92ms 72.92ms 1.08s
Latency Distribution
50% 755.51ms
75% 762.43ms
90% 784.68ms
95% 797.32ms
99% 0.97s
HTTP codes:
1xx - 0, 2xx - 6860, 3xx - 0, 4xx - 0, 5xx - 0
others - 0
Throughput: 197.62KB/s
Sorry again if I missed the point about driver performance.
@sinisaos Thanks for the quick feedback! I found your benchmarks interesting and decided to find out what's going on. In my example in the docs, I set only 2 connections for the connection pool, so if you used it without any changes, it was bound to lose to asyncpg.
I created the same FastAPI application as yours, but set 10 connections for the connection pool in psqlpy (a sketch of the endpoint is included after the results). The results are below:
PSQLPy
> ./bombardier -c 500 -d 10s -l http://127.0.0.1:8000/psqlpy
Bombarding http://127.0.0.1:8000/psqlpy for 10s using 500 connection(s)
[================================================================================================================================================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 4487.78 575.16 6509.45
Latency 110.76ms 8.40ms 272.14ms
Latency Distribution
50% 109.78ms
75% 113.55ms
90% 117.31ms
95% 121.84ms
99% 146.59ms
HTTP codes:
1xx - 0, 2xx - 45289, 3xx - 0, 4xx - 0, 5xx - 0
others - 3
Errors:
dial tcp 127.0.0.1:8000: connect: connection reset by peer - 3
Throughput: 1.14MB/s
And AsyncPG
> ./bombardier -c 500 -d 10s -l http://127.0.0.1:8000/asyncpg
Bombarding http://127.0.0.1:8000/asyncpg for 10s using 500 connection(s)
[================================================================================================================================================================================================================] 10s
Done!
Statistics Avg Stdev Max
Reqs/sec 4465.72 362.02 6188.94
Latency 111.26ms 108.25ms 1.31s
Latency Distribution
50% 108.62ms
75% 124.19ms
90% 314.84ms
95% 336.71ms
99% 549.38ms
HTTP codes:
1xx - 0, 2xx - 45064, 3xx - 0, 4xx - 0, 5xx - 0
others - 17
Errors:
dial tcp 127.0.0.1:8000: connect: connection reset by peer - 17
Throughput: 1.14MB/s
In this test, PSQLPy beats asyncpg. The difference, of course, is not that significant, but the query is also as simple as possible. PSQLPy can give a significant improvement under high load, with big queries that return a lot of data; in a simple example like this there are not a lot of places for speeding up.
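For reference, here is a minimal sketch of what the psqlpy endpoint looks like. I'm writing it from memory, assuming psqlpy's ConnectionPool with its max_db_pool_size parameter and the connection()/execute()/result() API, so treat the exact names as approximate:

# Minimal sketch of the /psqlpy endpoint, mirroring the asyncpg example above.
# Assumed psqlpy names: ConnectionPool, max_db_pool_size, connection(),
# execute(), result().
from contextlib import asynccontextmanager
from typing import AsyncGenerator

import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from psqlpy import ConnectionPool


@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    # 10 connections, matching the asyncpg example above.
    db_pool = ConnectionPool(
        dsn="postgres://postgres:postgres@localhost:5432/psqlpydb",
        max_db_pool_size=10,
    )
    app.state.db_pool = db_pool
    yield
    db_pool.close()


app = FastAPI(lifespan=lifespan)


@app.get("/psqlpy")
async def psqlpy_example(request: Request):
    connection = await request.app.state.db_pool.connection()
    query_result = await connection.execute("SELECT * FROM users")
    # psqlpy already returns rows as a list of dicts, so no conversion needed.
    return JSONResponse(content=query_result.result())


if __name__ == "__main__":
    uvicorn.run("app:app")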
BTW, one guy made a database performance repo - https://github.com/ymezencev/db_perf with PSQLPy already.
@chandr-andr Thank you for your reply. According to the tests from that repo, the difference is significant for larger amounts of data, as you mentioned before, although I doubt anyone will load 50,000 rows in a single web API call without some sort of pagination. For example, PiccoloCRUD (which is used by Piccolo Admin) has a max_page_size parameter that limits the maximum number of rows returned per API call (a sketch is below). I have nothing against an alternative PostgreSQL engine (another option is always good), but you'll have to wait for the Piccolo ORM author to say what he thinks about all this. Cheers.
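A minimal sketch of how that cap is applied when mounting PiccoloCRUD. This assumes piccolo_api's PiccoloCRUD endpoint and its max_page_size parameter; MegaTable and the tables module are placeholders:

# Sketch: capping rows per API call with PiccoloCRUD's max_page_size.
from piccolo_api.crud.endpoints import PiccoloCRUD
from starlette.routing import Mount, Router

from tables import MegaTable  # hypothetical module defining the Piccolo table

app = Router(
    routes=[
        Mount(
            "/mega/",
            # No single call can return more than max_page_size rows.
            PiccoloCRUD(table=MegaTable, max_page_size=1000),
        ),
    ]
)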
It looks like a cool project.
Since we've only supported asyncpg up until now, I'm not sure how much work is involved in adding a new Postgres engine.
If we're lucky, it's just a case of creating a new Engine subclass, with no other changes to Piccolo, in which case it could even be a third-party library at first.
@dantownsend Hello! Thank you for the positive assessment of PSQLPy. I can build an MVP engine based on the driver myself; I've had some experience with Piccolo at work, so I have knowledge of the internals. I think I can do it in a new repo alongside PSQLPy.
@chandr-andr That would be great, thanks!
Hello everyone!
I'm one of the developers behind psqlpy. We've created a third-party library, which you can check out here. However, we've encountered a small problem that makes the integration nearly impossible without your assistance.
The issue lies within the _process_results method, where some unwrapping is performed. asyncpg returns a Record object, but psqlpy does not. Instead, it behaves more like SQLite in this context, returning a list of dictionaries, so we don't need this unwrapping. Unfortunately, there is no appropriate engine type for our use case, and we cannot use the SQLite engine type for obvious reasons.
Would it be possible to add a psqlpy-postgres engine type, so we can eliminate this Record unwrapping?
Additionally, this change needs to be reflected in the Query class within the querystrings property. There might be other places affected, but these are what I identified during my reverse engineering.
Looking forward to our collaboration!
@insani7y Interesting point - the logic to standardise responses as lists of dicts should go into the engine itself, rather than in the query. I'll have a look.
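A hypothetical sketch of that idea (the class and method names here are illustrative, not Piccolo's actual API): each engine normalises its own driver's rows, so the query layer never needs driver-specific unwrapping:

from typing import Any, Dict, List


class Engine:
    """Illustrative stand-in for Piccolo's Engine base class."""

    def transform_response_to_dicts(self, results: Any) -> List[Dict[str, Any]]:
        # Default: assume the driver already returns a list of dicts
        # (true for psqlpy, and for the SQLite engine).
        return results


class AsyncpgEngine(Engine):
    def transform_response_to_dicts(self, results: Any) -> List[Dict[str, Any]]:
        # asyncpg returns Record objects, so the conversion lives here,
        # inside the engine, instead of in Query._process_results.
        return [dict(record.items()) for record in results]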
@insani7y If you try piccolo==1.14.0, hopefully this issue is fixed.
Hello, @dantownsend, thanks for your help, everything is working just fine! But there is another problem, with nodes.
There is this code inside Query._run:
if node is not None:
    from piccolo.engine.postgres import PostgresEngine

    if isinstance(engine, PostgresEngine):
        engine = engine.extra_nodes[node]
But psqlpy is also about Postgres. Let's add some is_nodes_available attribute/method to the engine, so this functionality would depend on the engine itself rather than on the Query._run implementation (a sketch follows below).
What do you think?
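A minimal sketch of the proposal, modifying the Query._run snippet above (is_nodes_available is the hypothetical attribute being suggested, not an existing Piccolo API):

# Inside Query._run, the isinstance check becomes engine-agnostic:
if node is not None and getattr(engine, "is_nodes_available", False):
    engine = engine.extra_nodes[node]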
@insani7y Yes, good point.
I had a look, and there are a couple of other places which use isinstance(engine, PostgresEngine), which also need tweaking.
@dantownsend would you like to fix it yourself? I can open an issue and fix it myself, actually.
@insani7y I wonder whether, instead of having an is_nodes_available property, we just move extra_nodes to the base Engine class.
It's unlikely anybody would use it for SQLite, but not impossible.
Or for SQLite, we just remove it from SQLiteEngine.__init__ and pass an empty list to super().__init__.
What do you think?
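A hypothetical sketch of that alternative (illustrative names, not Piccolo's actual base class):

import typing as t


class Engine:
    """Illustrative base class with extra_nodes hoisted up from PostgresEngine."""

    def __init__(self, extra_nodes: t.Optional[t.Dict[str, "Engine"]] = None):
        # Extra nodes (e.g. read replicas) keyed by name; empty for
        # engines like SQLite, where the concept rarely applies.
        self.extra_nodes: t.Dict[str, "Engine"] = extra_nodes or {}


# Query._run then needs no engine-specific isinstance check at all:
# if node is not None:
#     engine = engine.extra_nodes[node]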
@insani7y Also, one thought - is there any benefit in creating something like BasePostgresEngine that all Postgres engines inherit from?
Or what if your own custom engine inherited from PostgresEngine? Do you think that's too messy, as there's so much asyncpg-specific logic in there?
@dantownsend I believe there's limited benefit in a BasePostgresEngine, because the various drivers like asyncpg, psqlpy, and psycopg have significant internal differences. Adapting a single method to accommodate all of these drivers would be extremely challenging and would essentially require a complete rewrite. Additionally, other components such as AsyncBatch, Savepoint, Transaction, etc. differ considerably among these drivers.
Inheritance presents similar issues, offering minimal advantage in this context, imo.
I prefer the concept of an EngineProtocol or some EngineABC that includes abstract methods only (a rough sketch follows below). With this approach, Piccolo would expect an interface rather than a specific implementation. For a protocol-based approach, moving extra_nodes to the engine class seems more appropriate.
If SQLite does not require this attribute, allowing engines to accept different configurations could be acceptable. In such a scenario, the configuration could be represented by a dataclass that the engine depends on via typing.Generic. However, this might be overkill. We could simply ignore those nodes and issue a warning, but it would be better for the user if this option was completely removed from the engine config.
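A rough sketch of the protocol idea. This is illustrative only: run_querystring appears in Piccolo's engines (it shows up in the traceback below), while the pool method names are my assumptions:

import typing as t


@t.runtime_checkable
class EngineProtocol(t.Protocol):
    """Interface Piccolo could target instead of a concrete engine class."""

    extra_nodes: t.Dict[str, "EngineProtocol"]

    async def run_querystring(
        self, querystring: t.Any, in_pool: bool = True
    ) -> t.List[t.Dict[str, t.Any]]:
        """Run a query and return the rows normalised as dicts."""
        ...

    async def start_connection_pool(self, **kwargs: t.Any) -> None:
        ...

    async def close_connection_pool(self) -> None:
        ...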
@insani7y @chandr-andr I've tried your library and I think it's great. Maybe I should have created an issue in your repo instead - if needed, I can do that. I found a few problems with the columns; if asyncpg is used as the driver, there are no such problems. Here they are, and I hope they're well represented.
Interval column:
Object save example:
data = MegaTable(interval_col=datetime.timedelta(seconds=10))
await data.save()
Error:
psqlpy.exceptions.PyToRustValueMappingError: Can't convert value from python to rust type: Can not covert you type 0:00:10 into inner one
JSONB column:
Object save example:
data = MegaTable(jsonb_col={"a": 1})
await data.save()
Error:
psqlpy.exceptions.ConnectionExecuteError: Connection execute error: Cannot execute statement, error - db error: ERROR: unsupported jsonb version number 123.
BigInt column:
Error:
psqlpy.exceptions.ConnectionExecuteError: Connection execute error: Cannot execute statement, error - db error: ERROR: insufficient data left in message.
SmallInt column:
Error:
psqlpy.exceptions.ConnectionExecuteError: Connection execute error: Cannot execute statement, error - db error: ERROR: incorrect binary data format in bind parameter 6.
DoublePrecision column:
Error:
psqlpy.exceptions.ConnectionExecuteError: Connection execute error: Cannot execute statement, error - db error: ERROR: insufficient data left in message.
The error trace is the same for all of the column errors:
File "/home/rkl/dev/env/lib/python3.11/site-packages/piccolo/query/base.py", line 189, in run
return await self._run(node=node, in_pool=in_pool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rkl/dev/env/lib/python3.11/site-packages/piccolo/query/base.py", line 171, in _run
results = await engine.run_querystring(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rkl/dev/env/lib/python3.11/site-packages/psqlpy_piccolo/engine.py", line 668, in run_querystring
response = await self._run_in_pool(query, query_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rkl/dev/env/lib/python3.11/site-packages/psqlpy_piccolo/engine.py", line 609, in _run_in_pool
response = await connection.execute(
^^^^^^^^^^^^^^^^^^^^^^^^^
Sorry for the long post and sorry if I'm doing something wrong. I hope you find it useful.
@sinisaos Thanks for identifying these issues. We'll look into them and get back to you once everything is fixed!
@insani7y Great. Thanks. You probably know that the rest of the column errors come from creating objects like this:
data = MegaTable(
    bigint_col=100000000,
    double_precision_col=1.23,
    smallint_col=1,
)
await data.save()
I forgot to write that in the previous comment. Sorry.