XChikuX opened this issue 1 year ago
This is a pretty reasonable ask. My relatively naive thought (from my familiarity with other Redis clients) was that it would just pipeline any task not awaited. I have to admit I'm not overly familiar with asyncio, but looking at the redis-py code underpinning all of this, the physical write-command/read-response portion does not look re-entrant, so I'm not sure this is possible without some pretty significant refactoring.
As a work-around I think you should be able to leverage connection pooling and the raw database connection to increase concurrency.
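Something along these lines might tide you over in the meantime: a rough, untested sketch that drops straight down to the underlying redis-py async client and batches writes with an explicit pipeline (the key names and values are made up purely for illustration):

import asyncio

from redis.asyncio import ConnectionPool, Redis

# One shared pool so concurrent work isn't serialized on a single connection.
pool = ConnectionPool(host="localhost", port=6379, db=0, max_connections=50)
client = Redis(connection_pool=pool)


async def bulk_set(count: int) -> None:
    # Explicit pipelining on the raw client: commands are buffered locally and
    # sent to Redis in a single round trip when execute() is awaited.
    async with client.pipeline(transaction=False) as pipe:
        for i in range(count):
            pipe.set(f"example:key:{i}", "value")  # made-up keys, illustration only
        await pipe.execute()


asyncio.run(bulk_set(1000))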
@bsbodden or @tylerhutcherson - any thoughts by chance?
@slorello89 As far as my understanding goes, asynchronous code does not await on a functional basis. You can functionalise a series of synchronous steps that can be awaited, but that doesn't mean multiple awaits in a function will pipeline.
Python works on more of a line-by-line interpreted basis. It isn't possible to call an async task without awaiting it; therefore, it will not auto-pipeline. Pipelining must be explicitly supported by the database, by sending in all the commands so the database can execute them (async or sync).
From the redis docs it is clear that the biggest bottleneck is socket I/O. Async programming does not help with this.
There's a good probability it makes it worse.
In Python, each awaited call issues a network request to Redis and awaits the result. During this wait, execution switches context to other tasks and comes back once the result of the network request arrives, at which point the program can focus back on this task.
Hi @XChikuX, I'm pretty sure I'm right about this (as I said, I would have thought there would be auto-pipelining, but that does not appear to be the case); rather, you can use a connection pool and scale out your reads/writes that way. Coroutines and tasks do not need to be awaited immediately (which is pretty normal in async programming); you can gather them together and await them in bulk.
You can use a connection pool to mitigate at least some of this in the interim (I agree this isn't ideal):
from aredis_om import (
    JsonModel,
    Migrator,
    Field
)
import time
import asyncio
from redis.asyncio.client import Redis, ConnectionPool

pool = ConnectionPool(host='localhost', port=6379, db=0)


class TestModel(JsonModel):
    name: str = Field(index=True)

    class Meta:
        database = Redis(connection_pool=pool)


async def await_each_save(num: int):
    # One round trip per record: save and await each model in turn.
    for i in range(0, num):
        tm_coroutine = TestModel(name="steve").save()
        await tm_coroutine


async def await_gather_save(num: int):
    # Collect save coroutines and await them in batches of 50,
    # spreading the writes across the connection pool.
    coroutines = []
    for i in range(0, num):
        tm_coroutine = TestModel(name="steve").save()
        coroutines.append(tm_coroutine)
        if len(coroutines) == 50:
            await asyncio.gather(*coroutines)
            coroutines.clear()


async def await_each_search(num: int):
    for i in range(0, num):
        await TestModel.find(TestModel.name == 'steve').first()


async def await_gather_search(num: int):
    coroutines = []
    for i in range(0, num):
        coroutines.append(TestModel.find(TestModel.name == 'steve').first())
        if len(coroutines) == 50:
            await asyncio.gather(*coroutines)
            coroutines.clear()


async def start():
    await Migrator().run()

    # await each save
    start_time = time.time()
    await await_each_save(1000)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")

    # await gather save
    start_time = time.time()
    await await_gather_save(1000)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")

    # await each search
    start_time = time.time()
    await await_each_search(1000)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")

    # await gather search
    start_time = time.time()
    await await_gather_search(1000)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")


# kick off the benchmark
asyncio.run(start())
Benchmarking the above code shows significantly better performance when gathering the coroutines together and awaiting them en masse (which makes sense, because you're scaling your writes out over a number of connections).
We should definitely look into some way of properly pipelining, but I fear it might be quite difficult considering the underpinning client's architecture.
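If it helps, the batching pattern above can be pulled out into a small reusable helper. This is just a sketch built on standard asyncio (it assumes nothing beyond the models already shown), and unlike the inline loops above it also flushes any leftover batch when the total isn't a multiple of the batch size:

import asyncio
from typing import Awaitable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_in_batches(coroutines: Iterable[Awaitable[T]], batch_size: int = 50) -> List[T]:
    # Await an iterable of coroutines in fixed-size batches to bound concurrency.
    results: List[T] = []
    batch: List[Awaitable[T]] = []
    for coro in coroutines:
        batch.append(coro)
        if len(batch) == batch_size:
            results.extend(await asyncio.gather(*batch))
            batch.clear()
    if batch:  # flush the remainder
        results.extend(await asyncio.gather(*batch))
    return results


# usage: saved = await gather_in_batches(TestModel(name="steve").save() for _ in range(1000))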
I'll keep this in mind for now. Thank you.
Currently there are implementations for saving/deleting multiple records at once using Redis pipelines. Example:
I want to be able to get multiple records at once, so as to minimize network overhead. Please make it possible, e.g.:

query_members = [pk1, pk2]
result = await m.Member.get_many(query_members)

So, something like the delete_many function:
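In the meantime I'm approximating this by gathering individual gets, which still pays one round trip per record (exactly the overhead get_many would avoid). A rough sketch, assuming a JsonModel named Member and the existing Member.get(pk) API:

import asyncio

from aredis_om import Field, JsonModel


class Member(JsonModel):
    # Hypothetical model, only here to make the sketch self-contained.
    name: str = Field(index=True)


async def get_many_workaround(pks):
    # One Member.get per primary key, awaited concurrently.
    # Concurrency hides some latency, but each get is still its own round trip.
    return await asyncio.gather(*(Member.get(pk) for pk in pks))


# usage: members = await get_many_workaround([pk1, pk2])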