Closed ohdearquant closed 3 months ago
🤖
Hey @ohdearquant, great to cross paths with you again! Hope all has been well on your end.👋
Thank you for your feature request. I understand that you're looking for a way to control the rate limit for alternative models in LlamaIndex, similar to the functionality in lionagi.
Currently, LlamaIndex does not have built-in support for rate limiting. However, it's possible to implement this feature using Python's asyncio library, which LlamaIndex already uses. Specifically, the `asyncio.Semaphore` class can be used to limit the number of coroutines running at the same time, which caps how many requests are in flight concurrently.
Here's a simplified example of how this could be implemented:
```python
import asyncio

class RateLimiter:
    def __init__(self, rate_limit):
        # Allow at most `rate_limit` jobs to hold the semaphore at once.
        self.semaphore = asyncio.Semaphore(rate_limit)

    async def execute(self, job):
        # `job` is an awaitable, e.g. a coroutine object.
        async with self.semaphore:
            return await job
```
In this example, `RateLimiter` is a class that takes a `rate_limit` as an argument and uses it to create a semaphore. The `execute` method is a coroutine that waits until it can acquire the semaphore before it starts running the job. This effectively limits the number of jobs that can run concurrently to `rate_limit`, thus controlling the rate limit.
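As a quick sanity check, the class above can be exercised with `asyncio.gather`. The `fake_llm_call` helper below is a hypothetical stand-in for a real model call, not part of LlamaIndex:

```python
import asyncio

class RateLimiter:
    def __init__(self, rate_limit):
        # At most `rate_limit` jobs may hold the semaphore at once.
        self.semaphore = asyncio.Semaphore(rate_limit)

    async def execute(self, job):
        async with self.semaphore:
            return await job

async def fake_llm_call(i):
    # Hypothetical stand-in for a real model call.
    await asyncio.sleep(0.01)
    return i * 2

async def main():
    limiter = RateLimiter(rate_limit=3)  # at most 3 calls in flight
    # gather preserves input order regardless of completion order.
    return await asyncio.gather(
        *(limiter.execute(fake_llm_call(i)) for i in range(10))
    )

results = asyncio.run(main())
print(results)  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```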
This is a simplified example, and the actual implementation would need to be integrated into the existing LlamaIndex codebase. However, it should give you a good starting point for how to implement this feature.
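One caveat worth noting: a semaphore caps how many jobs run concurrently, while provider rate limits are usually expressed per unit of time (e.g. requests per second). If that distinction matters, a sliding-window variant could look like the sketch below; `PerSecondRateLimiter` and its parameters are hypothetical names, not an existing LlamaIndex or lionagi API:

```python
import asyncio
import time

class PerSecondRateLimiter:
    """Sketch: caps the number of jobs *started* per second, which is
    closer to provider API limits than a pure concurrency cap."""

    def __init__(self, calls_per_second):
        self.calls_per_second = calls_per_second
        self.timestamps = []  # start times inside the sliding 1s window
        self.lock = asyncio.Lock()

    async def execute(self, job):
        while True:
            async with self.lock:
                now = time.monotonic()
                # Drop start times older than one second.
                self.timestamps = [t for t in self.timestamps if now - t < 1.0]
                if len(self.timestamps) < self.calls_per_second:
                    self.timestamps.append(now)
                    break  # slot acquired; run the job
                # Window is full: sleep until the oldest entry expires.
                wait = 1.0 - (now - self.timestamps[0])
            await asyncio.sleep(wait)
        return await job
```

The lock ensures the window bookkeeping is updated atomically; sleeping happens outside the lock so waiting jobs don't block each other.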
Please note that this is a significant change to the LlamaIndex codebase and would need to be thoroughly tested to ensure it doesn't introduce any regressions or performance issues. It would also need to be documented so that users are aware of the new feature and how to use it.
I hope this helps! If you have any further questions or need more clarification, please don't hesitate to ask.
Here is the official Python documentation for the `asyncio.Semaphore` class for your reference.
Here is the relevant code from the LlamaIndex repository that uses the `asyncio.Semaphore` class.
Here is the code from the LlamaIndex repository that could potentially be modified to include rate limiting.
Feature Description
Currently the system doesn't have built-in support for controlling the rate limit for alternative models, which might be a problem for users (especially in production) wishing to run workflows over a very large amount of input data.
For example, in lionagi, we can do the following:

And the rate limit is now applied across all sessions. I am integrating LlamaIndex into lionagi as a dependency, and this feature is essential.

Reason
I think the demand is not too high yet.
Value of Feature
This feature is invaluable for business users. The ability to control the rate limit is crucial.