Nice ignore, ggwp
Hi @MiWeiss! Thanks for the issue.
Adding async support is on the roadmap, though we aren't committing to a timeline for when it'll be released. While I generally agree that an async interface would be good, can you tell me a little more about the performance improvement you'd expect to see if we added it? That'll help give us a sense about how to prioritize it.
Hi @hallacy
Thanks for your answer.
I am not sure how to interpret your question (e.g., are you asking about the technical reasons or about use cases?), so please excuse me if my answer misses the target...
Quick motivation: async requests make it very easy to perform other tasks while waiting for the response to a request to OpenAI. If the "other tasks" are also IO/network bound (e.g., more requests to OpenAI 😄), I'm likely running them asynchronously as well, so that the waiting times overlap (i.e., the total time I wait is approximately the time the slowest task takes). This is naturally much faster than doing all the waiting sequentially.
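For illustration only (not this library's API), a toy sketch of the overlapping-wait idea with asyncio.gather, using asyncio.sleep to stand in for network calls:
import asyncio

async def fake_request(seconds: float) -> float:
    # Stand-in for an IO/network-bound call, e.g. a request to OpenAI
    await asyncio.sleep(seconds)
    return seconds

async def main():
    # The three "requests" wait concurrently: total time is roughly
    # max(1, 2, 3) = 3 seconds instead of 1 + 2 + 3 = 6 sequentially.
    print(await asyncio.gather(fake_request(1), fake_request(2), fake_request(3)))

asyncio.run(main())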
And yes, much of that could be done using threads, but threading has various disadvantages (especially for IO-bound operations) compared to asyncio. See, e.g., this great comment.
See also these FastAPI docs, which provide a detailed yet simple and intuitive motivation for using async requests.
Does this answer your question?
Also, IMHO it may be nontrivial to change the library so that both async and synchronous requests are supported, both from an implementation perspective (session handling, API design, etc.) and regarding documentation (every snippet can be async or sync). It might be easier to offer async-openai as a standalone library. That's just my two cents, and I am happy to be proven wrong, though.
It does! Thank you for the writeup. That comment you linked to was particularly helpful.
I think I agree with that opinion about non-triviality. I can't commit to a timeline, but I'll make a point of bringing this up to the team soon.
This is badly needed! Having to make concurrent requests using threading is not good for modern Python, and it's honestly better to start on it early, because the whole library will need to be upgraded (or, of course, a new library can be made for it). The Node.js openai library is naturally async, which is a big advantage. In the meantime, asyncifying function calls with a thread pool seems to work well for me: https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor
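A rough sketch of that thread-pool workaround (the acomplete helper name is mine, and I'm assuming the current synchronous openai.Completion.create):
import asyncio
import functools
import openai

openai.api_key = "sk-..."

async def acomplete(prompt: str):
    # Run the blocking call in the default ThreadPoolExecutor so the event loop stays free
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None,
        functools.partial(openai.Completion.create, prompt=prompt, engine="text-ada-001"),
    )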
Happy to commit to this. Based on previous experience with Redis and Django, a standalone package isn't needed, and this package would only need to include aiohttp as a dependency. Elasticsearch has its async dependencies as optional, but in my opinion that isn't necessary for this repository. Should be done tomorrow. (In the meantime, you can use asgiref and its sync_to_async.)
To use the async methods (notice the "a" prefix, used in the CPython standard library, Django, and other libs):
import openai

openai.api_key = "sk-..."

async def main():
    await openai.Completion.acreate(prompt="This is a test", engine="text-ada-001")
In the meantime, you can use asgiref (notice the lack of the "a" prefix):
import openai
from asgiref.sync import sync_to_async
openai.api_key = "sk-..."
async def main():
    await sync_to_async(openai.Completion.create)(prompt="Test is a test", engine="text-ada-001")
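To actually run either snippet from regular synchronous code, presumably something like:
import asyncio

asyncio.run(main())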
In the meantime, you can also use this lightweight client I wrote (it uses httpx): https://pypi.org/project/openai-async/
Something like:
pip install openai-async
and then:
import openai_async
response = await openai_async.complete(
    "<API KEY>",
    timeout=2,
    payload={
        "model": "text-davinci-003",
        "prompt": "Correct this sentence: Me like you.",
        "temperature": 0.7,
    },
)
print(response.json()["choices"][0]["text"].strip())
>>> "I like you."
@itayzit @MiWeiss any luck with this in a high-concurrency implementation? I'm trying this but not getting the rates I'm hoping for.
@danbf Recommend you also manually control the aiohttp session:
import openai
from aiohttp import ClientSession
openai.aiosession.set(ClientSession())
# At the end of your program, close the http session
await openai.aiosession.get().close()
Thanks @Andrew-Chen-Wang and @MiWeiss, going to use that code hint. And yup, was aware of the OpenAI rate limit.
@Andrew-Chen-Wang @MiWeiss any ideas what to set TCPConnector(limit=XXX) to in order to maximize throughput?
You could try setting it to 0 for no limit. But honestly, I very much doubt you can query OpenAI with much more than 100 connections and not hit one of their quota limits.
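Putting the two hints together, a minimal sketch (assuming the openai.aiosession override shown above; the limit value is illustrative):
import openai
from aiohttp import ClientSession, TCPConnector

# limit=0 removes aiohttp's per-session connection cap (the default is 100);
# in practice OpenAI's rate limits will likely bite long before that matters
openai.aiosession.set(ClientSession(connector=TCPConnector(limit=0)))

# ... issue your concurrent requests here ...

# At the end of your program, close the http session
await openai.aiosession.get().close()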
It can also be done using asyncer:
import openai
from asyncer import asyncify
openai.api_key = settings.OPENAI_API_KEY
response = await asyncify(openai.ChatCompletion.create)(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful help desk assistant."},
        {"role": "user", "content": "Which is the capital of Ecuador?"},
        {"role": "assistant", "content": "The capital of Ecuador is Quito."},
        {"role": "user", "content": "Responde lo mismo pero en español"},
    ],
)
print("response", response)
@Andrew-Chen-Wang -- would there be any disadvantage to using your PR over asyncer, or vice versa?
I would go with whatever openai-python has (i.e., my PR), since the repo is constantly improving, it's the official one, and all the examples online use this package. Performance differences are negligible for this repo since latency is the largest performance hindrance. It also seems easier to import just once (openai) rather than twice (with asyncify).
Exposing async interfaces would allow this library to be used in a much more modern, performant, and scalable way.
It would be great if the maintainers could say whether they plan to add async methods in the future (i.e., allow for non-blocking API usage). Even stating explicitly that this won't be added would be great, as it would allow third parties to release their own fork or wrapper without the risk of it becoming obsolete just moments later :-)