pinecone-io / canopy

Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone
https://www.pinecone.io/
Apache License 2.0

[Feature] Any plan to support async and streaming? #233

Open hustxx opened 8 months ago

hustxx commented 8 months ago

Is this your first time submitting a feature request?

Describe the feature

We will need async and streaming support to integrate Canopy into our app. Do you have a timeline for when these will be supported?

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

miararoy commented 8 months ago

Yes, we have plans to add async routes. We are currently planning for 2024 Q1/Q2, so we should have better estimates soon.

As for streaming, can you elaborate a bit more? What are you trying to achieve?

usamasaleem1 commented 8 months ago

Same here; we definitely need faster upsert methods. Bulk upload is taking forever.

miararoy commented 8 months ago

@usamasaleem1 @hustxx We agree. This is more complex than it looks: the async route for upsert actually involves processing, chunking, embedding, and upserting. Some of these steps are IO-bound and some are CPU-bound, which is always a challenge to combine.

Having said that, we will start working on this next week and will update this issue as we progress 🙏
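
For anyone following along, the rough shape of the problem looks something like the sketch below. This is only an illustration of the general asyncio pattern (CPU-bound work pushed to an executor, IO-bound work awaited directly); the function names are hypothetical placeholders, not Canopy's actual API.

import asyncio
from concurrent.futures import ProcessPoolExecutor

def chunk_documents(docs):
    # CPU-bound: naive fixed-size text splitting (placeholder for real chunking).
    return [d[i:i + 512] for d in docs for i in range(0, len(d), 512)]

async def embed_chunks(chunks):
    # IO-bound: stands in for a remote embedding API call.
    await asyncio.sleep(0.1)
    return [[0.0, 0.0, 0.0] for _ in chunks]

async def upsert_vectors(vectors):
    # IO-bound: stands in for a Pinecone upsert call.
    await asyncio.sleep(0.1)

async def async_upsert(docs, executor):
    loop = asyncio.get_running_loop()
    # Push CPU-bound chunking to a process pool so it doesn't block the event loop.
    chunks = await loop.run_in_executor(executor, chunk_documents, docs)
    # IO-bound steps can simply be awaited.
    vectors = await embed_chunks(chunks)
    await upsert_vectors(vectors)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        asyncio.run(async_upsert(["some long document " * 200], pool))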

Evanrsl commented 7 months ago

> Yes, we have plans to add async routes. We are currently planning for 2024 Q1/Q2, so we should have better estimates soon.
>
> As for streaming, can you elaborate a bit more? What are you trying to achieve?

I think a streaming chatbot feature is essential. I did it with my Anyscale inference endpoint and am now trying to move it to Canopy, but I couldn't find any documentation about it. For my Anyscale chatbot I followed these docs: https://docs.endpoints.anyscale.com/examples/openai-chat-agent/

scottmx81 commented 7 months ago

@Evanrsl Canopy already supports streaming responses; it's built into the codebase. It implements the same interface as the OpenAI chat completion API. If you pass stream=True in the create call, with Canopy as the base URL instead of the Anyscale one (as in your link), you'll get a streaming response in exactly the same way.
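
For reference, a minimal sketch of what that looks like with the OpenAI Python client, assuming a Canopy server running locally on port 8000 and the default OpenAI backend (adjust base_url, api_key, and model for your setup):

import openai

# Assumes a local Canopy server; the api_key value here is just a placeholder.
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="canopy")

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # passed through to the configured LLM backend
    messages=[{"role": "user", "content": "What is Canopy?"}],
    stream=True,
)

# Chunks arrive incrementally, exactly as with the OpenAI API.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)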

Evanrsl commented 7 months ago

@scottmx81 Thanks for the explanation. What about the model parameter? Just to make sure, is my code correct?

client = openai.OpenAI(
    base_url="http://0.0.0.0:8000/v1/chat/completions",
)
chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "test message"}],
    model=???,  # <- what should this be?
    stream=True,
)

scottmx81 commented 7 months ago

@Evanrsl the value for model depends on what LLM backend you are using in Canopy. If you are using the default OpenAI backend, then you'd have to read the OpenAI API reference to know what the valid values for model are. If you are using the Cohere backend, you'd read the Cohere docs to know what the valid model values are. Canopy passes the model value through to the underlying LLM, and the API for that LLM will determine which models you are allowed to use.
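
For example, with the default OpenAI backend the call from above might look like this (a sketch; check the OpenAI model list for valid names, or the Cohere docs if you're on the Cohere backend):

chat_completion = client.chat.completions.create(
    # An OpenAI model name for the default backend; a Cohere model name if
    # the Canopy server is configured with the Cohere backend.
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "test message"}],
    stream=True,
)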