jekalmin / extended_openai_conversation

Home Assistant custom conversation agent component. It uses OpenAI to control your devices.

Streaming with ChatGPT (new feature) #58

Open RASPIAUDIO opened 8 months ago

RASPIAUDIO commented 8 months ago

I am using GPT-4 in HA with the Raspiaudio Luxe speaker, and long answers need 20-30 s before being played. Could it be possible to implement the streaming option of the OpenAI API? It is what's missing to compete with the other commercial voice assistants.

But then I guess it would also require a modification of the TTS plugin.

jekalmin commented 8 months ago

Thanks for the suggestion!

In order to achieve a streaming response, everything from end to end has to support streaming. Although OpenAI supports the stream option, as far as I know, HA's IntentResponse would need to support streaming as well, which it currently does not.
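For reference, this is roughly what the stream option looks like on the OpenAI side (a minimal sketch using the openai Python SDK v1-style interface, not this component's actual code); the chunks would still have to be buffered into a single reply before HA could speak them:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# With stream=True the API yields partial chunks as they are generated.
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a short bedtime story."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # tokens arrive incrementally here
```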

Please correct me if anything should be fixed.

RASPIAUDIO commented 8 months ago

Perhaps one way to do it without any major change could be to split the response into sentences (detecting a '.' followed by a capital letter); each sentence could then be sent one after the other in the IntentResponse. The prompt should also instruct the model to answer with short sentences.
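Something like the following could do the splitting (a rough, hypothetical helper to illustrate the idea, not code from the component):

```python
import re

def split_sentences(text: str) -> list[str]:
    # Split on '.', '!' or '?' followed by whitespace and a capital letter,
    # keeping the punctuation attached to its sentence.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text) if s.strip()]

print(split_sentences("Good night. The moon is out. Sleep well!"))
# ['Good night.', 'The moon is out.', 'Sleep well!']
```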

Could it be done on your side?

It could really change the user experience.

jekalmin commented 8 months ago

Although it's possible to split the response into multiple sentences, only one IntentResponse is used per conversation, so I can't send multiple IntentResponses on to the next step of the assist pipeline.
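To illustrate the constraint, here is a simplified sketch of the usual custom conversation-agent pattern (`_ask_openai` is a hypothetical helper, and the exact Home Assistant interfaces may differ between versions):

```python
from homeassistant.components import conversation
from homeassistant.helpers import intent


class ExampleAgent(conversation.AbstractConversationAgent):
    @property
    def supported_languages(self) -> list[str]:
        return ["en"]  # kept minimal for the sketch

    async def async_process(
        self, user_input: conversation.ConversationInput
    ) -> conversation.ConversationResult:
        reply = await self._ask_openai(user_input.text)  # hypothetical helper

        # The agent returns exactly one IntentResponse per conversation turn,
        # so there is no hook for emitting several partial replies.
        response = intent.IntentResponse(language=user_input.language)
        response.async_set_speech(reply)
        return conversation.ConversationResult(
            response=response, conversation_id=user_input.conversation_id
        )
```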

bblaha commented 8 months ago

I have been looking for a way to speed up the reply times as well and stumbled over this issue. I do not have a solution, but would like to contribute some brainstorming:

1. I do not think the average response time for actually performing an action is problematic, so without knowing the ins and outs of Home Assistant, the actual intent recognition could stay where it is.
2. What feels too slow is the time until the reply starts. Just imagining (not knowing) how Home Assistant is built, I assume Extended OpenAI Conversation could not simply produce voice/text output repeatedly before listening again. I also assume that triggering TTS from outside of Wyoming would not yield any result.

So would a request for Home Assistant to allow multiple TTS outputs before listening again be helpful?
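For what it's worth, firing TTS repeatedly outside the assist pipeline could look roughly like this (a hedged sketch only: the `tts.speak` service is the modern TTS entity service, while `tts.home_assistant_cloud` and `media_player.luxe_speaker` are assumed example entities, and this does not hook back into the pipeline's listen step at all):

```python
# Sketch: speak a list of sentences one after another by calling the
# tts.speak service directly from inside a custom component.
async def speak_sentences(hass, sentences: list[str]) -> None:
    for sentence in sentences:
        await hass.services.async_call(
            "tts",
            "speak",
            {
                "media_player_entity_id": "media_player.luxe_speaker",  # assumed speaker
                "message": sentence,
            },
            blocking=True,  # wait for each service call before queuing the next
            target={"entity_id": "tts.home_assistant_cloud"},  # assumed TTS entity
        )
```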

RASPIAUDIO commented 7 months ago

If you're using GPT-4 and ask the assistant to create a lengthy bedtime story for your children, you might experience a wait time of nearly a minute, because the system waits for the entire story to be completed before generating the text-to-speech (TTS). While this is a niche use case, implementing a streaming feature could save a few seconds per query. Multiplied by the number of daily questions and the user base, this improvement could add up to many human lifetimes' worth of saved time. :)

Maybe the first step is to ask for a streaming feature for the voice assistant:

https://community.home-assistant.io/t/streaming-feature-for-voice-assistant/678923

emanuelbaltaretu commented 6 months ago

I am encountering the same issue with response time in my use case, as TTS waits until all the text has been generated. A stream would make the whole experience seamless. @RASPIAUDIO, @bblaha, since the time of your replies, any luck?