lugia19 / elevenlabslib

Full python wrapper for the elevenlabs API.
MIT License

Queue audio streams #17

Closed ChristianEvc closed 1 year ago

ChristianEvc commented 1 year ago

Hi,

First off, this library is excellent! Beats the official one by miles, frankly! =)

To the topic, which isn't an issue as such, but more of a feature request, or a clarification of whether such functionality already exists.

I'm trying to write a chatbot using this library, but I want it to respond more quickly. As such, I'm streaming the response from GPT-4 sentence by sentence. When a new sentence comes in, I want to immediately call the ElevenLabs API using your library. Currently, each time a sentence is completed, I call the following with runInBackground set to False:

voice.generate_stream_audio_v2(sentence, playbackOptions=playbackOptions, generationOptions=generationOptions)

Whilst this does work very well, there is a roughly 2-second delay between sentences. So I was thinking: is there a way to queue the audio streams instead, and then play them all in sequence? That way, while the first sentence's audio is being played, the second sentence is processed and will seamlessly start playing immediately after the first.

Correct me if I'm wrong, but the current behaviour seems to wait for the first audio to finish playing before it sends the API call to Elevenlabs for the second sentence?

Setting runInBackground to True plays them all at roughly the same time, but if I could make the same asynchronous calls to the API while queuing the responses in one audio thread, I think I could achieve seamless playback.

Is this already possible, or is this something that could be built? I'd love to help, but am not nearly a good enough coder to know where to start on this.

Any info / help / suggestions you have to achieve this, please let me know!

Thanks!

lugia19 commented 1 year ago

It's something I've actually already implemented myself - I just haven't added it to the library since I find it to be more of a developer-side thing to implement.

You can find an example of how I did it (with extensive comments) here: https://github.com/lugia19/talkGPT-ToBeRenamed/blob/master/lowLatencyProofOfConcept.py

It essentially uses the onPlaybackStart and onPlaybackEnd parameters, combined with a queue of events and a separate thread, to queue up the generations in the way you're describing.

NOTE: That example still uses the old stream function - it would need to be swapped out with the new v2 one as the old one is deprecated.
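The pattern described above — a queue paired with events and a dedicated playback thread — can be sketched with just the standard library. This is only an illustration of the queuing idea, not the linked example itself: the list append and `time.sleep` below are stand-ins for the library's actual playback and for a background `generate_stream_audio_v2` call, and all names (`make_pipeline`, `speak`) are invented for this sketch.

```python
import queue
import threading
import time

def make_pipeline():
    """Build a (sentence, ready_event) queue consumed by one playback thread."""
    played = []           # stand-in for actual audio output
    q = queue.Queue()

    def playback_worker():
        # Single consumer thread: plays generations strictly in arrival order,
        # while later generations are being requested in the background.
        while True:
            item = q.get()
            if item is None:          # sentinel to shut the thread down
                break
            sentence, ready = item
            ready.wait()              # block until this sentence's audio is ready
            played.append(sentence)   # stand-in for blocking audio playback
            q.task_done()

    threading.Thread(target=playback_worker, daemon=True).start()

    def speak(sentence):
        # Called as each GPT sentence completes: kick off generation
        # immediately, and queue an event the playback thread will wait on.
        ready = threading.Event()
        q.put((sentence, ready))

        def generate():
            time.sleep(0.01)          # simulate API latency
            ready.set()               # signal that this audio is ready to play

        threading.Thread(target=generate, daemon=True).start()

    return speak, q, played

speak, q, played = make_pipeline()
for s in ["First sentence.", "Second sentence.", "Third sentence."]:
    speak(s)
q.join()   # wait until everything queued has been "played"
print(played)
```

Because each sentence's generation starts the moment it is enqueued, the delay between sentences collapses to (at most) the time one generation takes while the previous one is still playing.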

ChristianEvc commented 1 year ago

Thank you very much!! And for the lightning-fast response. I'll take a look.

lugia19 commented 10 months ago

A small update to this: I actually did end up implementing this in the library itself, in the form of the Synthesizer helper class.

https://elevenlabslib.readthedocs.io/en/latest/source/examples.html#use-the-synthesizer-utility-class-to-manage-playback