durandom opened this issue 2 weeks ago
If I add a `await task.queue_frame(LLMFullResponseEndFrame())` after the `TextFrame`, then it works 🤷
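A minimal sketch of that workaround, assuming the frame names from the pipecat foundational examples (`task` is the `PipelineTask` from the example; the spoken text is made up):

```python
# Queue an LLMFullResponseEndFrame after the TextFrame so downstream
# processors treat the text as a completed LLM response, then end the app.
await task.queue_frames([
    TextFrame("Hello there! This is a quick test."),
    LLMFullResponseEndFrame(),
    EndFrame(),
])
```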
I am also getting this issue. Here is my sample code:
```python
# Imports reconstructed from the pipecat foundational examples; adjust
# the module paths to your pipecat version.
import asyncio
import logging

import aiohttp
from decouple import config
from deepgram import LiveOptions
from loguru import logger

from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer

from runner import configure


async def main():
    try:
        async with aiohttp.ClientSession() as session:
            (room_url, token) = await configure(session)
            logger.debug(f"Getting the room and token from the session {room_url} and {token}")

            transport = DailyTransport(
                room_url,
                token,
                "ChatBot",
                DailyParams(
                    audio_in_enabled=True,
                    audio_out_enabled=True,
                    vad_enabled=True,
                    vad_analyzer=SileroVADAnalyzer(),
                    transcription_enabled=True,
                    vad_audio_passthrough=True,
                ),
            )

            tts = DeepgramSTTService(
                api_key=config("DEEPGRAM_API_KEY"),
                live_options=LiveOptions(
                    encoding="linear16",
                    model="nova-2-conversationalai",
                    sample_rate=16000,
                    channels=1,
                    interim_results=True,
                    smart_format=True,
                    punctuate=True,
                    profanity_filter=True,
                    vad_events=True,
                ),
            )

            llm = OpenAILLMService(api_key=config("OPENAI_API_KEY"), model="gpt-4o")

            messages = [
                {
                    "role": "system",
                    "content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself.",
                }
            ]

            user_response = LLMUserResponseAggregator()
            assistant_response = LLMAssistantResponseAggregator()

            pipeline = Pipeline(
                [
                    transport.input(),
                    user_response,
                    llm,
                    tts,
                    transport.output(),
                    assistant_response,
                ]
            )

            task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))

            @transport.event_handler("on_first_participant_joined")
            async def on_first_participant_joined(trans, participant):
                logging.info(f"a participant joined {participant}")
                logging.info(f"we are getting the response {trans}")
                transport.capture_participant_transcription(participant["id"])
                await task.queue_frames([LLMMessagesFrame(messages)])

            @transport.event_handler("on_participant_left")
            async def on_participant_left(trans, participant, reason):
                print(f"Participant left: {participant}")
                logging.info(f"results on leaving the info {trans}")
                await task.queue_frame(EndFrame())

            runner = PipelineRunner()
            await runner.run(task)
    except Exception as e:
        import traceback

        logger.error(f"An error occurred: {str(e)}")
        logger.error(f"we found an issue {traceback.format_exc()}")


if __name__ == "__main__":
    logging.info("we are running")
    # execute only if run as the entry point into the program
    asyncio.run(main())
```
The idea was just to get it to respond back, but it never gives a response. It only connects via Daily: I can see the chatbot and its details, but I never get a response. Deepgram, however, shows that my credits have been spent; yesterday it went from $0 to $10 in one conversation, so I'm not sure what is consuming so much.
> If I add a `await task.queue_frame(LLMFullResponseEndFrame())` after the `TextFrame`, then it works 🤷
@durandom did you get it to work? I really need to complete this part of the app.
Yes, queueing the `LLMFullResponseEndFrame` worked for me. See https://github.com/b4mad/mds-moderator/blob/6f4b37453a6e978f4578feec2c28c714430937a1/participant.py#L85
Same issue, had to replace `EndFrame` with `LLMFullResponseEndFrame`.
Seems you can also just remove the end frame
I replaced the `CartesiaTTSService` with `ElevenLabsTTSService` in the https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/01-say-one-thing.py example, but that doesn't work anymore with 0.0.43.
The reason is that `CartesiaHttpTTSService` blocks on the HTTP request to get the audio, so no other frames will be pushed before the generated audio frames. That is, you will get a bunch of audio frames and then the `EndFrame`, which will make things close properly and end the application.

If we replace `CartesiaHttpTTSService` with something that works asynchronously, like `ElevenLabsTTSService`, adding an `EndFrame` will cause the app to stop right away. That's because we have no idea when ElevenLabs will give us audio or when the audio will end.

So for this specific use case, you really need to use a TTS service that uses HTTP. In normal applications you would probably send an `EndFrame()` when the user disconnects, for example.
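A minimal sketch of that pattern, reusing the Daily transport event handler from the sample code earlier in this thread (`transport` and `task` come from that example):

```python
# Instead of queueing EndFrame right after the TextFrame, end the
# pipeline only when the user actually disconnects.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    await task.queue_frame(EndFrame())
```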
> I am also getting this issue. Here is my sample code:
>
> ```python
> .......
> tts = DeepgramSTTService(
>     api_key=config("DEEPGRAM_API_KEY"),
>     live_options=LiveOptions(
>         encoding="linear16",
>         model="nova-2-conversationalai",
>         sample_rate=16000,
>         channels=1,
>         interim_results=True,
>         smart_format=True,
>         punctuate=True,
>         profanity_filter=True,
>         vad_events=True,
>     )
> )
> .......
> ```
The issue in this code is that you are using `DeepgramSTTService` instead of `DeepgramTTSService`. Your variable is properly named `tts`, but you are using the wrong service.
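A sketch of the fix (the voice name and constructor arguments are assumptions, not taken from this thread; check the `DeepgramTTSService` signature in your pipecat version):

```python
from pipecat.services.deepgram import DeepgramTTSService

# Swap the wrongly-used STT service for Deepgram's TTS service.
# The voice name here is an assumption.
tts = DeepgramTTSService(
    api_key=config("DEEPGRAM_API_KEY"),
    voice="aura-asteria-en",
)
```

Speech-to-text is already handled in that pipeline by Daily's transcription (`transcription_enabled=True` plus `capture_participant_transcription`), so a separate STT processor shouldn't be needed.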
So, to recap, a couple of issues discussed in here:

- `EndFrame` doesn't really work well in the `01-say-one-thing.py` example because of what I explained here: https://github.com/pipecat-ai/pipecat/issues/570#issuecomment-2421012277
- `DeepgramSTTService` being used as the TTS service instead of `DeepgramTTSService`.
- `LLMFullResponseEndFrame` shouldn't really be needed. Those are kind of internal frames and are generated by the LLM service. But maybe there's an issue... 🤔

This PR changes the first examples a bit so they don't send `EndFrame` right away but when the user leaves: https://github.com/pipecat-ai/pipecat/pull/613
> So for this specific use case, you really need to use a TTS service that uses HTTP. In normal applications you would probably send an `EndFrame()` when the user disconnects, for example.

This should be made really clear in the docs.
> This should be made really clear in the docs.
Totally agree. 😞 We'll get there! 💪
> I replaced the `CartesiaTTSService` with `ElevenLabsTTSService` in the https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/01-say-one-thing.py example, but that doesn't work anymore with 0.0.43.

Here's some logging output.

And here's the modified example: