pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI
BSD 2-Clause "Simplified" License

say one thing with ElevenLabsTTSService doesn't work anymore #570

Open durandom opened 2 weeks ago

durandom commented 2 weeks ago

I replaced the CartesiaTTSService with ElevenLabsTTSService in the https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/01-say-one-thing.py example, but that doesn't work anymore with 0.0.43.

Here's some logging output.

❯ pipenv run python ./pipecat/01-say-one-thing.py
Loading .env environment variables...
2024-10-11 16:15:56.950 | DEBUG    | pipecat.processors.frame_processor:link:134 - Linking PipelineSource#0 -> ElevenLabsTTSService#0
2024-10-11 16:15:56.950 | DEBUG    | pipecat.processors.frame_processor:link:134 - Linking ElevenLabsTTSService#0 -> DailyOutputTransport#0
2024-10-11 16:15:56.950 | DEBUG    | pipecat.processors.frame_processor:link:134 - Linking DailyOutputTransport#0 -> PipelineSink#0
2024-10-11 16:15:56.950 | DEBUG    | pipecat.processors.frame_processor:link:134 - Linking Source#0 -> Pipeline#0
2024-10-11 16:15:56.950 | DEBUG    | pipecat.processors.frame_processor:link:134 - Linking Pipeline#0 -> Sink#0
2024-10-11 16:15:56.950 | DEBUG    | pipecat.pipeline.runner:run:27 - Runner PipelineRunner#0 started running PipelineTask#0
2024-10-11 16:15:56.950 | DEBUG    | pipecat.services.elevenlabs:_connect:301 - Language code [en] not applied. Language codes can only be used with the 'eleven_turbo_v2_5' model.
2024-10-11 16:15:57.143 | INFO     | pipecat.transports.services.daily:join:299 - Joining https://b4mad.daily.co/.....
2024-10-11 16:15:58.041 | INFO     | pipecat.transports.services.daily:on_participant_joined:523 - Participant joined f86d7746-c249-447e-9bb3-45192846b589
2024-10-11 16:15:58.786 | INFO     | pipecat.transports.services.daily:join:318 - Joined https://b4mad.daily.co/...
2024-10-11 16:15:58.786 | DEBUG    | pipecat.services.elevenlabs:run_tts:381 - Generating TTS: [Hello there, Marcel!]
2024-10-11 16:15:58.786 | DEBUG    | pipecat.transports.base_output:_bot_started_speaking:333 - Bot started speaking
2024-10-11 16:16:00.790 | DEBUG    | pipecat.transports.base_output:_bot_stopped_speaking:338 - Bot stopped speaking
^C2024-10-11 16:16:07.783 | WARNING  | pipecat.pipeline.runner:_sig_handler:51 - Interruption detected. Canceling runner PipelineRunner#0
2024-10-11 16:16:07.783 | DEBUG    | pipecat.pipeline.runner:cancel:38 - Canceling runner PipelineRunner#0
2024-10-11 16:16:07.783 | DEBUG    | pipecat.pipeline.task:cancel:118 - Canceling pipeline task PipelineTask#0
2024-10-11 16:16:07.905 | INFO     | pipecat.transports.services.daily:leave:406 - Leaving https://b4mad.daily.co/...
2024-10-11 16:16:07.933 | INFO     | pipecat.transports.services.daily:leave:415 - Left https://b4mad.daily.co/...
2024-10-11 16:16:07.934 | DEBUG    | pipecat.pipeline.runner:run:31 - Runner PipelineRunner#0 finished running PipelineTask#0

And here's the modified example:

#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import aiohttp
import os
import sys

from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

from runner import configure

from loguru import logger

from dotenv import load_dotenv
load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")

async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, _) = await configure(session)

        transport = DailyTransport(
            room_url, None, "Say One Thing", DailyParams(audio_out_enabled=True))

        # tts = CartesiaTTSService(
        #     api_key=os.getenv("CARTESIA_API_KEY"),
        #     voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",  # British Lady
        # )
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
            model="eleven_multilingual_v2",
        )

        runner = PipelineRunner()

        task = PipelineTask(Pipeline([tts, transport.output()]))

        # Register an event handler so we can play the audio when the
        # participant joins.
        @transport.event_handler("on_participant_joined")
        async def on_new_participant_joined(transport, participant):
            participant_name = participant["info"]["userName"] or ''
            await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))

        await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())

durandom commented 2 weeks ago

If I add a

await task.queue_frame(LLMFullResponseEndFrame())

after the TextFrame, then it works 🤷
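
For reference, applied to the example above the workaround looks roughly like this (a sketch; it assumes LLMFullResponseEndFrame is imported from pipecat.frames.frames alongside TextFrame):

from pipecat.frames.frames import LLMFullResponseEndFrame, TextFrame

@transport.event_handler("on_participant_joined")
async def on_new_participant_joined(transport, participant):
    participant_name = participant["info"]["userName"] or ''
    # Queue the greeting, then mark the "response" as complete so the
    # streaming TTS service flushes and generates audio for it.
    await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))
    await task.queue_frame(LLMFullResponseEndFrame())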

BrianMwas commented 1 week ago

I am also getting this issue. This is my sample code:

    try:
        async with aiohttp.ClientSession() as session:
            (room_url, token) = await configure(session)
        logger.debug(f"Getting the room and token from the session {room_url} and {token}")
        transport = DailyTransport(
            room_url, token, "ChatBot", DailyParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(),
                transcription_enabled=True,
                vad_audio_passthrough=True
            )
        )

        tts = DeepgramSTTService(
            api_key=config("DEEPGRAM_API_KEY"),
            live_options=LiveOptions(
                encoding="linear16",
                model="nova-2-conversationalai",
                sample_rate=16000,
                channels=1,
                interim_results=True,
                smart_format=True,
                punctuate=True,
                profanity_filter=True,
                vad_events=True,
            )
        )

        llm = OpenAILLMService(api_key=config("OPENAI_API_KEY"), model="gpt-4o")
        messages = [
            {
                "role": "system",
                "content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself."
            }
        ]

        user_response = LLMUserResponseAggregator()
        assistant_response = LLMAssistantResponseAggregator()

        pipeline = Pipeline(
            [
                transport.input(),
                user_response,
                llm,
                tts,
                transport.output(),
                assistant_response
            ]
        )

        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True, enable_metrics=True))

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(trans, participant):
            logging.info(f"a participant joined {participant}")
            logging.info(f"we are getting the response {trans}")
            transport.capture_participant_transcription(participant["id"])
            await task.queue_frames([LLMMessagesFrame(messages),])

        @transport.event_handler("on_participant_left")
        async def on_participant_left(trans, participant, reason):
            print(f"Participant left: {participant}")
            logging.info(f"results on leaving the info {trans}")
            await task.queue_frame(EndFrame())

        runner = PipelineRunner()
        await runner.run(task)

    except Exception as e:
        import traceback
        logger.error(f"An error occurred: {str(e)}")
        logger.error(f"we found an issue {traceback.format_exc()}")

if __name__ == '__main__':
    logging.info("we are running")
    # execute only if run as the entry point into the program
    asyncio.run(main()) 

The idea was just to get it to respond, but it never gives a response. It only connects via Daily: I can see the chatbot and the chatbot's details, but there is never a response. Deepgram, however, shows that my credits have been spent; yesterday it went from $0 to $10 in one conversation. So I'm not sure what is consuming so much.

BrianMwas commented 1 week ago

If I add a

await task.queue_frame(LLMFullResponseEndFrame())

after the TextFrame, then it works 🤷

@durandom did you get it to work? I really need to complete this part of the app.

durandom commented 1 week ago

Yes, queueing the LLMFullResponseEndFrame worked for me. See https://github.com/b4mad/mds-moderator/blob/6f4b37453a6e978f4578feec2c28c714430937a1/participant.py#L85

danthegoodman1 commented 1 week ago

Same issue, had to replace EndFrame with LLMFullResponseEndFrame

danthegoodman1 commented 1 week ago

Seems you can also just remove the end frame

aconchillo commented 1 week ago

I replaced the CartesiaTTSService with ElevenLabsTTSService in the https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/01-say-one-thing.py example, but that doesn't work anymore with 0.0.43.

The reason is that CartesiaHttpTTSService blocks on the HTTP request to get the audio, and no other frames will be pushed before the generated audio frames. That is, you will get a bunch of audio frames and then the EndFrame, which will make things close properly and end the application.

If we replace CartesiaHttpTTSService with something that works asynchronously, like ElevenLabsTTSService, adding an EndFrame will cause the app to stop right away. That's because we have no idea when ElevenLabs will give us audio or when the audio will end.

So for this specific use case, you really need to use a TTS service that uses HTTP.

In normal applications, you would probably send an EndFrame() when the user disconnects, for example.
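
For the say-one-thing case, that would mean queueing the EndFrame from a participant-left handler instead of right after the TextFrame. A minimal sketch (the on_participant_left handler signature follows the one used in the code earlier in this thread):

from pipecat.frames.frames import EndFrame

@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    # End the pipeline only once the user has disconnected, so the
    # asynchronous TTS service has time to deliver its audio first.
    await task.queue_frame(EndFrame())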

aconchillo commented 1 week ago

I am also getting this issue. This is my sample code:

.......
        tts = DeepgramSTTService(
            api_key=config("DEEPGRAM_API_KEY"),
            live_options=LiveOptions(
                encoding="linear16",
                model="nova-2-conversationalai",
                sample_rate=16000,
                channels=1,
                interim_results=True,
                smart_format=True,
                punctuate=True,
                profanity_filter=True,
                vad_events=True,
            )
        )
 .......

The idea was just to get it to respond, but it never gives a response. It only connects via Daily: I can see the chatbot and the chatbot's details, but there is never a response. Deepgram, however, shows that my credits have been spent; yesterday it went from $0 to $10 in one conversation. So I'm not sure what is consuming so much.

The issue in this code is that you are using DeepgramSTTService instead of DeepgramTTSService. Your variable is properly named tts, but you are using the wrong service.
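
In other words, something along these lines (a sketch only; the exact DeepgramTTSService constructor arguments and the voice name depend on the pipecat version and your Deepgram setup):

from pipecat.services.deepgram import DeepgramTTSService

# Use the Deepgram TTS service for the tts step of the pipeline;
# DeepgramSTTService is speech-to-text, so no audio is ever generated
# for the LLM's text output.
tts = DeepgramTTSService(
    api_key=config("DEEPGRAM_API_KEY"),
    voice="aura-helios-en",  # illustrative voice id, check Deepgram's docs
)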

aconchillo commented 1 week ago

So, to recap, a couple of issues were discussed here:

  1. EndFrame doesn't really work well in the 01-say-one-thing.py example because of what I explained here: https://github.com/pipecat-ai/pipecat/issues/570#issuecomment-2421012277
  2. Typo: DeepgramSTTService was being used as the TTS service instead of DeepgramTTSService.
  3. I don't think LLMFullResponseEndFrame should really be needed. Those are kind of internal frames and are generated by the LLM service. But maybe there's an issue... :thinking:

aconchillo commented 1 week ago

This PR changes the first examples a bit so they don't send an EndFrame right away, but only when the user leaves: https://github.com/pipecat-ai/pipecat/pull/613

danthegoodman1 commented 1 week ago

I replaced the CartesiaTTSService with ElevenLabsTTSService in the https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/01-say-one-thing.py example, but that doesn't work anymore with 0.0.43.

The reason is that CartesiaHttpTTSService blocks on the HTTP request to get the audio, and no other frames will be pushed before the generated audio frames. That is, you will get a bunch of audio frames and then the EndFrame, which will make things close properly and end the application.

If we replace CartesiaHttpTTSService with something that works asynchronously, like ElevenLabsTTSService, adding an EndFrame will cause the app to stop right away. That's because we have no idea when ElevenLabs will give us audio or when the audio will end.

So for this specific use case, you really need to use a TTS service that uses HTTP.

In normal applications, you would probably send an EndFrame() when the user disconnects, for example.

This should be made really clear in the docs

aconchillo commented 1 week ago

I replaced the CartesiaTTSService with ElevenLabsTTSService in the https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/01-say-one-thing.py example, but that doesn't work anymore with 0.0.43.

The reason is that CartesiaHttpTTSService blocks on the HTTP request to get the audio, and no other frames will be pushed before the generated audio frames. That is, you will get a bunch of audio frames and then the EndFrame, which will make things close properly and end the application. If we replace CartesiaHttpTTSService with something that works asynchronously, like ElevenLabsTTSService, adding an EndFrame will cause the app to stop right away. That's because we have no idea when ElevenLabs will give us audio or when the audio will end. So for this specific use case, you really need to use a TTS service that uses HTTP. In normal applications, you would probably send an EndFrame() when the user disconnects, for example.

This should be made really clear in the docs

Totally agree. 😞 We'll get there! 💪