livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

[Question] Streaming chunks of text to TTS #940

Open BrianMwas opened 1 month ago

BrianMwas commented 1 month ago

I would like to know how I can handle chunks of text while still maintaining punctuation. I tried, but I am getting errors while doing this, and it has increased latency from ~1s to close to ~4s, which is quite long considering this is a conversation. Thanks. This is how I am working with it when adding context:

```python
import asyncio
import logging

from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent

logger = logging.getLogger(__name__)

# pdf_index (a LlamaIndex index) and groqLLM are initialized elsewhere at startup.


async def enrich_with_rag(agent: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    global pdf_index

    if pdf_index is None:
        logger.warning("RAG system not initialized. Skipping enrichment.")
        return

    user_msg = chat_ctx.messages[-1]
    try:
        # Use LlamaIndex to query the data
        query_engine = pdf_index.as_query_engine(
            llm=groqLLM,  # adjust model as needed
            streaming=True,
            similarity_top_k=3,
        )
        response = await asyncio.to_thread(query_engine.query, user_msg.content)

        if response:
            # Collect the entire streamed response
            relevant_text = ""
            # streaming=True returns a StreamingResponse; read its token generator.
            # Note: collecting the full response here blocks until the query
            # finishes, which is where the extra latency accumulates.
            for text in response.response_gen:
                relevant_text += text

            logger.info(f"Enriching with RAG: {relevant_text[:100]}...")  # Log first 100 chars

            # Insert RAG context before the user's message
            chat_ctx.messages.insert(-1, llm.ChatMessage.create(
                text=f"Context:\n{relevant_text}",
                role="assistant",
            ))

        # Create a streaming LLM response
        llm_stream = agent.llm.chat(chat_ctx)
        return llm_stream
    except Exception as e:
        logger.error(f"Error during RAG enrichment: {str(e)}")
```
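
For context, what I am ultimately after is something like the sketch below: buffer the streamed chunks and only hand complete, punctuation-terminated sentences to TTS. This is a minimal standalone sketch under my own assumptions, not a LiveKit API; `sentences` is just a hypothetical helper:

```python
import re
from typing import AsyncIterable, AsyncIterator

# Matches the whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


async def sentences(chunks: AsyncIterable[str]) -> AsyncIterator[str]:
    """Re-chunk a raw text stream into complete sentences."""
    buf = ""
    async for chunk in chunks:
        buf += chunk
        parts = _SENTENCE_END.split(buf)
        # Everything except the last part is a complete sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buf = parts[-1]  # keep the trailing partial sentence buffered
    if buf.strip():
        yield buf  # flush whatever remains at end of stream
```

If I understand the docs correctly, `VoicePipelineAgent.say` accepts an `AsyncIterable[str]` as well as an `LLMStream`, so a wrapped stream could be passed as e.g. `agent.say(sentences(text_stream))` — though I may be wrong about that.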
BrianMwas commented 1 month ago

I also tried with function calling:

```python
import asyncio
from typing import Annotated, AsyncGenerator

from livekit.agents import llm


class AssistantFnc(llm.FunctionContext):
    @llm.ai_callable()
    async def query_documents(
        self,
        question: Annotated[
            str, llm.TypeInfo(description="The question to ask about the documents")
        ],
    ) -> AsyncGenerator[str, None]:
        """Query the PDF documents for information related to the user's question."""
        logger.info(f"Querying documents for: {question}")
        engine = pdf_index.as_query_engine(
            llm=groqLLM,
            streaming=True,
            similarity_top_k=3,
        )

        # Run the blocking query off the event loop, then yield the streamed
        # chunks so the return value matches the AsyncGenerator annotation.
        response = await asyncio.to_thread(engine.query, question)
        for text in response.response_gen:
            yield text
```
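
For reference, this is roughly how the function context gets wired into the agent. A minimal sketch assuming livekit-agents 0.x; the plugin choices (Silero, Deepgram, OpenAI) are illustrative, not necessarily what I am running:

```python
from livekit import agents
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        fnc_ctx=AssistantFnc(),  # exposes query_documents to the LLM as a tool
    )

    participant = await ctx.wait_for_participant()
    agent.start(ctx.room, participant)
```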