langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

agent.streamEvents not streaming on the output with Ollama : llama3.1 #6524

Open · MP242 opened 2 months ago

MP242 commented 2 months ago

### Checked other resources

### Example Code

The following code reproduces the issue:

```typescript
import { NextResponse } from "next/server";
import { AIMessage, AIMessageChunk, BaseMessage } from "@langchain/core/messages";
import { ChatOllama } from "@langchain/ollama";
import { END, StateGraph } from "@langchain/langgraph";
import { ToolNode } from "@langchain/langgraph/prebuilt";
// formatPreviousMessages, dataByAgenciesTool, get_advisor_CAByMonthTool,
// IState and graphState are defined elsewhere in the app.

export async function POST(req: Request) {
  try {
    const myReq = await req.json();
    const { messages } = myReq;

    const formattedPreviousMessages = await formatPreviousMessages(messages);
    const currentMessageContent = messages[messages.length - 1].content;
    console.log("Sending the message: ", currentMessageContent);

    const tools = [dataByAgenciesTool, get_advisor_CAByMonthTool];
    const toolNode = new ToolNode<{ messages: BaseMessage[] }>(tools);

    const model = new ChatOllama({
      baseUrl: process.env.OLLAMA_URL,
      model: process.env.MODEL_NAME,
      temperature: 0,
      streaming: true,
    });

    const boundModel = model.bindTools(tools);

    const routeMessage = (state: IState) => {
      const { messages } = state;
      const lastMessage = messages[messages.length - 1] as AIMessage;
      // If no tools are called, we can finish (respond to the user)
      if (!lastMessage?.tool_calls?.length) {
        return END;
      }
      // Otherwise, continue and call the tools
      return "tools";
    };

    const callModel = async (state: IState) => {
      // For versions of @langchain/core < 0.2.3, you must call `.stream()`
      // and aggregate the message from chunks instead of calling `.invoke()`.
      const { messages } = state;
      const responseMessage = await boundModel.invoke(messages);
      return { messages: [responseMessage] };
    };

    const workflow = new StateGraph<IState>({
      channels: graphState,
    })
      .addNode("agent", callModel)
      .addNode("tools", toolNode)
      .addEdge("__start__", "agent")
      .addConditionalEdges("agent", routeMessage)
      .addEdge("tools", "agent");

    const agent = workflow.compile();

    const eventStream = await agent.streamEvents(
      { messages: [["user", currentMessageContent]] },
      {
        version: "v2",
      }
    );

    for await (const { event, data } of eventStream) {
      if (event === "on_chat_model_stream") {
        const msg = data.chunk as AIMessageChunk;
        console.log(data.chunk);
        if (!msg.tool_call_chunks?.length) {
          console.log(msg.content, "|");
        }
      }
    }

    return new NextResponse("ok", { status: 200 });
    // return LangChainAdapter.toDataStreamResponse(stringStream);
  } catch (error) {
    console.error("Error:", error);
    return new NextResponse(
      "An error occurred while processing your request.",
      { status: 500 }
    );
  }
}
```
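
As an aside, the commented-out `LangChainAdapter.toDataStreamResponse(stringStream)` line suggests the streamed tokens are meant to be forwarded to the client rather than logged. Below is a minimal sketch of one way to build such a string stream from the event stream, assuming the same `agent` and the `AIMessageChunk` import above, and that `LangChainAdapter` (from the `ai` package) accepts a `ReadableStream<string>` in the installed version; the `toTextStream` helper is illustrative, not a LangChain or `ai` API.

```typescript
// Illustrative helper (not part of LangChain): turn the streamEvents output
// into a plain stream of strings that can be returned to the browser.
function toTextStream(
  eventStream: AsyncIterable<{ event: string; data: any }>
): ReadableStream<string> {
  return new ReadableStream<string>({
    async start(controller) {
      for await (const { event, data } of eventStream) {
        if (event === "on_chat_model_stream") {
          const chunk = data.chunk as AIMessageChunk;
          // Forward only final-answer tokens, skipping tool-call chunks.
          if (!chunk.tool_call_chunks?.length && typeof chunk.content === "string") {
            controller.enqueue(chunk.content);
          }
        }
      }
      controller.close();
    },
  });
}

// Inside the POST handler, instead of the for-await logging loop:
// return LangChainAdapter.toDataStreamResponse(toTextStream(eventStream));
```

This only changes how the events are consumed on the server; it does not by itself explain why `on_chat_model_stream` fires once with the complete answer here.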



### Error Message and Stack Trace (if applicable)

LangSmith result:

```json
{
  "agent": {
    "messages": [
      {
        "lc": 1,
        "type": "constructor",
        "id": [
          "langchain_core",
          "messages",
          "AIMessageChunk"
        ],
        "kwargs": {
          "content": "Le meilleur vendeur sur l'île de la Réunion est XXXXXXXXX avec un chiffre d'affaires de XXXX.",
          "tool_call_chunks": [],
          "response_metadata": {
            "model": "llama3.1:8b-instruct-q8_0",
            "created_at": "2024-08-14T04:37:31.391674Z",
            "done_reason": "stop",
            "done": true,
            "total_duration": 2870693667,
            "load_duration": 34859625,
            "prompt_eval_count": 861,
            "prompt_eval_duration": 1701554000,
            "eval_count": 33,
            "eval_duration": 1121423000
          },
          "usage_metadata": {
            "input_tokens": 861,
            "output_tokens": 33,
            "total_tokens": 894
          },
          "tool_calls": [],
          "invalid_tool_calls": [],
          "additional_kwargs": {}
        }
      }
    ]
  },
  "tools": {
    "messages": [
      {
        "lc": 1,
        "type": "constructor",
        "id": [
          "langchain_core",
          "messages",
          "ToolMessage"
        ],
        "kwargs": {
          "content": "[{\"name\":\"XXXXX\",\"chiffre d'affaires\":100},{\"name\":\"XXXXX\",\"chiffre d'affaires\":7500},...]",
          "tool_call_id": "15dce1f7-ec39-4898-a0d8-8b5dee89655a",
          "name": "get_advisor_CAByMonth",
          "additional_kwargs": {},
          "response_metadata": {}
        }
      }
    ]
  }
}
```

### Description

I'm trying to use a LangGraph agent in my Next.js app with Ollama and the llama3.1 model.
`agent.streamEvents` does not stream the final output of the LLM; the final answer arrives as a single complete chunk instead of token by token.
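
One way to narrow this down is to check, outside the graph, whether the model still streams once tools are bound. A rough diagnostic sketch, assuming the same `model` and `boundModel` instances as in the example code above; `checkStreaming` and the prompt string are placeholders:

```typescript
// Diagnostic sketch: compare incremental chunks from the raw model
// with chunks from the tool-bound model (same instances as in the POST handler).
async function checkStreaming() {
  const question = "Who is the best seller this month?"; // placeholder prompt

  console.log("--- model.stream ---");
  for await (const chunk of await model.stream(question)) {
    console.log(chunk.content, "|");
  }

  console.log("--- boundModel.stream ---");
  for await (const chunk of await boundModel.stream(question)) {
    console.log(chunk.content, "|");
  }
}
```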

I have already:
 - updated Ollama to the latest version
 - pulled the latest llama3.1 (with the tools tag)
 - updated the LangChain npm packages

### System Info

Next.js app:

```json
  "dependencies": {
    "@langchain/community": "^0.2.27",
    "@langchain/langgraph": "^0.0.34",
    "@langchain/ollama": "^0.0.4",
    "ai": "^3.3.4",
    "axios": "^1.7.3",
    "framer-motion": "^11.3.24",
    "langchain": "^0.2.13",
    "millify": "^6.1.0",
    "moment": "^2.30.1",
    "next": "14.2.5",
    "react": "^18",
    "react-dom": "^18",
    "react-icons": "^5.2.1",
    "tailwind-merge": "^2.5.1"
  },
  "devDependencies": {
    "@tauri-apps/cli": "^1.6.0",
    "@types/node": "^20",
    "@types/react": "^18",
    "@types/react-dom": "^18",
    "eslint": "^8",
    "eslint-config-next": "14.2.5",
    "postcss": "^8",
    "tailwindcss": "^3.4.1",
    "typescript": "^5"
  }
```

dosubot[bot] commented 2 months ago

Hey there, @MP242! I'm here to assist you with this bug 🐞.

To get `agent.streamEvents` to return a streaming response for the final output of the LLM in your Next.js app (LangGraph agent with Ollama and the llama3.1 model), you can use the `stream` method provided by the `ChatOllama` class. Here is an example of how you can implement this:

```typescript
import { ChatOllama } from '@langchain/ollama';

const llm = new ChatOllama({
  model: "llama3.1",
  streaming: true,
  // other params...
});

const input = `Your input prompt here`;

async function streamLLMResponse() {
  for await (const chunk of await llm.stream(input)) {
    console.log(chunk);
  }
}

streamLLMResponse();
```

This code sets up a ChatOllama instance with the llama3.1 model and enables streaming. The streamLLMResponse function then streams the response chunks from the LLM and logs each one to the console. Under the hood, ChatOllama streams responses through its _streamResponseChunks method, which processes the input messages and yields each chunk as it becomes available [1][2][3].
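
If the fully aggregated message is needed in addition to the incremental tokens, the chunks can be merged as they arrive. A short sketch reusing the `llm` instance from the example above; `streamAndAggregate` is an illustrative helper that merges chunks with `AIMessageChunk`'s `concat` method from `@langchain/core/messages`:

```typescript
import { AIMessageChunk } from "@langchain/core/messages";

async function streamAndAggregate(input: string) {
  let full: AIMessageChunk | undefined;
  for await (const chunk of await llm.stream(input)) {
    // Log each incremental token as it arrives...
    console.log(chunk.content);
    // ...and merge it into a running aggregate.
    full = full === undefined ? chunk : full.concat(chunk);
  }
  // `full` now holds the complete response content.
  console.log(full?.content);
  return full;
}
```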

Make sure that your ChatOllama instance is correctly configured and that the streaming parameter is set to true. This should enable the streaming functionality for the final output of the LLM.

To continue talking to Dosu, mention @dosu.