langchain-ai / langchain-google

MIT License

Calling a model with VertexAIModelGarden echoes the prompt and returns only 16 tokens #334

Open pprados opened 3 days ago

pprados commented 3 days ago

I deployed a model, mistralai_mistral-7b-instruct-v0_2, on a Vertex AI endpoint, and I need to invoke it with this very simple code.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_vertexai import VertexAIModelGarden

def main():
    llm = VertexAIModelGarden(
        project="XXXX",
        location="europe-west2",
        endpoint_id="0000000000",
    )
    print("-------------")
    prompt = ChatPromptTemplate.from_template("tell me a short joke about {topic}")
    chain = prompt | llm | StrOutputParser()

    final_result = chain.invoke({"topic": "ice cream"})
    print(final_result)

main()

The result is:

-------------
Prompt:
Human: tell me a short joke about ice cream
Output:
Anytime! Here's a classic, light-hearted ice

Process finished with exit code 0

The output echoes the prompt, and the completion is cut off after only 16 tokens.

Where is the error?
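Two observations, hedged as guesses: the 16-token cap is presumably the serving container's default max_tokens, so forwarding a larger value to the endpoint (for example via VertexAIModelGarden's allowed_model_args field, if the installed version supports it) may fix the truncation. The echoed prompt, assuming the endpoint always wraps its reply in the "Prompt: ... Output: ..." shape shown above, can be stripped after the chain runs; strip_prompt_echo below is a hypothetical helper, not part of langchain:

```python
# Hypothetical post-processing helper: strips the echoed prompt from the
# "Prompt: ... Output: ..." text the Model Garden endpoint returns.
def strip_prompt_echo(text: str) -> str:
    marker = "Output:"
    if marker in text:
        # Keep only what follows the first "Output:" marker.
        return text.split(marker, 1)[1].strip()
    return text.strip()

raw = (
    "Prompt:\n"
    "Human: tell me a short joke about ice cream\n"
    "Output:\n"
    "Anytime! Here's a classic, light-hearted ice"
)
print(strip_prompt_echo(raw))
```

In the chain above this could run as an extra step after the parser, e.g. `chain = prompt | llm | StrOutputParser() | strip_prompt_echo` (LCEL coerces a plain function into a RunnableLambda).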