langchain-ai / langchain


Ollama: parameters and instruction templates #14279

Closed lfoppiano closed 10 months ago

lfoppiano commented 10 months ago

Issue with current documentation:

It's not clear from the documentation whether, when calling Ollama, LangChain takes care of formatting the prompt according to the model's template, or whether I have to supply the template myself.

For example, in https://ollama.ai/library/mistral:instruct we have:

Parameters:
stop "[INST]"
stop "[/INST]"
stop "<<SYS>>"
stop "<</SYS>>"

Template:
[INST] {{ .System }} {{ .Prompt }} [/INST]

Do I have to take care of formatting my instructions using these parameters and this template myself, or will LangChain take care of it?
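
If LangChain does not handle it, I assume I would have to build the prompt string myself, along these lines (just a sketch of what I mean, using the template above):

def format_mistral_instruct(system: str, prompt: str) -> str:
    # Manually apply the template from the model page:
    # [INST] {{ .System }} {{ .Prompt }} [/INST]
    return f"[INST] {system} {prompt} [/INST]"

print(format_mistral_instruct("You are a helpful assistant.", "Tell me a joke."))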

Idea or request for content:

If this is not implemented, it would definitely be very useful to have.

dosubot[bot] commented 10 months ago

🤖

Hello @lfoppiano,

Great to see you again! I hope you've been doing well.

Based on the current implementation of the LangChain framework, it does indeed handle the formatting of instructions using the parameters and template provided by Ollama. This is done through the PromptTemplate.from_template and ChatPromptTemplate.from_messages methods, which are used to create prompt templates from string templates. These templates are then used to format the instructions.
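
For example, these two methods are used like this (a minimal sketch, not code taken from the repository):

from langchain.prompts import PromptTemplate, ChatPromptTemplate

# A string template: {context_str} is filled in when the prompt is formatted
prompt = PromptTemplate.from_template("Summarize the following text:\n{context_str}")
print(prompt.format(context_str="LangChain integrates with Ollama."))

# A chat-style template built from (role, content) pairs
chat_prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a concise assistant."), ("human", "{question}")]
)
print(chat_prompt.format_messages(question="What is Ollama?"))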

For instance, the DEFAULT_REFINE_PROMPT is created using the DEFAULT_REFINE_PROMPT_TMPL string template and the PromptTemplate.from_template method. The placeholders in the string template (e.g., {question}, {existing_answer}, {context_str}) are replaced with actual values when the prompt is used. You can find this in the refine_prompts.py file.
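
Schematically, that works along these lines (paraphrased; the exact template text in refine_prompts.py differs):

from langchain.prompts import PromptTemplate

# Hypothetical stand-in for DEFAULT_REFINE_PROMPT_TMPL; the real template lives in refine_prompts.py
REFINE_TMPL = (
    "The original question is: {question}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "Refine the existing answer using the following context:\n{context_str}"
)
DEFAULT_REFINE_PROMPT = PromptTemplate.from_template(REFINE_TMPL)

print(
    DEFAULT_REFINE_PROMPT.format(
        question="Who wrote the paper?",
        existing_answer="The author is not stated.",
        context_str="The paper lists J. Smith as the sole author.",
    )
)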

As for the interaction with the Ollama API, it is handled by the ChatOllama class, which formats the list of base messages as text and calls the appropriate methods from the _OllamaCommon superclass with the formatted messages and other parameters. The responses from these methods are then converted into appropriate objects and returned or yielded. This is implemented in the ollama.py file.

Therefore, you do not need to manually format your instructions using the parameters and template provided by Ollama. The LangChain framework takes care of this for you.



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

lfoppiano commented 10 months ago

@dosubot, what is the difference between ChatOllama and Ollama? In the documentation I've only seen the Ollama class mentioned.

dosubot[bot] commented 10 months ago

🤖

The Ollama and ChatOllama classes in the LangChain framework are both designed to run large language models locally, but they are used in slightly different contexts.

The Ollama class is used for generating text based on provided prompts. It inherits from the BaseLLM and _OllamaCommon classes and provides a number of configurable parameters to control the behavior of the language model. The main functionality is provided by the _generate method, which calls out to Ollama's generate endpoint to generate text based on the provided prompts. The generated text is returned as an LLMResult object. Here is an example of how to use the Ollama class:

from langchain.llms import Ollama

ollama = Ollama(model="llama2")
# Use the public interface; _generate is an internal method
response = ollama.invoke("Tell me a joke.")

On the other hand, the ChatOllama class is designed to generate responses from a given list of messages. It is a subclass of BaseChatModel and _OllamaCommon. The main purpose of this class is provided by the _generate method, which takes a list of BaseMessage objects, formats them into a prompt, and then sends this prompt to the Ollama's generate endpoint. The endpoint returns a response, which is then converted into a ChatResult object containing the generated message and additional information about the generation. Here is an example of how to use the ChatOllama class:

from langchain.chat_models import ChatOllama
from langchain.schema import HumanMessage

ollama = ChatOllama(model="llama2")
response = ollama([
    HumanMessage(content="Tell me about the history of AI")
])

The BaseLLM class that Ollama inherits from is an abstract base class that provides a common interface for all language models. It has several attributes and methods, such as cache, verbose, callbacks, callback_manager, tags, metadata, and others. It also provides methods like raise_deprecation, set_verbose, OutputType, __call__, predict, predict_messages, _identifying_params, __str__, _llm_type, dict, and save.
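
In practice, those shared methods mean any LLM subclass can be driven through the same calls, for example (a sketch assuming a local llama2 model is available):

from langchain.llms import Ollama

llm = Ollama(model="llama2", temperature=0.1)

# __call__ and predict come from the shared base interface, not from Ollama itself
print(llm("Name three colors."))
print(llm.predict("Name three animals."))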

Unfortunately, the _OllamaCommon class is not described in the provided context, so I cannot provide information about its functionality. You can find more details about these classes in the LangChain repository.



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

bibhas2 commented 6 months ago

To add to the discussion, if you are using ChatPromptTemplate to supply a history of messages, you should use ChatOllama and not Ollama.

from langchain.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="mistral:instruct")

prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "When was Abraham Lincoln born?"),
        ("ai", "Abraham Lincoln was born on February 12, 1809."),
        ("human", "How old was he when he died?"),
        ("ai", "Abraham Lincoln died on April 15, 1865, at the age of 56."),
        ("human", "{question}"),
    ]
)

chain = prompt | llm

ans = chain.invoke({"question": "Where did he die?"})

print(ans)

This will send the correct API request. I verified this with tcpdump.

{
    "messages": [
        {
            "role": "user",
            "content": "When was Abraham Lincoln born?",
            "images": []
        },
        {
            "role": "assistant",
            "content": "Abraham Lincoln was born on February 12, 1809.",
            "images": []
        },
        {
            "role": "user",
            "content": "How old was he when he died?",
            "images": []
        },
        {
            "role": "assistant",
            "content": "Abraham Lincoln died on April 15, 1865, at the age of 56.",
            "images": []
        },
        {
            "role": "user",
            "content": "Where did he die?",
            "images": []
        }
    ],
    "model": "mistral:instruct",
    "format": null,
    "options": {
        "mirostat": null,
        "mirostat_eta": null,
        "mirostat_tau": null,
        "num_ctx": null,
        "num_gpu": null,
        "num_thread": null,
        "num_predict": null,
        "repeat_last_n": null,
        "repeat_penalty": null,
        "temperature": null,
        "stop": [],
        "tfs_z": null,
        "top_k": null,
        "top_p": null
    },
    "system": null,
    "template": null
}

The Ollama server will then apply the model's template before sending the input to the model.
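
Illustratively, for the mistral:instruct Modelfile quoted at the top of this issue, each user turn ends up wrapped roughly like this (the exact multi-turn rendering is done server-side by Ollama, so treat this as a sketch):

# Illustrative only: the template [INST] {{ .System }} {{ .Prompt }} [/INST]
# wraps a user turn roughly like this
system = ""
prompt = "Where did he die?"
rendered = f"[INST] {system} {prompt} [/INST]"
print(rendered)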

In the same situation, if you use Ollama:

llm = Ollama(model="mistral:instruct")

The request will look like this:

{
    "prompt": "Human: When was Abraham Lincoln born?\nAI: Abraham Lincoln was born on February 12, 1809.\nHuman: How old was he when he died?\nAI: Abraham Lincoln died on April 15, 1865, at the age of 56.\nHuman: Where did he die?",
    "images": null,
    "model": "mistral:instruct",
    "format": null,
    "options": {
        "mirostat": null,
        "mirostat_eta": null,
        "mirostat_tau": null,
        "num_ctx": null,
        "num_gpu": null,
        "num_thread": null,
        "num_predict": null,
        "repeat_last_n": null,
        "repeat_penalty": null,
        "temperature": null,
        "stop": [],
        "tfs_z": null,
        "top_k": null,
        "top_p": null
    },
    "system": null,
    "template": null
}

This may or may not work well depending on the target model, because the conversation is flattened into a single prompt with Human:/AI: prefixes instead of being formatted with the model's instruction template. It is not an ideal situation; ChatOllama will work better.
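
If you must stick with the plain Ollama class, the request payload above shows stop and template fields that you can in principle set yourself, something like this (an untested sketch, and it still won't give you proper per-message formatting):

from langchain.llms import Ollama

# The stop and template values are copied from the mistral:instruct model page
# quoted at the top of this issue; they map onto the fields visible in the payload above
llm = Ollama(
    model="mistral:instruct",
    stop=["[INST]", "[/INST]", "<<SYS>>", "<</SYS>>"],
    template="[INST] {{ .System }} {{ .Prompt }} [/INST]",
)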