langchain-ai / langchain


Llama.cpp structured output not properly requested #25318

Closed: isaacwasserman closed this issue 3 weeks ago

isaacwasserman commented 2 months ago

Example Code

This code is taken from the ChatLlamaCpp documentation page, specifically the "structured output" section.

import os
import multiprocessing

import langchain
from langchain_community.chat_models import ChatLlamaCpp
from langchain_core.pydantic_v1 import BaseModel
from langchain_core.utils.function_calling import convert_to_openai_tool

langchain.debug = True

# models_dir is assumed to point at a local directory of GGUF model files.
models_dir = "models"
model_path = os.path.join(models_dir, "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf")

llm = ChatLlamaCpp(
    temperature=0.5,
    model_path=model_path,
    n_ctx=10000,
    n_gpu_layers=8,
    n_batch=300,  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    max_tokens=512,
    n_threads=multiprocessing.cpu_count() - 1,
    repeat_penalty=1.5,
    top_p=0.5,
    verbose=False
)

class Joke(BaseModel):
    """A setup to a joke and the punchline."""
    setup: str
    punchline: str

# Convert the Pydantic model into an OpenAI-style tool schema.
dict_schema = convert_to_openai_tool(Joke)
structured_llm = llm.with_structured_output(dict_schema)
result = structured_llm.invoke("Tell me a joke about birds")

print(result)
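For reference, a successful run should print the tool call's parsed arguments as a dict, e.g. (hypothetical values, split from the joke the model actually produced in the debug output below):

{'setup': 'Why did the bird go to therapy?', 'punchline': 'Because it had an "egg-xistential" crisis!'}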

Error Message and Stack Trace (if applicable)

N/A

Description

Instead of returning structured output, the code prints None. Running with langchain.debug = True shows that the LLM produces plain text rather than a tool call (the AIMessage in the debug output below has an empty tool_calls list), so the output parser has nothing to parse. It's worth noting that (1) this code does work when the LLM is GPT-3.5 Turbo rather than Llama, and (2) structured output does work with the following raw llama-cpp-python code:

from llama_cpp import Llama

llm = Llama(model_path=model_path, chat_format="chatml-function-calling", verbose=False)
response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary",
        },
        {
            "role": "user",
            "content": "Tell a joke about birds.",
        },
    ],
    tools=[dict_schema],
    tool_choice={"type": "function", "function": {"name": "Joke"}},
)
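For completeness, the forced tool call can then be read back out of the response (a sketch, assuming llama-cpp-python's OpenAI-compatible response dict shape):

import json

# The forced call lands in message.tool_calls; its arguments field is a JSON string.
tool_call = response["choices"][0]["message"]["tool_calls"][0]
print(json.loads(tool_call["function"]["arguments"]))  # e.g. {'setup': ..., 'punchline': ...}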

One potential problem is that the chat_format="chatml-function-calling" kwarg is never passed to the underlying llama_cpp.Llama() instance. However, letting this argument reach llama-cpp-python results in an error about the tool_choice argument, which is also not passed through.
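As a possible interim workaround (a sketch, assuming ChatLlamaCpp forwards an explicit tool_choice to llama-cpp-python; the tool name "Joke" comes from the schema above), one can bypass with_structured_output and bind the tool directly:

from langchain_core.output_parsers.openai_tools import JsonOutputKeyToolsParser

# Force the model to call the Joke tool, then parse that call's arguments.
llm_with_tool = llm.bind_tools(
    tools=[dict_schema],
    tool_choice={"type": "function", "function": {"name": "Joke"}},
)
structured_llm = llm_with_tool | JsonOutputKeyToolsParser(key_name="Joke", first_tool_only=True)
result = structured_llm.invoke("Tell me a joke about birds")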

Debug output:

[chain/start] [chain:RunnableSequence] Entering Chain run with input:
{
  "input": "Tell me a joke about birds"
}
[llm/start] [chain:RunnableSequence > llm:ChatLlamaCpp] Entering LLM run with input:
{
  "prompts": [
    "Human: Tell me a joke about birds"
  ]
}
[llm/end] [chain:RunnableSequence > llm:ChatLlamaCpp] [10.57s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "Here's one:\n\nWhy did the bird go to therapy?\n\nBecause it had an \"egg-xistential\" crisis! (get it?)",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "Here's one:\n\nWhy did the bird go to therapy?\n\nBecause it had an \"egg-xistential\" crisis! (get it?)",
            "response_metadata": {
              "token_usage": {
                "prompt_tokens": 17,
                "completion_tokens": 28,
                "total_tokens": 45
              },
              "finish_reason": "stop",
              "logprobs": null
            },
            "type": "ai",
            "id": "run-83ced40d-8f85-4ce5-9b25-773fbc17998f-0",
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "prompt_tokens": 17,
      "completion_tokens": 28,
      "total_tokens": 45
    }
  },
  "run": null
}
[chain/start] [chain:RunnableSequence > parser:JsonOutputKeyToolsParser] Entering Parser run with input:
[inputs]
[chain/end] [chain:RunnableSequence > parser:JsonOutputKeyToolsParser] [1ms] Exiting Parser run with output:
{
  "output": null
}
[chain/end] [chain:RunnableSequence] [10.57s] Exiting Chain run with output:
{
  "output": null
}

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 23.2.0: Wed Nov 15 21:53:18 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6000
Python Version: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:26:08) [Clang 14.0.6 ]

Package Information

langchain_core: 0.2.29
langchain: 0.2.12
langchain_community: 0.2.11
langsmith: 0.1.98
langchain_experimental: 0.0.57
langchain_openai: 0.1.20
langchain_text_splitters: 0.2.2
langchainhub: 0.1.15

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.9.1
async-timeout: 4.0.3
dataclasses-json: 0.6.4
faker: Installed. No version info available.
jinja2: 3.0.3
jsonpatch: 1.33
numpy: 1.23.5
openai: 1.40.0
orjson: 3.9.15
packaging: 23.2
pandas: 1.5.3
presidio-analyzer: Installed. No version info available.
presidio-anonymizer: Installed. No version info available.
pydantic: 2.5.3
PyYAML: 6.0.1
requests: 2.31.0
sentence-transformers: Installed. No version info available.
SQLAlchemy: 2.0.19
tabulate: 0.9.0
tenacity: 8.2.2
tiktoken: 0.7.0
types-requests: 2.31.0.20240406
typing-extensions: 4.11.0
vowpal-wabbit-next: Installed. No version info available.

LockedThread commented 1 month ago

@ccurme Any updates?

ccurme commented 3 weeks ago

Thanks @isaacwasserman for the detailed writeup.

From what I can tell, this is just due to how tool_choice is specified. I've merged a fix in https://github.com/langchain-ai/langchain/pull/27202. Let me know if you continue to see issues.