langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Together LLM (Completions) generate() function's output is missing generation_info and llm_output #25441

Open kerkkoh opened 3 months ago

kerkkoh commented 3 months ago

Example Code


from langchain_together import Together
from llama_recipes.inference.prompt_format_utils import (
    build_default_prompt,
    create_conversation,
    LlamaGuardVersion,
) # LlamaGuard 3 Prompt from https://github.com/meta-llama/llama-recipes/blob/main/src/llama_recipes/inference/prompt_format_utils.py
from pydantic.v1.types import SecretStr

t = Together(
    model="meta-llama/Meta-Llama-Guard-3-8B",
    together_api_key=SecretStr("<=== API Key goes here ===>"),
    max_tokens=35,
    logprobs=1,
    temperature=0
)

# Expected to return a LLMResult object with Generations that have logprobs, and llm_output with usage
res = t.generate([build_default_prompt("User", create_conversation(["<Sample user prompt>"]), LlamaGuardVersion["LLAMA_GUARD_3"])])

print(res.json()) # {"generations": [[{"text": "safe", "generation_info": null, "type": "Generation"}]], "llm_output": null, "run": [{"run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"}]}

Error Message and Stack Trace (if applicable)

No response

Description

I'm trying to use the langchain_together library's Together class to call the Together.ai LLM completions endpoint, expecting to get an LLMResult with logprobs inside generation_info and usage in llm_output.

Instead, the following incomplete LLMResult is returned:

{
    "generations": [
        [
            {
                "text": "safe",
                "generation_info": null,
                "type": "Generation"
            }
        ]
    ],
    "llm_output": null,
    "run": [
        {
            "run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"
        }
    ]
}

where both generation_info and llm_output are None.
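
The data is present in the raw API response; it is just dropped on the way into the LLMResult. As a sanity check, calling the together SDK directly shows the missing fields (a minimal sketch; the placeholder prompt string is hypothetical, and the choices / finish_reason / logprobs / usage attribute names are assumed from the SDK's OpenAI-style response objects):

import together

client = together.Together(api_key="<=== API Key goes here ===>")
raw = client.completions.create(
    model="meta-llama/Meta-Llama-Guard-3-8B",
    prompt="<Sample formatted LlamaGuard prompt>",  # hypothetical prompt string
    max_tokens=35,
    logprobs=1,
    temperature=0,
)
# The raw response carries exactly the fields that the LLMResult above is missing:
print(raw.choices[0].finish_reason)  # e.g. "eos"
print(raw.choices[0].logprobs)       # tokens / token_logprobs / token_ids
print(raw.usage)                     # prompt_tokens / completion_tokens / total_tokens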

This should be fixed by updating the langchain_together.Together class so that it implements the methods needed to return LLMResults with generation_info and llm_output populated whenever the API response includes the corresponding fields (a rough sketch of such a change follows the expected output below). The expected output is:

{
    "generations": [
        [
            {
                "text": "safe",
                "generation_info": {
                    "finish_reason": "eos",
                    "logprobs": {
                        "tokens": [
                            "safe",
                            "<|eot_id|>"
                        ],
                        "token_logprobs": [
                            -4.6014786e-05,
                            -0.008911133
                        ],
                        "token_ids": [
                            19193,
                            128009
                        ]
                    }
                },
                "type": "Generation"
            }
        ]
    ],
    "llm_output": {
        "token_usage": {
            "total_tokens": 219,
            "completion_tokens": 2,
            "prompt_tokens": 217
        },
        "model_name": "meta-llama/Meta-Llama-Guard-3-8B"
    },
    "run": [
        {
            "run_id": "5b93a422-c74a-41e9-af5e-a7958884a9a9"
        }
    ]
}
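
A rough sketch of the kind of change I mean: a custom LLM that overrides BaseLLM._generate and builds the LLMResult from the raw Together response itself. This is not the actual langchain_together implementation; the choices / finish_reason / logprobs / usage attribute names are assumptions based on the together SDK's OpenAI-style response objects.

from typing import Any, List, Optional

import together
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import BaseLLM
from langchain_core.outputs import Generation, LLMResult

class TogetherWithMetadata(BaseLLM):
    """Sketch of a Together completions LLM that keeps generation_info and llm_output."""

    model: str
    together_api_key: str
    max_tokens: int = 35
    logprobs: Optional[int] = None
    temperature: float = 0.0

    @property
    def _llm_type(self) -> str:
        return "together-with-metadata"

    def _generate(
        self,
        prompts: List[str],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> LLMResult:
        client = together.Together(api_key=self.together_api_key)
        generations = []
        token_usage = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
        for prompt in prompts:
            # Together's completions endpoint takes a single prompt string per call
            resp = client.completions.create(
                model=self.model,
                prompt=prompt,
                max_tokens=self.max_tokens,
                logprobs=self.logprobs,
                temperature=self.temperature,
                stop=stop,
            )
            choice = resp.choices[0]
            generations.append([
                Generation(
                    text=choice.text,
                    # Keep finish_reason and logprobs instead of discarding them
                    generation_info={
                        "finish_reason": getattr(choice, "finish_reason", None),
                        "logprobs": getattr(choice, "logprobs", None),
                    },
                )
            ])
            usage = getattr(resp, "usage", None)
            if usage is not None:
                for key in token_usage:
                    token_usage[key] += getattr(usage, key, 0) or 0
        # llm_output is where BaseLLM.generate expects provider-level metadata
        return LLMResult(
            generations=generations,
            llm_output={"token_usage": token_usage, "model_name": self.model},
        )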

This could technically be avoided by using langchain_openai.OpenAI from the langchain_openai library, but that class's generate method is no longer compatible with the old OpenAI Completions-style API that Together.ai uses. Specifically, the underlying BaseOpenAI._generate method calls the OpenAI completions client with a list[str] of prompts, which Together.ai doesn't support.

Just in case someone finds this issue looking for a fix, here is a workaround for that workaround. The problem with the langchain_openai approach can be bodged around by overriding the openai client's completions.create method after initializing the LLM class, calling the together Python library's equivalent method instead, and dropping the arguments that the Together API doesn't support. The following is a quick example:

import together
from langchain_openai import OpenAI
from pydantic.v1.types import SecretStr
from llama_recipes.inference.prompt_format_utils import (
    build_default_prompt,
    create_conversation,
    LlamaGuardVersion,
) # LlamaGuard 3 Prompt from https://github.com/meta-llama/llama-recipes/blob/main/src/llama_recipes/inference/prompt_format_utils.py

TOGETHER_API_KEY = "<=== API Key goes here ===>"

together_client = together.Together(api_key=TOGETHER_API_KEY)
llm = OpenAI(
    model="meta-llama/Meta-Llama-Guard-3-8B",
    api_key=SecretStr(TOGETHER_API_KEY),
    base_url="https://api.together.xyz/v1",  # This may be redundant as we override the create method anyway
    max_tokens=200,
    logprobs=1,
    temperature=0
)

def overridden_create(prompt: list[str], **kwargs):
    # Overridden openai.client.completions.create method to use the Together client, as Together doesn't support certain inputs (e.g. seed) and lists of prompts
    together_allowed_keys = ["model", "prompt", "max_tokens", "stream", "stop", "temperature", "top_p", "top_k", "repetition_penalty", "logprobs", "echo", "n", "safety_model"]
    kwargs = {k: v for k, v in kwargs.items() if k in together_allowed_keys}
    return together_client.completions.create(prompt=prompt[0], **kwargs)

llm.client.create = overridden_create
llm_result = llm.generate([build_default_prompt("User", create_conversation(["<Sample user prompt>"]), LlamaGuardVersion["LLAMA_GUARD_3"])])

print(llm_result.json()) # {"generations": [[{"text": "safe", "generation_info": {"finish_reason": "eos", "logprobs": {"tokens": ["safe", "<|eot_id|>"], "token_logprobs": [-4.6014786e-05, -0.008911133], "token_ids": [19193, 128009]}}, "type": "Generation"}]], "llm_output": {"token_usage": {"total_tokens": 219, "completion_tokens": 2, "prompt_tokens": 217}, "model_name": "meta-llama/Meta-Llama-Guard-3-8B"}, "run": [{"run_id": "f015adc7-7558-4251-9fe6-9d11a646c173"}]}

generation = llm_result.generations[0][0]
logprobs = generation.generation_info["logprobs"] # Wow, it works!
token_usage = llm_result.llm_output["token_usage"] # Wow, we also get usage!

System Info

langchain==0.2.14
langchain-core==0.2.32
langchain-openai==0.1.21
langchain-text-splitters==0.2.2
langchain-together==0.1.5

Mac (MacBook Pro M1, 16 GB, 2021), macOS Sonoma 14.5 (23F79)

Python 3.9.19

dosubot[bot] commented 2 weeks ago

Hi, @kerkkoh. I'm Dosu, and I'm helping the LangChain team manage their backlog. I'm marking this issue as stale.

Thank you for your understanding and contribution!

kerkkoh commented 2 weeks ago

Hi Dosu,

Thanks for reaching out! I’m happy to say that this issue is still a thing with the latest version of LangChain. Let me know if you need anything else or have any questions!

Cheers, @kerkkoh

dosubot[bot] commented 2 weeks ago

@eyurtsev, the user @kerkkoh has confirmed that the issue with the generate() function from langchain_together is still relevant in the latest version of LangChain. Could you please assist them with this?