run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Unnecessary warning in Huggingface LLM when tokenizer is passed as a function argument #16035

Closed: CuriousMonkey7 closed this issue 1 month ago

CuriousMonkey7 commented 1 month ago

Bug Description

The "model and tokenizer are different" warning (shown below) is logged even when you pass the correct tokenizer for the model.

Version

llama-index version: 0.11.9
llama-index-llms-huggingface version: 0.3.3

Steps to Reproduce

from transformers import AutoTokenizer, AutoModelForCausalLM

from llama_index.llms.huggingface import HuggingFaceLLM

# Load the model together with its matching tokenizer.
model_name = "distilbert/distilgpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Passing both objects directly still triggers the compatibility warning.
hf_llm = HuggingFaceLLM(model=model, tokenizer=tokenizer)

The code above produces the following warning:

 WARNING:llama_index.llms.huggingface.base:The model `StabilityAI/stablelm-tuned-alpha-3b` and tokenizer `distilbert/distilgpt2` are different, please ensure that they are compatible.

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

Hey @CuriousMonkey7! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!

I found a similar unsolved discussion: Unnecessary warning in Huggingface LLM when tokenizer is passed as a function argument [1].

The warning about an incompatible model and tokenizer in the HuggingFaceLLM class in llama-index 0.11.9 is issued because the constructor explicitly compares tokenizer.name_or_path against model_name and logs the warning whenever they differ. When you pass a model object directly and never set model_name, model_name keeps its default value (StabilityAI/stablelm-tuned-alpha-3b, which is exactly the name that appears in the warning you saw), so the comparison fails and the warning is logged regardless of whether the tokenizer you passed actually matches the model.

Here is the relevant code snippet:

tokenizer = tokenizer or AutoTokenizer.from_pretrained(
    tokenizer_name, **tokenizer_kwargs
)

if tokenizer.name_or_path != model_name:
    logger.warning(
        f"The model `{model_name}` and tokenizer `{tokenizer.name_or_path}` "
        f"are different, please ensure that they are compatible."
    )
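To make the failing comparison concrete, here is a minimal sketch of what the check sees in your example. It assumes that model_name falls back to the package default StabilityAI/stablelm-tuned-alpha-3b when only a model object is supplied, which is consistent with the warning text above:

from transformers import AutoTokenizer

# The tokenizer that was actually passed; name_or_path records the repo id.
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")

# model_name is never overridden when only a model object is supplied,
# so it stays at the default (assumption based on the warning text above).
model_name = "StabilityAI/stablelm-tuned-alpha-3b"

print(tokenizer.name_or_path)                 # distilbert/distilgpt2
print(tokenizer.name_or_path != model_name)   # True -> warning is logged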

The warning is meant to make the user aware of a potential incompatibility between the model and the tokenizer, but because the check only compares name strings, it is emitted even when the correct tokenizer is passed as an argument [2][3].
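If you want to avoid the warning while this behavior is in place, one possible workaround (a sketch based on the check above, not an official recommendation) is to pass model_name and tokenizer_name explicitly so the names being compared actually match:

from transformers import AutoTokenizer, AutoModelForCausalLM
from llama_index.llms.huggingface import HuggingFaceLLM

model_name = "distilbert/distilgpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Setting model_name/tokenizer_name keeps the name comparison consistent,
# so the "are different" warning should no longer be logged.
hf_llm = HuggingFaceLLM(
    model=model,
    tokenizer=tokenizer,
    model_name=model_name,
    tokenizer_name=model_name,
)

Since the check only compares name strings, supplying names that align with the objects you pass is enough to keep it quiet.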

To continue talking to Dosu, mention @dosu.