run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Unnecessary warning in Huggingface LLM when tokenizer is passed as a function argument #16035

Closed: CuriousMonkey7 closed this issue 1 month ago

CuriousMonkey7 commented 1 month ago

Bug Description

The "model and tokenizer are different" warning (shown below) is logged even when you pass the correct tokenizer for the model.

Version

llama-index version: 0.11.9
llama-index-llms-huggingface version: 0.3.3

Steps to Reproduce

from transformers import AutoTokenizer, AutoModelForCausalLM

from llama_index.llms.huggingface import HuggingFaceLLM

# Load the model together with its matching tokenizer.
model_name = "distilbert/distilgpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Passing both objects directly still triggers the compatibility warning.
hf_llm = HuggingFaceLLM(model=model, tokenizer=tokenizer)

The code above produces the following warning:

 WARNING:llama_index.llms.huggingface.base:The model `StabilityAI/stablelm-tuned-alpha-3b` and tokenizer `distilbert/distilgpt2` are different, please ensure that they are compatible.

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

Hey @CuriousMonkey7! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!

I found a similar unsolved discussion: Unnecessary warning in Huggingface LLM when tokenizer is passed as a function argument [1].

The warning about an incompatible model and tokenizer in the HuggingFaceLLM class in llama-index 0.11.9 is issued because the constructor explicitly compares tokenizer.name_or_path against model_name and logs the warning whenever they differ. When you pass a model object directly and never set model_name, model_name keeps its default value (StabilityAI/stablelm-tuned-alpha-3b, which is exactly the name that appears in the warning you saw), so the comparison fails and the warning is logged regardless of whether the tokenizer you passed actually matches the model.

Here is the relevant code snippet:

tokenizer = tokenizer or AutoTokenizer.from_pretrained(
    tokenizer_name, **tokenizer_kwargs
)

if tokenizer.name_or_path != model_name:
    logger.warning(
        f"The model `{model_name}` and tokenizer `{tokenizer.name_or_path}` "
        f"are different, please ensure that they are compatible."
    )
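To make the failing comparison concrete, here is a minimal sketch of what the check sees in your example. It assumes that model_name falls back to the package default StabilityAI/stablelm-tuned-alpha-3b when only a model object is supplied, which is consistent with the warning text above:

from transformers import AutoTokenizer

# The tokenizer that was actually passed; name_or_path records the repo id.
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")

# model_name is never overridden when only a model object is supplied,
# so it stays at the default (assumption based on the warning text above).
model_name = "StabilityAI/stablelm-tuned-alpha-3b"

print(tokenizer.name_or_path)                 # distilbert/distilgpt2
print(tokenizer.name_or_path != model_name)   # True -> warning is logged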

The warning is meant to make the user aware of a potential incompatibility between the model and the tokenizer, but because the check only compares name strings, it is emitted even when the correct tokenizer is passed as an argument [2][3].
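If you want to avoid the warning while this behavior is in place, one possible workaround (a sketch based on the check above, not an official recommendation) is to pass model_name and tokenizer_name explicitly so the names being compared actually match:

from transformers import AutoTokenizer, AutoModelForCausalLM
from llama_index.llms.huggingface import HuggingFaceLLM

model_name = "distilbert/distilgpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Setting model_name/tokenizer_name keeps the name comparison consistent,
# so the "are different" warning should no longer be logged.
hf_llm = HuggingFaceLLM(
    model=model,
    tokenizer=tokenizer,
    model_name=model_name,
    tokenizer_name=model_name,
)

Since the check only compares name strings, supplying names that align with the objects you pass is enough to keep it quiet.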

To continue talking to Dosu, mention @dosu.