langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
94.76k stars 15.34k forks

[BUG]: HuggingFacePipeline does not initialize correctly when passing a transformers pipeline #25915

Open Kirushikesh opened 2 months ago

Kirushikesh commented 2 months ago

Checked other resources

Example Code

from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline

model_id = "microsoft/Phi-3.5-mini-instruct"
pipe = pipeline("text-generation", model=model_id)

llm = HuggingFacePipeline(pipeline=pipe)
llm.model_id # Output: 'gpt2' (expected 'microsoft/Phi-3.5-mini-instruct')

Initializing the HuggingFacePipeline through from_model_id, by contrast, gives the correct value:

from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3.5-mini-instruct",
    task="text-generation",
)
llm.model_id # Output: 'microsoft/Phi-3.5-mini-instruct'

Error Message and Stack Trace (if applicable)

No response

Description

I am using LangChain. When I initialize HuggingFacePipeline with a custom transformers pipeline, the resulting model name is incorrectly reported as 'gpt2'. Initializing the same class via `from_model_id()` recognizes the model_id correctly as 'microsoft/Phi-3.5-mini-instruct'. Due to this model_id mismatch, other modules are impacted, for example ChatHuggingFace, because ChatHuggingFace uses model_id to initialize the tokenizer.
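As a possible interim workaround (a sketch, not a fix to the library): standard transformers pipelines record the checkpoint they were loaded from on `pipe.model.name_or_path`, so the caller could derive the id from the pipeline object and pass it explicitly as `HuggingFacePipeline(pipeline=pipe, model_id=...)` instead of relying on the 'gpt2' default. The helper below demonstrates the derivation with a lightweight stand-in object to avoid downloading a model; `model_id_from_pipeline` is a hypothetical name, not part of either library.

```python
from types import SimpleNamespace


def model_id_from_pipeline(pipe) -> str:
    """Derive the model identifier from a transformers pipeline.

    Assumes the pipeline exposes its model via `pipe.model`, and that
    the model records its checkpoint in `name_or_path` (true for
    standard transformers pipelines). Falls back to a sentinel if the
    attribute is missing.
    """
    return getattr(pipe.model, "name_or_path", "unknown")


# Stand-in for a real transformers pipeline, so this sketch runs
# without downloading microsoft/Phi-3.5-mini-instruct.
fake_pipe = SimpleNamespace(
    model=SimpleNamespace(name_or_path="microsoft/Phi-3.5-mini-instruct")
)

# In real code, the derived id would be passed through explicitly:
#   llm = HuggingFacePipeline(pipeline=pipe,
#                             model_id=model_id_from_pipeline(pipe))
print(model_id_from_pipeline(fake_pipe))  # microsoft/Phi-3.5-mini-instruct
```

This only papers over the symptom at the call site; the underlying issue is that the HuggingFacePipeline constructor does not populate model_id from the pipeline it is given.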

System Info

"pip freeze | grep langchain"

langchain==0.2.15
langchain-community==0.0.27
langchain-core==0.2.37
langchain-huggingface==0.0.3
langchain-text-splitters==0.2.2

platform: Linux Python: 3.10.6

AndrewEffendi commented 3 weeks ago

hi @eyurtsev, I can confirm that this issue still persists. Can we take on this issue? We are a group of student developers at UofT interested in it. I have seen that you wrote some code in the PR that is now closed. Do you want us to confirm that the change works with Pydantic 2 and add some unit tests to verify it?