langchain-ai / langchain

πŸ¦œπŸ”— Build context-aware reasoning applications
https://python.langchain.com
MIT License

Connect timeout when using map-reduce to summarize: how can I avoid accessing Hugging Face? #11011

Closed evyy1234 closed 7 months ago

evyy1234 commented 11 months ago

System Info

I am using LangChain's map-reduce chain to summarize long text with a locally loaded LLM (chatglm2-6B). When I run the code I get the error below. I do not have access to https://huggingface.co/ from my environment; how can I solve this?

Error:

'HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /gpt2/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f58b69e8090>, 'Connection to huggingface.co timed out. (connect timeout=10)'))' thrown while requesting HEAD https://huggingface.co/gpt2/resolve/main/tokenizer_config.json
'HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /gpt2/resolve/main/vocab.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f58b6e1a010>, 'Connection to huggingface.co timed out. (connect timeout=10)'))' thrown while requesting HEAD https://huggingface.co/gpt2/resolve/main/vocab.json

Partial code:

if __name__ == '__main__':
    model_path = "/home/model/chatglm2-6b-32k"
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
    model = model.eval()
    uvicorn.run(app, host='0.0.0.0', port=8000, workers=1)

def mapreduce(llm, split_docs, map_template, reduce_template):
    # Map
    map_prompt = PromptTemplate.from_template(map_template)
    map_chain = LLMChain(llm=llm, prompt=map_prompt)

    # Reduce
    reduce_prompt = PromptTemplate.from_template(reduce_template)
    reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

    # Takes a list of documents, combines them into a single string,
    # and passes this to an LLMChain
    combine_documents_chain = StuffDocumentsChain(
        llm_chain=reduce_chain, document_variable_name="doc_summaries"
    )

    # Combines and iteratively reduces the mapped documents
    reduce_documents_chain = ReduceDocumentsChain(
        # This is the final chain that is called.
        combine_documents_chain=combine_documents_chain,
        # If documents exceed context for `StuffDocumentsChain`
        collapse_documents_chain=combine_documents_chain,
        # The maximum number of tokens to group documents into.
        # token_max=4500,
    )

    # Combining documents by mapping a chain over them, then combining results
    map_reduce_chain = MapReduceDocumentsChain(
        # Map chain
        llm_chain=map_chain,
        # Reduce chain
        reduce_documents_chain=reduce_documents_chain,
        # The variable name in the llm_chain to put the documents in
        document_variable_name="docs",
        # Return the results of the map steps in the output
        return_intermediate_steps=False,
    )
    return map_reduce_chain.run(split_docs)
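
A minimal sketch (not part of the original issue) of how split_docs could be produced with the local tokenizer, so that the splitting step itself never contacts the Hub; the path, chunk sizes, and the long_text variable are assumptions:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from transformers import AutoTokenizer

# Hypothetical: measure chunk length with the local ChatGLM tokenizer so the
# splitter never needs to download anything from huggingface.co.
local_tokenizer = AutoTokenizer.from_pretrained(
    "/home/model/chatglm2-6b-32k", trust_remote_code=True
)
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    local_tokenizer, chunk_size=2000, chunk_overlap=200
)
split_docs = text_splitter.create_documents([long_text])  # long_text: the document to summarize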

Who can help?

@dosu-beta @sbusso @jarib @zeke @deepblue @

Information

Related Components

Reproduction

See the code and error above.

Expected behavior

Do not visit Hugging Face at all; tokenize via the local LLM instead.

dosubot[bot] commented 11 months ago

πŸ€–

Hello,

It seems like you're trying to use a local model and tokenizer with LangChain, but the code is still trying to access HuggingFace's servers. This is likely because the AutoTokenizer.from_pretrained and AutoModel.from_pretrained methods are designed to download the model and tokenizer from HuggingFace by default.

To use a local model and tokenizer, you need to modify the _load_transformer function in the SelfHostedHuggingFaceLLM class. Replace the model_id in AutoTokenizer.from_pretrained(model_id, trust_remote_code=True) and AutoModel.from_pretrained(model_id, trust_remote_code=True) with the path to your local tokenizer and model.

Here's how you can modify the _load_transformer function:

def _load_transformer(
    model_id: str = DEFAULT_MODEL_ID,
    task: str = DEFAULT_TASK,
    device: int = 0,
    model_kwargs: Optional[dict] = None,
) -> Any:
    from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer
    from transformers import pipeline as hf_pipeline

    _model_kwargs = model_kwargs or {}
    tokenizer = AutoTokenizer.from_pretrained("/path/to/your/local/tokenizer")

    try:
        if task == "text-generation":
            model = AutoModelForCausalLM.from_pretrained("/path/to/your/local/model")
        elif task in ("text2text-generation", "summarization"):
            model = AutoModelForSeq2SeqLM.from_pretrained("/path/to/your/local/model")
        else:
            raise ValueError(
                f"Got invalid task {task}, "
                f"currently only {VALID_TASKS} are supported"
            )
    except ImportError as e:
        raise ValueError(
            f"Could not load the {task} model due to missing dependencies."
        ) from e

    # rest of the function remains the same

Remember to replace "/path/to/your/local/tokenizer" and "/path/to/your/local/model" with the actual paths to your local tokenizer and model.
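
As an extra guard (an addition to the reply above, not required usage), transformers' from_pretrained also accepts local_files_only=True, which makes loading fail immediately if anything would have to be fetched from the Hub, instead of timing out. A minimal sketch with placeholder paths:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/path/to/your/local/tokenizer",
    trust_remote_code=True,
    local_files_only=True,  # raise instead of contacting huggingface.co
)
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/your/local/model",
    trust_remote_code=True,
    local_files_only=True,
)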

You can find more details about the SelfHostedHuggingFaceLLM class in the LangChain codebase.

I hope this helps! Let me know if you have any other questions.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

evyy1234 commented 11 months ago


Your answer is very helpful, I have solved the problem. Thanks a lot!

dosubot[bot] commented 11 months ago

πŸ€–

Hello,

Thank you for reaching out. You're correct in your understanding that you need to modify the _load_transformer function in the SelfHostedHuggingFaceLLM class to use your local model and tokenizer. Here's how you can do it:

def _load_transformer(
    model_path: str = "/home/model/chatglm2-6b-32k",
    task: str = "summarization",
    device: int = 0,
    model_kwargs: Optional[dict] = None,
) -> Any:
    from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer
    from transformers import pipeline as hf_pipeline

    _model_kwargs = model_kwargs or {}
    tokenizer = AutoTokenizer.from_pretrained(model_path, **_model_kwargs)

    try:
        if task == "text-generation":
            model = AutoModelForCausalLM.from_pretrained(model_path, **_model_kwargs)
        elif task in ("text2text-generation", "summarization"):
            model = AutoModelForSeq2SeqLM.from_pretrained(model_path, **_model_kwargs)
        else:
            raise ValueError(
                f"Got invalid task {task}, "
                f"currently only {VALID_TASKS} are supported"
            )
    except ImportError as e:
        raise ValueError(
            f"Could not load the {task} model due to missing dependencies."
        ) from e

    if importlib.util.find_spec("torch") is not None:
        import torch

        cuda_device_count = torch.cuda.device_count()
        if device < -1 or (device >= cuda_device_count):
            raise ValueError(
                f"Got device=={device}, "
                f"device is required to be within [-1, {cuda_device_count})"
            )
        if device < 0 and cuda_device_count > 0:
            logger.warning(
                "Device has %d GPUs available. "
                "Provide device={deviceId} to `from_model_id` to use available"
                "GPUs for execution. deviceId is -1 for CPU and "
                "can be a positive integer associated with CUDA device id.",
                cuda_device_count,
            )
    pipeline = hf_pipeline(
        task=task,
        model=model,
        tokenizer=tokenizer,
        device=device,
        model_kwargs=_model_kwargs,
    )
    if pipeline.task not in VALID_TASKS:
        raise ValueError(
            f"Got invalid task {pipeline.task}, "
            f"currently only {VALID_TASKS} are supported"
        )
    return pipeline

In this code, model_path is the path to your local model and tokenizer; replace "/home/model/chatglm2-6b-32k" with your own path. The from_pretrained methods of AutoTokenizer, AutoModelForCausalLM, and AutoModelForSeq2SeqLM are then given that local path instead of a model ID from the Hub.

Please note that you need to ensure that the local model and tokenizer are compatible with the HuggingFace transformers library. If you encounter any issues, please let me know.

Best, Dosu

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

evyy1234 commented 11 months ago

Hello, I forgot that I still had a VPN open just now that can reach Hugging Face. In fact, when I close the VPN, the same error comes up, even though I have modified the _load_transformer function in the SelfHostedHuggingFaceLLM class to use the local model and tokenizer. It seems the map-reduce function still visits Hugging Face and uses the gpt2 tokenizer; how can I avoid fetching gpt2 online? Thanks!

dosubot[bot] commented 11 months ago

πŸ€–

Hello,

From the code you've shared, it seems like you're on the right track. The from_pretrained method of AutoTokenizer, AutoModelForCausalLM, and AutoModelForSeq2SeqLM can indeed accept a local path to a model and tokenizer.

To verify if your local model and tokenizer are compatible with the HuggingFace transformers library, you can try to load them outside of the LangChain context. Here's a simple script you can use:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "/path/to/your/model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

print("Model and tokenizer loaded successfully.")

If this script runs without any errors, then your local model and tokenizer are compatible with the HuggingFace transformers library.

In the context of LangChain, you can follow the steps provided in the issue #2667 to use a local model with the HuggingFacePipeline class. Here's the relevant part of the solution:

from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_path = "/path/to/your/model"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

hf = HuggingFacePipeline(pipeline=pipe)
prompt = PromptTemplate(template="Question: {question}\n\nAnswer: Let's think step by step.", input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=hf)

Please replace "/path/to/your/model" with the actual path to your local model and tokenizer. If this code runs without any errors, then your local model and tokenizer are compatible with the LangChain framework.
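
A sketch (not from the original reply) of tying this back to the question: once hf loads purely from local files, it can be passed as the llm argument of the mapreduce(...) helper from the issue. The templates below are illustrative, and split_docs is assumed to come from a splitter such as the one sketched earlier in this thread:

summary = mapreduce(
    llm=hf,  # HuggingFacePipeline built above from local files
    split_docs=split_docs,
    map_template="Write a concise summary of the following:\n{docs}",
    reduce_template="Combine these partial summaries into one summary:\n{doc_summaries}",
)
print(summary)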

As for the valid tasks, the _load_transformer function in the SelfHostedHuggingFaceLLM class supports "text2text-generation", "text-generation", and "summarization". This is indicated by the VALID_TASKS tuple in the code:

VALID_TASKS = ("text2text-generation", "text-generation", "summarization")

I hope this helps! Let me know if you have any other questions.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

evyy1234 commented 11 months ago

Thanks for your reply. This script runs correctly in my environment:

model_path = "/home/chatglm2-6b-32k"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

print("Model and tokenizer loaded successfully.")

Then I use the local model with the HuggingFacePipeline class:

model_path = "/home/chatglm2-6b-32k"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer)
hf = HuggingFacePipeline(pipeline=pipe)
prompt = PromptTemplate(template="Question: {question}\n\nAnswer: Let's think step by step.", input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=hf)

I tried the "text2text-generation", "text-generation", and "summarization" tasks, but I get the same error (screenshot omitted). It seems that ChatGLM does not support the transformers pipeline? What should be done in this situation?

dosubot[bot] commented 11 months ago

πŸ€–

Hello,

Based on the information you've provided, it seems like you're on the right track with loading your local model and tokenizer. The from_pretrained method of AutoTokenizer, AutoModelForCausalLM, and AutoModelForSeq2SeqLM should indeed work with a local path to a model and tokenizer.

However, without knowing the specific error messages or issues you're encountering, it's difficult to provide a precise solution. Could you please provide more details about the errors or issues you're facing? This could include error messages, unexpected behavior, or other symptoms that could help diagnose the problem.

Regarding your question about the _load_transformer function in the SelfHostedHuggingFaceLLM class, it supports the following tasks: "text2text-generation", "text-generation", and "summarization". This is indicated by the VALID_TASKS variable and the conditional statements in the _load_transformer function. Here's the relevant code:

VALID_TASKS = ("text2text-generation", "text-generation", "summarization")

if task == "text-generation":
    model = AutoModelForCausalLM.from_pretrained(model_id, **_model_kwargs)
elif task in ("text2text-generation", "summarization"):
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id, **_model_kwargs)
else:
    raise ValueError(
        f"Got invalid task {task}, "
        f"currently only {VALID_TASKS} are supported"
    )

This means that your model should be compatible with one of these tasks to be used with the SelfHostedHuggingFaceLLM class.

I hope this helps! Please provide more details about the issues you're facing so we can assist you further.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

evyy1234 commented 11 months ago

It seems like my local model and tokenizer are compatible with LangChain, because when I use other LangChain summary chains such as stuff or refine, they work well. But when I use map-reduce to summarize, it visits Hugging Face and shows the error below:

'HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /gpt2/resolve/main/tokenizer_config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f58b69e8090>, 'Connection to huggingface.co timed out. (connect timeout=10)'))' thrown while requesting HEAD https://huggingface.co/gpt2/resolve/main/tokenizer_config.json
'HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /gpt2/resolve/main/vocab.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f58b6e1a010>, 'Connection to huggingface.co timed out. (connect timeout=10)'))' thrown while requesting HEAD https://huggingface.co/gpt2/resolve/main/vocab.json
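
A low-tech check (a suggestion, assuming a reasonably recent transformers/huggingface_hub): switching both libraries to offline mode before anything else is imported makes the hidden gpt2 download fail immediately with a traceback pointing at the caller, instead of hanging on the 10-second connect timeout.

import os

# Must be set before transformers / huggingface_hub are imported anywhere.
# Any code path that still tries to contact huggingface.co will then raise
# right away instead of timing out.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"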

evyy1234 commented 11 months ago

Specifically, I create a custom LLM wrapper:

class ChatGLM(LLM):
    max_token: int = 100000
    temperature: float = 0.2
    top_p = 0.6
    history = []

    def __init__(self):
        super().__init__()

    @property
    def _llm_type(self) -> str:
        return "ChatGLM"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        headers = {'Content-Type': 'application/json'}
        data = json.dumps({'prompt': prompt, 'temperature': self.temperature, 'history': self.history, 'max_length': self.max_token})
        response = requests.post("http://0.0.0.0:8000", headers=headers, data=data)
        if response.status_code != 200:
            return "failed"
        resp = response.json()
        if stop is not None:
            response = enforce_stop_tokens(response, stop)
        self.history = self.history + [[None, resp['response']]]
        return resp['response']

and I use an API to interact with the local model and tokenizer:

@app.post("/")
async def create_item(request: Request):
    global model, tokenizer
    json_post_raw = await request.json()
    json_post = json.dumps(json_post_raw)
    json_post_list = json.loads(json_post)
    prompt = json_post_list.get('prompt')
    history = json_post_list.get('history')
    max_length = json_post_list.get('max_length')
    top_p = json_post_list.get('top_p')
    temperature = json_post_list.get('temperature')
    response, history = model.chat(tokenizer, prompt, history=history,
                                   max_length=max_length if max_length else 2048,
                                   top_p=top_p if top_p else 0.7,
                                   temperature=temperature if temperature else 0.7)
    now = datetime.datetime.now()
    time = now.strftime("%Y-%m-%d %H:%M:%S")
    answer = {
        "response": response,
        "history": history,
        "status": 200,
        "time": time
    }
    log = "[" + time + "] " + '", prompt:"' + prompt + '", response:"' + repr(response) + '"'
    print(log)

    torch_gc()

    return answer


if __name__ == '__main__':
    model_path = "/home/fgl/wqm/llm_summary/model/chatglm2-6b-32k"
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
    # model = AutoModel.from_pretrained(model_path, trust_remote_code=True, revision="v1.0").float().to('mps')
    model = model.eval()
    uvicorn.run(app, host='0.0.0.0', port=8000, workers=1)
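
A note on where the gpt2 requests likely come from (an inference, not confirmed in this thread): when ReduceDocumentsChain decides whether the intermediate summaries still fit, it calls llm.get_num_tokens(...), and LangChain's default implementation at the time tokenized text with GPT2TokenizerFast.from_pretrained("gpt2"), i.e. a Hub download, which matches the gpt2 tokenizer_config.json and vocab.json URLs in the error. A custom wrapper like the ChatGLM class above can override that method to count tokens with the local tokenizer instead. A minimal sketch, assuming the local chatglm2-6b-32k checkpoint and the LangChain import paths of that era; the _call body is trimmed:

import json
from typing import List, Optional

import requests
from langchain.llms.base import LLM
from transformers import AutoTokenizer

# Local tokenizer, loaded once; path taken from the server code above.
_LOCAL_TOKENIZER = AutoTokenizer.from_pretrained(
    "/home/fgl/wqm/llm_summary/model/chatglm2-6b-32k", trust_remote_code=True
)


class ChatGLM(LLM):
    """Same HTTP wrapper as above, plus an offline token counter."""

    max_token: int = 100000
    temperature: float = 0.2

    @property
    def _llm_type(self) -> str:
        return "ChatGLM"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        headers = {"Content-Type": "application/json"}
        data = json.dumps({"prompt": prompt, "temperature": self.temperature, "history": [], "max_length": self.max_token})
        response = requests.post("http://0.0.0.0:8000", headers=headers, data=data)
        if response.status_code != 200:
            return "failed"
        return response.json()["response"]

    def get_token_ids(self, text: str) -> List[int]:
        # Count tokens with the local ChatGLM tokenizer instead of LangChain's
        # default GPT-2 tokenizer, so no request to huggingface.co is made.
        return _LOCAL_TOKENIZER.encode(text)

    def get_num_tokens(self, text: str) -> int:
        return len(self.get_token_ids(text))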

dosubot[bot] commented 8 months ago

Hi, @evyy1234

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you encountered a connect timeout error when using mapreduce to summarize long text with a local language model (LLM) and trying to avoid accessing Hugging Face. Initially, I provided guidance on modifying the _load_transformer function in the SelfHostedHuggingFaceLLM class to use a local model and tokenizer, which seemed to resolve the issue. However, you later encountered the same error even after closing the VPN. I continued to provide guidance on verifying the compatibility of the local model and tokenizer with the HuggingFace transformers library. You mentioned that the local model and tokenizer worked with other LangChain components, but encountered the error specifically when using mapreduce to summarize.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

joostjansenn commented 5 months ago

@dosu-bot could you elaborate a bit more on how to overwrite this function in practice? We have been stuck on this problem for quite some time now.

joostjansenn commented 5 months ago

@dosu-bot I ran into this issue as well and have not found a solution. Can you help us?