huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[BUG] Offline loading of non-safe tensors fails #30920

Closed: pseudotensor closed this issue 3 months ago

pseudotensor commented 6 months ago

System Info

- transformers 4.41.0 (also reproduced on other versions)
- Python 3.10
- Ubuntu 22

Who can help?

@ArthurZucker and @younesbelkada

Reproduction

Online, run:

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', use_safetensors=False)
```

Then, offline, run:

```python
import os
os.environ['HF_HUB_OFFLINE'] = '1'
os.environ['TRANSFORMERS_OFFLINE'] = '1'

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
```

You'll see it fail with a connection error despite the offline environment variables being set:

```
Traceback (most recent call last):
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
```

For now, I patched the problem like this: https://github.com/h2oai/h2ogpt/blob/main/docs/trans.patch

With the patch, offline loading works fine.

Thanks!

Expected behavior

If the offline environment variables are set and all model files are already cached locally, loading shouldn't fail just because of the safetensors existence checks in the code.

I imagine there may be many other offline issues like this that need fixing in some general way.

younesbelkada commented 6 months ago

Hi @pseudotensor! I think fixing this makes sense. What about adding a new except branch here: https://github.com/huggingface/transformers/blob/c876d12127272ce2886ce51399928c3343d475b9/src/transformers/utils/hub.py#L649, instead of patching modeling_utils, so that we fix it for all methods / code paths that use has_file? What do you think?
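A minimal sketch of that idea, to make the proposed fix concrete. This is not the actual transformers implementation: `head_request` is a placeholder for the HEAD request `has_file` makes against the Hub, and `OfflineModeIsEnabled` stands in for the exception `huggingface_hub` raises when `HF_HUB_OFFLINE=1`. The point is only the shape of the new except branch: in offline mode, answer "file not found on the Hub" instead of propagating a network error, so callers fall back to the cached checkpoint.

```python
import os
import socket


class OfflineModeIsEnabled(Exception):
    """Stand-in for huggingface_hub.utils.OfflineModeIsEnabled (illustrative)."""


def head_request(url):
    """Placeholder for the real HEAD request has_file issues against the Hub."""
    if os.environ.get("HF_HUB_OFFLINE") == "1":
        raise OfflineModeIsEnabled(url)
    # Offline without the env var set, DNS resolution fails as in the report.
    raise socket.gaierror(-3, "Temporary failure in name resolution")


def has_file(url):
    try:
        head_request(url)
        return True
    except OfflineModeIsEnabled:
        # Proposed new except branch: in offline mode we cannot know whether
        # the remote file exists, so report it as absent and let callers fall
        # back to whatever checkpoint files are cached locally.
        return False


os.environ["HF_HUB_OFFLINE"] = "1"
print(has_file("https://huggingface.co/some-repo/resolve/main/model.safetensors"))  # prints False
```

Handling this inside `has_file` rather than in `modeling_utils` covers every code path that probes the Hub for optional files (safetensors, index files, etc.), which is why it is the more general fix.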

pseudotensor commented 6 months ago

Probably fine, you would definitely know best.

younesbelkada commented 6 months ago

Perfect, would you like to contribute that? Otherwise I'm happy to do it!

pseudotensor commented 5 months ago

Please do.