huggingface / transformers

πŸ€— Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Fail to load model without .safetensors file #31552

Open wygao8 opened 2 months ago

wygao8 commented 2 months ago

System Info


Who can help?

Hi I used huggingface-cli to download both haoranxu/ALMA-13B-R and ALMA-13B in the same cache directory (my_cache_dir).

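For context, the download step is roughly equivalent to the following sketch (snapshot_download from huggingface_hub is the Python counterpart of the huggingface-cli download command; my_cache_dir stands in for the actual cache path):

from huggingface_hub import snapshot_download

# Fetch both repos into the same shared cache directory
for repo_id in ("haoranxu/ALMA-13B-R", "haoranxu/ALMA-13B"):
    snapshot_download(repo_id=repo_id, cache_dir=my_cache_dir)
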
I can load ALMA-13B-R successfully with the following command:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-R",
    cache_dir=my_cache_dir,
    torch_dtype=torch.float16,
    device_map="auto")

but loading ALMA-13B with the same arguments fails:

model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B", 
    cache_dir=my_cache_dir, 
    torch_dtype=torch.float16, 
    device_map="auto")

The error log is as follows:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /haoranxu/ALMA-13B/resolve/main/model.safetensors.index.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f4ef4913290>, 'Connection to huggingface.co timed out. (connect timeout=10)'))

During handling of the above exception, another exception occurred:

requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /haoranxu/ALMA-13B/resolve/main/model.safetensors.index.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f4ef4913290>, 'Connection to huggingface.co timed out. (connect timeout=10)'))

After downgrading transformers to 4.39.3, ALMA-13B can be loaded with the same command.

Since ALMA-13B-R ships .safetensors files whereas ALMA-13B only has pytorch_model*.bin files, I believe there is still a bug here that needs to be fixed.
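
One quick way to confirm which weight formats are actually cached is to list the files under each snapshot (a sketch assuming the standard Hub cache layout; my_cache_dir is the same placeholder as above):

import glob
import os

# List cached weight files for each repo under the standard HF cache layout
for repo in ("models--haoranxu--ALMA-13B-R", "models--haoranxu--ALMA-13B"):
    pattern = os.path.join(my_cache_dir, repo, "snapshots", "*", "*")
    weights = [os.path.basename(p) for p in glob.glob(pattern)
               if p.endswith((".safetensors", ".bin"))]
    print(repo, weights)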

Information

Tasks

Reproduction

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B", cache_dir=your_cache_dir, torch_dtype=torch.float16, device_map="auto")

Expected behavior

The model is successfully loaded. The command-line log would probably look as follows:

Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6/6 [00:31<00:00,  5.24s/it]
amyeroberts commented 2 months ago

Hi @wygao8, thanks for opening this issue!

I'm able to run the following without issue on main and v4.41.2:

import os
import torch

os.environ['HF_HUB_OFFLINE'] = '1'

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B",
    torch_dtype=torch.float16,
    device_map="auto",
    local_files_only=True
)

Interestingly, I can't run the same with "haoranxu/ALMA-13B-R" in offline mode: it's unable to load the adapter weights locally (it always tries to fetch them from the Hub).
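
For anyone blocked by this in the meantime, an untested workaround sketch is to load the base model locally and attach the cached adapter explicitly with peft (the snapshot path below is a placeholder for wherever the ALMA-13B-R adapter actually lives in the local cache):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model from the local cache only
base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B",
    torch_dtype=torch.float16,
    device_map="auto",
    local_files_only=True,
)

# Attach the adapter from its cached snapshot directory (placeholder path)
adapter_path = "/path/to/cache/models--haoranxu--ALMA-13B-R/snapshots/<revision>"
model = PeftModel.from_pretrained(base, adapter_path)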

amyeroberts commented 2 months ago

I've opened #31700 for the offline-mode issue when there are adapter weights.

github-actions[bot] commented 11 hours ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.