huggingface / transformers

πŸ€— Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Fail to load model without .safetensors file #31552

Open wygao8 opened 2 months ago

wygao8 commented 2 months ago

System Info


Who can help?

Hi I used huggingface-cli to download both haoranxu/ALMA-13B-R and ALMA-13B in the same cache directory (my_cache_dir).

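For context, the download step is roughly equivalent to the following sketch (snapshot_download from huggingface_hub is the Python counterpart of the huggingface-cli download command; my_cache_dir stands in for the actual cache path):

from huggingface_hub import snapshot_download

# Fetch both repos into the same shared cache directory
for repo_id in ("haoranxu/ALMA-13B-R", "haoranxu/ALMA-13B"):
    snapshot_download(repo_id=repo_id, cache_dir=my_cache_dir)
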
I can load ALMA-13B-R successfully with the following command:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-R",
    cache_dir=my_cache_dir,
    torch_dtype=torch.float16,
    device_map="auto")

but loading ALMA-13B with the same arguments fails:

model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B", 
    cache_dir=my_cache_dir, 
    torch_dtype=torch.float16, 
    device_map="auto")

The error log is as follows:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /haoranxu/ALMA-13B/resolve/main/model.safetensors.index.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f4ef4913290>, 'Connection to huggingface.co timed out. (connect timeout=10)'))

During handling of the above exception, another exception occurred:

requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /haoranxu/ALMA-13B/resolve/main/model.safetensors.index.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f4ef4913290>, 'Connection to huggingface.co timed out. (connect timeout=10)'))

After downgrading transformers to 4.39.3, ALMA-13B can be loaded with the same command.

Since ALMA-13B-R ships .safetensors files whereas ALMA-13B only has pytorch_model*.bin files, I believe there is still a bug here that needs to be fixed.
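
One quick way to confirm which weight formats are actually cached is to list the files under each snapshot (a sketch assuming the standard Hub cache layout; my_cache_dir is the same placeholder as above):

import glob
import os

# List cached weight files for each repo under the standard HF cache layout
for repo in ("models--haoranxu--ALMA-13B-R", "models--haoranxu--ALMA-13B"):
    pattern = os.path.join(my_cache_dir, repo, "snapshots", "*", "*")
    weights = [os.path.basename(p) for p in glob.glob(pattern)
               if p.endswith((".safetensors", ".bin"))]
    print(repo, weights)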

Information

Tasks

Reproduction

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("haoranxu/ALMA-13B", cache_dir=your_cache_dir, torch_dtype=torch.float16, device_map="auto")

Expected behavior

The model is successfully loaded. The command-line log would probably look as follows:

Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6/6 [00:31<00:00,  5.24s/it]
amyeroberts commented 2 months ago

Hi @wygao8, thanks for opening this issue!

I'm able to run the following without issue on main and v4.41.2:

import os
import torch

os.environ['HF_HUB_OFFLINE'] = '1'

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B",
    torch_dtype=torch.float16,
    device_map="auto",
    local_files_only=True
)

Interestingly, I can't run the same with "haoranxu/ALMA-13B-R" in offline mode: it's unable to load the adapter weights locally (it always tries to fetch them from the Hub).
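
For anyone blocked by this in the meantime, an untested workaround sketch is to load the base model locally and attach the cached adapter explicitly with peft (the snapshot path below is a placeholder for wherever the ALMA-13B-R adapter actually lives in the local cache):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model from the local cache only
base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B",
    torch_dtype=torch.float16,
    device_map="auto",
    local_files_only=True,
)

# Attach the adapter from its cached snapshot directory (placeholder path)
adapter_path = "/path/to/cache/models--haoranxu--ALMA-13B-R/snapshots/<revision>"
model = PeftModel.from_pretrained(base, adapter_path)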

amyeroberts commented 2 months ago

I've opened #31700 for the offline-mode issue when there are adapter weights.

github-actions[bot] commented 11 hours ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.