huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Saving model with shared tensors fails on cpu but succeeds on gpu #33688

Open kylesayrs opened 1 month ago

kylesayrs commented 1 month ago

System Info

platform: Linux (Ubuntu 22.04)
python version: 3.10.12
transformers version: 4.44.2

Who can help?

No response

Reproduction

# example.py

import torch
import pytest
from transformers import AutoModelForCausalLM

@pytest.mark.parametrize(
    "torch_dtype,tie_word_embeddings,device_map",
    [
        (torch.float16, True,  "cpu"   ),  # passes
        (torch.float16, False, "cpu"   ),  # passes
        (torch.float32, True,  "cpu"   ),  # passes
        (torch.float32, False, "cpu"   ),  # fails
        (torch.float32, False, "cuda:0"),  # passes
    ],
)
def test_model_save(torch_dtype, tie_word_embeddings, device_map, tmp_path):
    model = AutoModelForCausalLM.from_pretrained(
        "Xenova/llama2.c-stories15M",
        torch_dtype=torch_dtype,
        tie_word_embeddings=tie_word_embeddings,
        device_map=device_map,
    )
    model.save_pretrained(tmp_path, safe_serialization=True)

    # test that the model saved correctly
    reloaded = AutoModelForCausalLM.from_pretrained(
        tmp_path,
        torch_dtype="auto",
        device_map=device_map
    )

    model_dict = model.state_dict()
    reloaded_dict = reloaded.state_dict()
    assert model_dict.keys() == reloaded_dict.keys()
    for key in model_dict:
        assert torch.equal(model_dict[key], reloaded_dict[key])
        assert model_dict[key].device == reloaded_dict[key].device

python3 -m pytest example.py

RuntimeError: 
  Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'lm_head.weight', 'model.embed_tokens.weight'}].
  A potential way to correctly save your model is to use `save_model`.
  More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
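
For reference, the save_model the error message points to is safetensors.torch.save_model, which detects shared tensors and writes only a single copy to disk. A minimal sketch of that workaround for the failing configuration (a sketch only, using the model and settings from the reproduction above; note it writes just the weights file, so the config has to be saved separately) might look like:

# sketch of the workaround suggested by the error message:
# safetensors' save_model deduplicates shared tensors before writing
import torch
from transformers import AutoModelForCausalLM
from safetensors.torch import save_model

model = AutoModelForCausalLM.from_pretrained(
    "Xenova/llama2.c-stories15M",
    torch_dtype=torch.float32,
    tie_word_embeddings=False,
    device_map="cpu",
)
save_model(model, "model.safetensors")  # weights only; config.json is not written
model.config.save_pretrained(".")       # save the config separately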

Expected behavior

I expect save_pretrained to behave the same way regardless of the model's data type and regardless of the device it was loaded on.

LysandreJik commented 1 month ago

Hmm indeed, this issue shouldn't pop up. @ydshieh, if you have the bandwidth, do you mind helping @kylesayrs out?

ydshieh commented 1 month ago

So far no clear idea yet, but when I tried with

"meta-llama/Llama-2-7b-hf"

all cases are passing.

Would it be possible to share how you created the original Xenova/llama2.c-stories15M?

    "torch_dtype,tie_word_embeddings,device_map",
    [
        (torch.float16, True,  "cpu"   ),  
        (torch.float16, False, "cpu"   ),  
        (torch.float16, True,  "cuda:0"   ), 
        (torch.float16, False, "cuda:0"   ),  
        (torch.float32, True,  "cpu"   ),  
        (torch.float32, False, "cpu"   ),  
        (torch.float32, True, "cuda:0"),  
        (torch.float32, False, "cuda:0"), 
    ],
kylesayrs commented 1 month ago

@ydshieh I do not know the details of how the model was created, but from the config.json it seems like it was saved with tie_word_embeddings=True. Other models such as TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T do not have tied word embeddings and pass the test cases.
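
A quick way to confirm what the checkpoint itself declares (a hypothetical check, not part of the original comment) is to load just the config:

# hypothetical check: what does the hosted config.json say?
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Xenova/llama2.c-stories15M")
print(config.tie_word_embeddings)  # expected to print True per the comment above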

kylesayrs commented 1 month ago

Some other models which have the same tie_word_embeddings=True in their config, such as "unsloth/Llama-3.2-3B-Instruct", pass, so it may be an issue with this stories model in particular.

This model is not atypical in any way that I know of, but I'll do some more investigation to see if I notice anything different.

ydshieh commented 1 month ago

Oh, ok. I thought you were the author of llama2.c-stories15M 😅 sorry. I will try to see if I can figure it out next week.

ydshieh commented 1 month ago

(just for me)

# reproduction of the failing (torch.float32, False, "cpu") save
import torch
from transformers import AutoModelForCausalLM

configs = [
    (torch.float32, False, "cpu"   ),  # fails
    #(torch.float16, True,  "cpu"   ),  # passes
    #(torch.float16, False, "cpu"   ),  # passes
    #(torch.float32, True,  "cpu"   ),  # passes
    #(torch.float32, False, "cpu"   ),  # fails
    #(torch.float32, False, "cuda:0"),  # passes
]

def test_model_save(torch_dtype, tie_word_embeddings, device_map, tmp_path="./"):
    model = AutoModelForCausalLM.from_pretrained(
        "Xenova/llama2.c-stories15M",
        torch_dtype=torch_dtype,
        tie_word_embeddings=tie_word_embeddings,
        device_map=device_map,
    )
    model.save_pretrained(tmp_path, safe_serialization=True)

for config in configs:
    test_model_save(*config)
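
A possible extension of the script above (a sketch only, assuming the relevant question is whether the two weights still alias after loading; the module paths below are the standard Llama attribute names in transformers) is to report, for each configuration, whether the embedding and LM head weights share storage, since the RuntimeError above is raised when safetensors finds tensors sharing memory:

# hypothetical diagnostic: does embed_tokens.weight still alias lm_head.weight
# after loading with each dtype / tie_word_embeddings combination?
import torch
from transformers import AutoModelForCausalLM

configs = [
    (torch.float16, True,  "cpu"),
    (torch.float16, False, "cpu"),
    (torch.float32, True,  "cpu"),
    (torch.float32, False, "cpu"),  # the failing case
]

for torch_dtype, tie_word_embeddings, device_map in configs:
    model = AutoModelForCausalLM.from_pretrained(
        "Xenova/llama2.c-stories15M",
        torch_dtype=torch_dtype,
        tie_word_embeddings=tie_word_embeddings,
        device_map=device_map,
    )
    shared = (
        model.model.embed_tokens.weight.data_ptr()
        == model.lm_head.weight.data_ptr()
    )
    print(torch_dtype, tie_word_embeddings, device_map, "shared storage:", shared)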