unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

[Error] Some tensors share memory, this will lead to duplicate memory #1157

Open katopz opened 3 weeks ago

katopz commented 3 weeks ago

I'm trying to push a merged safetensors model for unsloth/Llama-3.2-3B-Instruct to Hugging Face, following the example notebook, without success:

    if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit_forced",)
    if True: model.push_to_hub_merged("katopz/kbtg-kpoint-v2-4bit-safe", tokenizer, save_method = "merged_4bit_forced", token = token, safe_serialization = False)

and I get this error:

Unsloth: Merging 4bit and LoRA weights to 4bit...
This might take 5 minutes...
Done.
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 10 minutes for Llama-7b...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-28-c42550036bf0> in <cell line: 15>()
     13 
     14 # Saving to safetensors, not bin format in Colab
---> 15 if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit_forced",)
     16 if True: model.push_to_hub_merged("katopz/kbtg-kpoint-v2-4bit-safe", tokenizer, save_method = "merged_4bit_forced", token = token, safe_serialization = False)
     17 

(... 5 frames omitted ...)
/usr/local/lib/python3.10/dist-packages/safetensors/torch.py in _flatten(tensors)
    486 
    487     if failing:
--> 488         raise RuntimeError(
    489             f"""
    490             Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: {failing}.

RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.embed_tokens.weight', 'lm_head.weight'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Not sure if there's any workaround for this? Thanks!
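For anyone who lands here in the meantime: the failing pair comes from Llama 3.2 tying `lm_head.weight` to `model.embed_tokens.weight`, and there are two generic escape hatches at the transformers / safetensors level. A minimal sketch, assuming a standard `LlamaForCausalLM`-style model object (not an official Unsloth fix):

    from safetensors.torch import save_model

    # Option 1: as the error message suggests, safetensors' save_model()
    # de-duplicates tensors that share storage before writing to disk.
    save_model(model, "model/model.safetensors")

    # Option 2: fall back to the legacy .bin format, whose save path
    # does not run the shared-memory check at all.
    model.save_pretrained("model", safe_serialization=False)

Option 2 trades the safetensors format for a save path that tolerates tied weights, so it is the simpler stopgap if you control the loading side as well.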

danielhanchen commented 3 weeks ago

@katopz Sorry on the delay - will investigate and get back to you!

ai-nikolai commented 1 week ago

@danielhanchen I am also encountering a similar problem (with an even simpler way to reproduce it):

    from unsloth import FastLanguageModel, is_bfloat16_supported

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-1B-Instruct",
        max_seq_length=100,
        dtype=None,
        load_in_4bit=True,
        token=HF_TOKEN,  # a valid Hugging Face access token
    )
    model.save_pretrained("./out_trained_models/unsloth_1B_llama", save_safetensors=False)
    tokenizer.save_pretrained("./out_trained_models/unsloth_1B_llama", save_safetensors=False)

I get the following error:

RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.embed_tokens.weight', 'lm_head.weight'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
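One detail worth flagging in the snippet above: `PreTrainedModel.save_pretrained` takes `safe_serialization`, not `save_safetensors` (that name belongs to `TrainingArguments`), so the unrecognized kwarg is silently swallowed and the model is still written as safetensors. A minimal check of the root cause, assuming the standard `LlamaForCausalLM` module layout:

    # The two tensors named in the error really do share storage:
    # Llama 3.2 sets tie_word_embeddings=True, so lm_head reuses embed_tokens.
    assert model.lm_head.weight.data_ptr() == model.model.embed_tokens.weight.data_ptr()

    # With the correctly spelled kwarg, the legacy .bin path avoids the check:
    model.save_pretrained("./out_trained_models/unsloth_1B_llama", safe_serialization=False)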

System Settings:

Driver Version: 535.183.01   CUDA Version: 12.2
unsloth @ git+https://github.com/unslothai/unsloth.git@1f52468fa31bf0b641ec96217ef0f5916a07fce5
safetensors==0.4.5
transformers==4.45.2
torch==2.4.1