unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Download and saving unsloth/gemma-7b-bnb-4bit to a local folder loses parameters #333

Open patrickjchen opened 6 months ago

patrickjchen commented 6 months ago

First load the model with the internet connection ON:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

Then save the model and tokenizer to a local folder:

local_path = "***"
model.save_pretrained(local_path)
tokenizer.save_pretrained(local_path)

Then make a dataset and try to use the model locally with the internet connection OFF:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "***",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

This raises the following error:

ValueError: Supplied state dict for model.layers.23.mlp.down_proj.weight does not contain `bitsandbytes__*` and possibly other `quantized_stats` components.

patrickjchen commented 6 months ago

Is this the correct model after downloading from Huggingface?

model
GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 3072)
    (layers): ModuleList(
      (0-27): 28 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=3072, bias=False)
          (rotary_emb): GemmaFixedRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (up_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (down_proj): Linear4bit(in_features=24576, out_features=3072, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): GemmaRMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=256000, bias=False)
)

down_proj = model.model.layers[0].mlp.down_proj
print(down_proj)

Linear4bit(in_features=24576, out_features=3072, bias=False)
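A quick sanity check, sketched here as an assumption rather than part of the original report: bitsandbytes attaches a quant_state to every Linear4bit weight, and a missing one would line up with the error above.

# Hedged sketch: check that the loaded 4-bit weights still carry their
# bitsandbytes quantization metadata (module path taken from the printout above).
down_proj = model.model.layers[0].mlp.down_proj
weight = down_proj.weight                     # a bitsandbytes Params4bit
print(type(weight).__name__)                  # expected: Params4bit
print(weight.quant_state is not None)         # False would match the missing quant_state error
print(weight.dtype, weight.shape)             # packed 4-bit weights are stored as uint8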

danielhanchen commented 6 months ago

@patrickjchen So ur using the Kaggle notebook here https://www.kaggle.com/code/danielhanchen/kaggle-gemma-7b-unsloth-notebook/ right?

I'm uncertain on internet connections and stuff sadly - not a Kaggle expert :(

patrickjchen commented 6 months ago

Dan, it seems the implementation of save_pretrained/from_pretrained has some issues for the Gemma 7b model. My code works for Mistral. However, I felt the Gemma version is a lot better.

patrickjchen commented 6 months ago

and from the code in python3.11/site-packages/transformers/quantizers/quantizer_bnb_4bit.py, line 193:

if (param_name + ".quant_state.bitsandbytes__fp4" not in state_dict) and (
    param_name + ".quant_state.bitsandbytes__nf4" not in state_dict
):
    raise ValueError(
        f"Supplied state dict for {param_name} does not contain `bitsandbytes__*` and possibly other `quantized_stats` components."
    )

This seems to imply that there should be key names like bitsandbytes__nf4 / bitsandbytes__fp4, but the names in the unsloth code are cdequantize_blockwise_fp16_nf4 / cdequantize_blockwise_bf16_nf4.
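One way to check that against the files on disk, sketched under the assumption that the local save produced a single model.safetensors shard:

# Hedged sketch: list the quant_state entries in a locally saved 4-bit checkpoint.
# "local_path" is a placeholder; a sharded save would need each shard inspected.
from safetensors import safe_open

local_path = "path/to/saved/model"
with safe_open(f"{local_path}/model.safetensors", framework="pt") as f:
    keys = list(f.keys())

quant_keys = [k for k in keys if "quant_state" in k]
print(len(keys), "tensors total,", len(quant_keys), "quant_state entries")
print(quant_keys[:5])   # expect names ending in .quant_state.bitsandbytes__nf4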

patrickjchen commented 6 months ago

It seems some keys are lost after reading the model back with from_pretrained(). The top-level state_dict had 1234 keys, but after the save/load round trip there are only 1050.
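A hedged sketch of how the missing keys could be pinned down by diffing the two state dicts; model_before and model_after are illustrative names for the model loaded from the Hub and the one reloaded from the local folder:

# Hedged sketch: diff state_dict keys before saving and after reloading.
keys_before = set(model_before.state_dict().keys())
keys_after = set(model_after.state_dict().keys())

missing = sorted(keys_before - keys_after)
print(len(keys_before), "->", len(keys_after), "keys;", len(missing), "missing")
for k in missing[:10]:
    print(k)   # the .quant_state.* / .absmax style entries would be expected here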

patrickjchen commented 6 months ago

@danielhanchen Hi Dan, after I do

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "****",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

how do I save the model to a local folder? I tried model.save_pretrained(), model.save_pretrained_merged() and unsloth_save_model(); none of them work. Also, the model above is just GemmaForCausalLM, not PeftModelForCausalLM. My observation is that saving the model loses parameters (1234 ---> 1050).
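Until saving the quantized model round-trips cleanly, one possible workaround (an assumption, not something confirmed in this thread) is to mirror the original 4-bit repo to disk with huggingface_hub while online and point model_name at that folder for offline runs, so the quantization metadata is never re-serialized:

# Hedged sketch of a possible workaround: snapshot the original 4-bit repo to a
# local folder while online, then load from that folder with the network off.
from huggingface_hub import snapshot_download
from unsloth import FastLanguageModel

local_dir = snapshot_download(
    "unsloth/gemma-7b-bnb-4bit",
    local_dir = "gemma-7b-bnb-4bit-local",    # illustrative path
)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = local_dir,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)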

danielhanchen commented 6 months ago

@patrickjchen Ok I'll take a look

alarecha24 commented 3 months ago

Once I get the model from the internet and save it locally, I change my config.json to point at the local model path. When I try to reload it from local storage I get the following errors, even though I have bitsandbytes (version 0.43.1) installed: "You have a version of bitsandbytes that is not compatible with 4bit inference and training". Is there a workaround for this?

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
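The dispatch error usually means accelerate offloaded some 4-bit modules to CPU or disk. A hedged sketch of isolating that with plain transformers, forcing every module onto GPU 0 via an explicit device_map as the message suggests (paths and settings are illustrative):

# Hedged sketch: load the checkpoint with an explicit device_map so nothing is
# offloaded; this fails fast if the model genuinely does not fit in VRAM.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "path/to/local/model",                  # placeholder for the local folder
    quantization_config = bnb_config,
    device_map = {"": 0},                   # put every module on GPU 0
)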

danielhanchen commented 3 months ago

@alarecha24 That's a weird error msg - is this for Gemma? What's your GPU?

nick-gt commented 3 months ago

I'm attempting to fine-tune a locally saved model and running into the same issue. My GPU info is below:

NVIDIA-SMI 535.161.08    Driver Version: 535.161.08    CUDA Version: 12.2    Tesla T4
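Since the same error can also just mean the model does not fit, a quick hedged check of the T4's free memory before loading:

# Hedged sketch: report total and free GPU memory before loading the model.
import torch

props = torch.cuda.get_device_properties(0)
free, total = torch.cuda.mem_get_info(0)
print(props.name, f"{total / 1e9:.1f} GB total, {free / 1e9:.1f} GB free")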

danielhanchen commented 3 months ago

@nick-gt Are you using load_in_4bit = True?

nick-gt commented 3 months ago

@danielhanchen yes, I've tried using both True and False.

jgarcia2809 commented 2 months ago

Hello everyone, I am running into the same issue:

"ValueError: Supplied state dict for model.layers.28.mlp.gate_proj.weight does not contain bitsandbytes__* and possibly other quantized_stats components."

I am trying to finetune the model "unsloth/codellama-13b-bnb-4bit" using the FastLanguageModel.from_pretrained() method, but it does not even seem to pull the model, since the error happens right after the shards start downloading.

GPU: NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0
Torch: 2.3.0

Here is the code snippet:

from unsloth import FastLanguageModel

model_name = "unsloth/codellama-13b-bnb-4bit"
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

Thanks in advance for the help!

danielhanchen commented 2 months ago

@jgarcia2809 I just reuploaded Codellama-13b - hopefully it works now

jgarcia2809 commented 2 months ago

Hi @danielhanchen thank you for the quick response. Unfortunately it still gives me the same error.

I tried with the "unsloth/llama-3-8b-Instruct-bnb-4bit" and could finetune it using the same code. It did not produced any errors.

danielhanchen commented 2 months ago

Ok that's very weird - I'll see what I can do. Temporarily it's best to use unsloth/llama-3.1-8b; another approach is to uninstall unsloth and then reinstall it.