huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Unable to load starcoder2 finetuned version getting quantization errors #29990

Open h-sinha22 opened 3 months ago

h-sinha22 commented 3 months ago

System Info

I am running on an A100 with 40 GB of GPU memory.

Who can help?

@SunMarc and @younesbelkada

Information

Tasks

Reproduction

1. I have an SFT-tuned starcoder2 model.
2. I am trying to load it using AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path).

model = from_pretrained_wrapper(model_name_or_path,

  File "/app/code/evaluation/evaluation_utils.py", line 189, in from_pretrained_wrapper
    AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 3039, in from_pretrained
    config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
  File "/usr/local/lib/python3.8/dist-packages/transformers/quantizers/auto.py", line 149, in merge_quantization_configs
    quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
  File "/usr/local/lib/python3.8/dist-packages/transformers/quantizers/auto.py", line 73, in from_dict
    raise ValueError(
ValueError: Unknown quantization type, got bitsandbytes - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto']

Expected behavior

It should be able to load the model properly.

ArthurZucker commented 3 months ago

Could you share a full reproducer?

h-sinha22 commented 3 months ago

model config used:

{ "_name_or_path": "/app/mnt/models_cache/bigcode/starcoder2-7b", "activation_function": "gelu", "architectures": [ "Starcoder2ForCausalLM" ], "attention_dropout": 0.1, "attention_softmax_in_fp32": true, "bos_token_id": 0, "embedding_dropout": 0.1, "eos_token_id": 0, "hidden_act": "gelu_pytorch_tanh", "hidden_size": 4608, "initializer_range": 0.018042, "intermediate_size": 18432, "layer_norm_epsilon": 1e-05, "max_position_embeddings": 16384, "mlp_type": "default", "model_type": "starcoder2", "norm_epsilon": 1e-05, "norm_type": "layer_norm", "num_attention_heads": 36, "num_hidden_layers": 32, "num_key_value_heads": 4, "quantization_config": { "_load_in_4bit": false, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float32", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "fp4", "bnb_4bit_use_double_quant": false, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": null, "llm_int8_threshold": 6.0, "load_in_4bit": false, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "residual_dropout": 0.1, "rope_theta": 1000000, "scale_attention_softmax_in_fp32": true, "scale_attn_weights": true, "sliding_window": 4096, "torch_dtype": "bfloat16", "transformers_version": "4.39.1", "use_bias": true, "use_cache": true, "vocab_size": 49152 }

ArthurZucker commented 3 months ago

That is not a full reproducer; we need the full code that you are running.

luoruijie commented 2 months ago

I am hitting the same error.

Package versions: bitsandbytes 0.43.1, transformers 4.40.0, torch 2.2.2+cu118, torchaudio 2.2.2+cu118, torchvision 0.17.2+cu118

My steps were as follows.

First, I quantized Chinese-Llama-2-7b into Chinese-Llama-2-7b-4bit. This is my quantization code:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "LinkSoul/Chinese-Llama-2-7b"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    quantization_config = BitsAndBytesConfig(
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map='auto'
)

if __name__ == '__main__':

    import os
    output = "soulteary/Chinese-Llama-2-7b-4bit"
    if not os.path.exists(output):
        os.mkdir(output)

    model.save_pretrained(output)
    print("done")

Then I get the quantized model soulteary/Chinese-Llama-2-7b-4bit, and I want to use transformers to load it with the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer,BitsAndBytesConfig

model_id = 'soulteary/Chinese-Llama-2-7b-4bit'

if torch.cuda.is_available():

    quantization_config = BitsAndBytesConfig(
        bnb_4bit_quant_type="bitsandbytes_4bit",
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quantization_config,
        local_files_only=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
else:
    model = None

This error appears:

Traceback (most recent call last):
  File "/home/soikit/LLM/app.py", line 6, in <module>
    from model import run
  File "/home/soikit/LLM/model.py", line 15, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soikit/bj20_venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soikit/bj20_venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3155, in from_pretrained
    config.quantization_config = AutoHfQuantizer.merge_quantization_configs(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soikit/bj20_venv/lib/python3.11/site-packages/transformers/quantizers/auto.py", line 149, in merge_quantization_configs
    quantization_config = AutoQuantizationConfig.from_dict(quantization_config)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/soikit/bj20_venv/lib/python3.11/site-packages/transformers/quantizers/auto.py", line 73, in from_dict
    raise ValueError(
ValueError: Unknown quantization type, got bitsandbytes - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto']
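
In case it is useful (a guess on my side, not a confirmed fix): the serialized config seems to carry quant_method "bitsandbytes" with both load flags false, which is what the error message complains about, and bnb_4bit_quant_type normally takes "fp4" or "nf4" rather than "bitsandbytes_4bit". A quantize-and-reload sketch along these lines is what I would expect to round-trip cleanly; treat it as an untested sketch, not a verified solution:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# sketch: load_in_4bit=True is set explicitly so the config saved alongside the
# model records a 4-bit bitsandbytes method instead of a bare "bitsandbytes"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # valid values are "fp4" or "nf4"
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "LinkSoul/Chinese-Llama-2-7b",
    quantization_config=quant_config,
    device_map="auto",
)
model.save_pretrained("soulteary/Chinese-Llama-2-7b-4bit")

# reload without passing a new config; the quantization_config stored in
# config.json should be picked up automatically
model = AutoModelForCausalLM.from_pretrained(
    "soulteary/Chinese-Llama-2-7b-4bit",
    device_map="auto",
)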

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

1049451037 commented 1 month ago

Similar issue... not able to load the model after saving it in 4-bit.

ValueError: Supplied state dict for model.layers.16.self_attn.vision_expert_dense.weight does not contain `bitsandbytes__*` and possibly other `quantized_stats` components.
ArthurZucker commented 1 month ago

cc @SunMarc and @younesbelkada

younesbelkada commented 1 month ago

Hi @1049451037, can you share a simple and short reproducible snippet? Can you also try with the latest transformers (`pip install -U transformers`)?

1049451037 commented 1 month ago
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    quant_method='nf4'
)
model = AutoModelForCausalLM.from_pretrained('THUDM/cogvlm2-llama3-chat-19B', quantization_config=quant_config)
tokenizer = AutoTokenizer.from_pretrained('THUDM/cogvlm2-llama3-chat-19B')

# save int4
model.save_pretrained('./cogvlm2-llama3-chat-19B-int4')
tokenizer.save_pretrained('./cogvlm2-llama3-chat-19B-int4')

# load failed
model = AutoModelForCausalLM.from_pretrained('./cogvlm2-llama3-chat-19B-int4', quantization_config=quant_config)
tokenizer = AutoTokenizer.from_pretrained('./cogvlm2-llama3-chat-19B-int4')
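
(Side note, as far as I can tell: BitsAndBytesConfig has no quant_method init argument, so quant_method='nf4' above is most likely ignored; the 4-bit data type is normally selected with bnb_4bit_quant_type. That may well be unrelated to the missing bitsandbytes__* stats, but for reference a config like the following is what I would expect for nf4.)

from transformers import BitsAndBytesConfig

# hedged sketch: nf4 selected via bnb_4bit_quant_type rather than quant_method
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",  # accepts "fp4" or "nf4"
)
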
younesbelkada commented 1 month ago

On it !

amyeroberts commented 2 weeks ago

cc @SunMarc