Closed CharlesRiggins closed 2 weeks ago
Not sure if it's related to the model I used. I didn't change anything except replacing the model path with a local path on my machine. I tried llama and qwen. Both failed.
Can you post your compressed-tensors version? Could you post your config.json as well please?
> Can you post your compressed-tensors version?

compressed-tensors version is 0.5.0
> Could you post your config.json as well please?

Sure. Here is the model config.
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.28.1",
  "use_cache": true,
  "vocab_size": 32000
}
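For reference, the shapes in the posted config (hidden size 4096, 32 heads, 32 layers, intermediate size 11008) match a stock 7B Llama model. A quick stdlib-only sanity check over the relevant fields:

```python
import json

# A subset of the config posted above, copied verbatim.
config = json.loads("""
{
  "architectures": ["LlamaForCausalLM"],
  "hidden_size": 4096,
  "intermediate_size": 11008,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "torch_dtype": "float16",
  "vocab_size": 32000
}
""")

# hidden_size must divide evenly across the attention heads.
assert config["hidden_size"] % config["num_attention_heads"] == 0
head_dim = config["hidden_size"] // config["num_attention_heads"]
print(head_dim)  # 4096 / 32 = 128, the standard Llama-7B head dimension
```

Nothing unusual here, which is consistent with the problem lying in the environment rather than the model config.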
Has anyone reproduced this error?
We were unable to reproduce the error using the script provided. Could you please try rerunning the example in a fresh environment?
If the issue persists, kindly share the output of pip list | grep compress and help us by providing a minimal reproducible example.
It probably is an environment-related issue. I tried it in a fresh and unconstrained environment, and it worked just fine. I'm still not sure which package in my production environment caused the issue. I will post it after I find it.
It seems that the error was related to the version of pydantic. In my environment it was 2.4.2 and I got the error. The error disappeared after I updated it to version 2.6.2.

Makes sense. Probably a pydantic conflict. Thanks for resolving it.
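Since the failure came down to the installed pydantic version, a small guard can catch the bad range before running the example. A minimal sketch (the >= 2.6 threshold comes from this thread; the helper name and the guard itself are illustrative, not part of llm-compressor):

```python
from importlib.metadata import PackageNotFoundError, version


def at_least(ver: str, minimum: tuple[int, int]) -> bool:
    """Return True if a 'major.minor[.patch]' version string meets the minimum."""
    major, minor = (int(p) for p in ver.split(".")[:2])
    return (major, minor) >= minimum


# The thread reports a failure on pydantic 2.4.2 and success on 2.6.2.
assert not at_least("2.4.2", (2, 6))
assert at_least("2.6.2", (2, 6))

# Hypothetical guard for a production environment:
try:
    if not at_least(version("pydantic"), (2, 6)):
        raise RuntimeError("upgrade pydantic: pip install -U 'pydantic>=2.6'")
except PackageNotFoundError:
    pass  # pydantic not installed in this environment
```

Pinning pydantic>=2.6 in the production requirements file would prevent the conflict from recurring.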
Describe the bug
Trying out the FP8 quantization example script from the vLLM docs, but it failed.
Environment
- Ubuntu
- Python 3.10
- LLM Compressor 0.1.0
- torch 2.4.0
To Reproduce
Errors

Traceback (most recent call last):
  File "/root/lingo-engine/foo.py", line 20, in <module>
    oneshot(model=model, recipe=recipe)
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 359, in main
    stage_runner.one_shot()
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/transformers/finetune/runner.py", line 194, in one_shot
    save_model_and_recipe(
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/pytorch/model_load/helpers.py", line 110, in save_model_and_recipe
    model.save_pretrained(
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 103, in save_pretrained_wrapper
    compressor = ModelCompressor.from_pretrained_model(
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/compressed_tensors/compressors/model_compressor.py", line 154, in from_pretrained_model
    quantization_config = QuantizationConfig.from_pretrained(
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/compressed_tensors/quantization/quant_config.py", line 234, in from_pretrained
    return QuantizationConfig(
  File "/usr/local/lib/python3.10/dist-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for QuantizationConfig
format
  Input should be a valid string [type=string_type, input_value=<CompressionFormat.float...ized: 'float-quantized'>, input_type=CompressionFormat]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type