vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Apache License 2.0

Pydantic Error #102

Closed. CharlesRiggins closed this issue 2 weeks ago.

CharlesRiggins commented 3 weeks ago

Describe the bug
I was trying out the FP8 quantization example script from the vLLM docs, but it failed.

Environment
Ubuntu, Python 3.10, LLM Compressor 0.1.0, torch 2.4.0

To Reproduce

from llmcompressor.transformers import SparseAutoModelForCausalLM
from transformers import AutoTokenizer

# MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
MODEL_ID = "/mnt/public/open_source_model/Llama-2-7b-hf"

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Configure the simple PTQ quantization recipe.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the quantization algorithm.
oneshot(model=model, recipe=recipe)

# Save the model. Take the last path component so this also works for a
# local path like the one above (split("/")[1] would give "mnt" here).
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)

Errors

Traceback (most recent call last):
  File "/root/lingo-engine/foo.py", line 20, in <module>
    oneshot(model=model, recipe=recipe)
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 359, in main
    stage_runner.one_shot()
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/transformers/finetune/runner.py", line 194, in one_shot
    save_model_and_recipe(
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/pytorch/model_load/helpers.py", line 110, in save_model_and_recipe
    model.save_pretrained(
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 103, in save_pretrained_wrapper
    compressor = ModelCompressor.from_pretrained_model(
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/compressed_tensors/compressors/model_compressor.py", line 154, in from_pretrained_model
    quantization_config = QuantizationConfig.from_pretrained(
  File "/root/lingo-engine/.venv/lib/python3.10/site-packages/compressed_tensors/quantization/quant_config.py", line 234, in from_pretrained
    return QuantizationConfig(
  File "/usr/local/lib/python3.10/dist-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for QuantizationConfig
format
  Input should be a valid string [type=string_type, input_value=<CompressionFormat.float...ized: 'float-quantized'>, input_type=CompressionFormat]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
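Reading the error: QuantizationConfig's format field is receiving a CompressionFormat enum member where pydantic expects a plain string. A minimal standalone sketch of the same failure mode; the class definitions below are illustrative, mirroring the names in the traceback, not the actual compressed-tensors code:

# Illustrative sketch only: mirrors the names in the traceback,
# not the real compressed-tensors definitions.
from enum import Enum

from pydantic import BaseModel, ValidationError


class CompressionFormat(str, Enum):
    float_quantized = "float-quantized"


class QuantizationConfig(BaseModel):
    format: str  # pydantic expects a plain string here


try:
    QuantizationConfig(format=CompressionFormat.float_quantized)
    print("validated: this pydantic version accepts str-based enum members")
except ValidationError as err:
    # On the failing setup this prints the same string_type error as above.
    print(err)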

CharlesRiggins commented 3 weeks ago

I'm not sure if it's related to the model I used. I didn't change anything except replacing the model path with a local path on my machine. I tried Llama and Qwen; both failed.

robertgshaw2-neuralmagic commented 3 weeks ago

Can you post your compressed-tensors version?

rahul-tuli commented 3 weeks ago

Could you post your config.json as well please?

CharlesRiggins commented 3 weeks ago

> Can you post your compressed-tensors version?

compressed-tensors version is 0.5.0

CharlesRiggins commented 3 weeks ago

> Could you post your config.json as well please?

Sure. Here is the model config.

{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.28.1",
  "use_cache": true,
  "vocab_size": 32000
}

CharlesRiggins commented 3 weeks ago

Has anyone reproduced this error?

rahul-tuli commented 3 weeks ago

We were unable to reproduce the error using the script provided. Could you please try rerunning the example in a fresh environment?

If the issue persists, kindly share the output of pip list | grep compress and help us by providing a minimal reproducible example.
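Equivalently, from Python, the standard library can report the installed versions; the distribution names below are assumed to be the ones published on PyPI:

from importlib.metadata import version

# Print the installed version of each relevant distribution.
for pkg in ("compressed-tensors", "llmcompressor", "pydantic"):
    print(pkg, version(pkg))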

CharlesRiggins commented 2 weeks ago

It's probably an environment-related issue. I tried it in a fresh, unconstrained environment and it worked fine. I'm still not sure which package in my production environment caused the issue; I'll post an update once I find it.

robertgshaw2-neuralmagic commented 2 weeks ago

> It's probably an environment-related issue. I tried it in a fresh, unconstrained environment and it worked fine. I'm still not sure which package in my production environment caused the issue; I'll post an update once I find it.

Makes sense. Probably a pydantic conflict. Thanks for resolving it.

CharlesRiggins commented 2 weeks ago

It seems the error was related to the pydantic version. In my environment it was 2.4.2 and I got the error; the error disappeared after I upgraded to 2.6.2.
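A quick way to confirm which pydantic version is actually being picked up at runtime (pydantic exposes its version at the top level):

import pydantic

print(pydantic.VERSION)  # 2.4.2 reproduced the error here; 2.6.2 did not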