vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize GPU #4744

Open danielstankw opened 1 month ago

danielstankw commented 1 month ago

Your current environment

...

How would you like to use vllm

I have downloaded a model. Now, on my 4-GPU instance, I am trying to quantize it using AutoAWQ. Whenever I run the script below I see 0% GPU utilization. Can anyone help me understand why this is happening?

import json
from huggingface_hub import snapshot_download
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import os

# some other code here
# ////////////////
# some code here

# Load model
model = AutoAWQForCausalLM.from_pretrained(args.model_path, device_map="auto", **{"low_cpu_mem_usage": True})
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)

# Parse quantization config if one was passed as a JSON string
if args.quant_config:
    quant_config = json.loads(args.quant_config)
else:
    # Default quantization config
    print("Using default quantization config")
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Quantize
print("Quantizing the model")
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model and tokenizer
if args.quant_path:
    print("Saving the model")
    model.save_quantized(args.quant_path)
    tokenizer.save_pretrained(args.quant_path)
else:
    print("No quantized model path provided, not saving quantized model.")


iwaitu commented 4 weeks ago

try this:

import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# ZeRO stage 3 shards model states across the GPUs (DeepSpeed defines stages 0-3)
deepspeed_config = {
    "train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "zero_optimization": {
        "stage": 3
    },
    "fp16": {
        "enabled": True
    }
}
deepspeed_plugin = DeepSpeedPlugin(hf_ds_config=deepspeed_config)
accelerator = Accelerator(mixed_precision='fp16', deepspeed_plugin=deepspeed_plugin)

model = AutoAWQForCausalLM.from_pretrained(output_model_path, torch_dtype=torch.float16, device_map="auto")
model = accelerator.prepare(model)

# quant_config and quant_path as defined in your script above
model.quantize(tokenizer, quant_config=quant_config)
if accelerator.is_main_process:
    model.save_quantized("./" + quant_path, safetensors=True)
    tokenizer.save_pretrained("./" + quant_path)
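
Once the quantized checkpoint exists, the usual way to actually exercise the GPUs with it is to serve it through vLLM with AWQ enabled. A minimal sketch, assuming the checkpoint was saved to quant_path as above and that all four GPUs should be used for tensor parallelism (the prompt is just a placeholder):

from vllm import LLM, SamplingParams

# Load the AWQ-quantized checkpoint saved above; tensor_parallel_size=4
# splits the model across the four GPUs on this instance.
llm = LLM(model=quant_path, quantization="awq", tensor_parallel_size=4)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)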