danielstankw opened 1 month ago
try this:
```python
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
from awq import AutoAWQForCausalLM

# output_model_path, tokenizer, quant_config and quant_path are assumed defined earlier.
deepspeed_config = {
    "train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "zero_optimization": {
        "stage": 3  # ZeRO stages run 0-3; there is no stage 4
    },
    "fp16": {
        "enabled": True
    }
}

# Build the plugin from the dict so the object passed to Accelerator exists.
deepspeed_plugin = DeepSpeedPlugin(hf_ds_config=deepspeed_config)
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)

model = AutoAWQForCausalLM.from_pretrained(
    output_model_path, torch_dtype=torch.float16, device_map="auto"
)
model = accelerator.prepare(model)
model.quantize(tokenizer, quant_config=quant_config)

# Only the main process writes the quantized weights to disk.
if accelerator.is_main_process:
    model.save_quantized("./" + quant_path, safetensors=True)
    tokenizer.save_pretrained("./" + quant_path)
```
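Note that for all four processes to actually spawn, a script like this needs to be started through Accelerate, e.g. `accelerate launch --num_processes 4 quantize_awq.py` (where `quantize_awq.py` is a placeholder for your script's filename).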
Your current environment
...
How would you like to use vllm
I have downloaded a model. Now, on my 4-GPU instance, I am attempting to quantize it using AutoAWQ. Whenever I run the script below, I get 0% GPU utilization. Can anyone help me figure out why this might be happening?
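As a first debugging step, here is a minimal sanity check (a sketch, assuming `model` is the AutoAWQ wrapper loaded with `device_map="auto"` as in the snippet above) to confirm that CUDA is visible to the process and that the weights actually landed on the GPUs:

```python
import torch

# Confirm the process can see the GPUs at all.
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())

# AutoAWQ keeps the underlying transformers model in `.model`
# (assumption based on the wrapper used above); with device_map="auto"
# this set should contain cuda devices. A CPU-only set would explain
# the 0% GPU utilization.
print("Parameter devices:", {p.device for p in model.model.parameters()})
```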