Closed: anmarques closed this 2 weeks ago
model = SparseAutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)
It seems that you did not pass the correct device_map here. For CPU offloading, build one explicitly with custom_offload_device_map:
from llmcompressor.transformers.compression.helpers import custom_offload_device_map

# max_memory_per_gpu and num_gpus should be sized for your hardware
device_map = custom_offload_device_map(
    model_id,
    max_memory_per_gpu=max_memory_per_gpu,
    num_gpus=num_gpus,
    torch_dtype="auto",
)
model = SparseAutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device_map,
)
Closing this out due to lack of activity. Please reopen if you are still hitting the issue!
Describe the bug
When using a SmoothQuantModifier together with CPU offloading, there is a device conflict: tensors end up on the wrong device.
Expected behavior
CPU offloading should work with SmoothQuant 😄
Environment
I don't think the environment is relevant.
To Reproduce
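A minimal sketch of the kind of setup that triggers this, assuming llm-compressor's oneshot API (the dataset and calibration settings are placeholders):

from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct"

# device_map="auto" lets accelerate offload layers that do not fit on
# GPU to CPU; SmoothQuant then encounters tensors on mixed devices.
model = SparseAutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)

oneshot(
    model=model,
    dataset="open_platypus",  # placeholder calibration set
    recipe=[SmoothQuantModifier(smoothing_strength=0.8)],
    max_seq_length=512,
    num_calibration_samples=64,
)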
Errors
Additional context
I hit this issue when quantizing Meta-Llama-3.1-405B-Instruct, but I'm sure there's no need to use such a large model to reproduce it.