Frank995 opened 1 month ago
If your GPU is out of memory, consider moving the model to CPU for the saving step to offload GPU memory, or try using a less aggressive quantization method.
Hi. I encountered this too; I think this error didn't exist in earlier versions. Saving takes some time, but I was able to avoid the OOM with the code below.
import torch
from unsloth import FastLanguageModel

# Reload the model with explicit per-device memory budgets so layers that
# don't fit in VRAM are placed in system RAM instead of crashing.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=path_name,  # path to your fine-tuned checkpoint
    max_memory={"cpu": "30GiB", 0: "5GiB"},  # cap GPU 0 at 5 GiB, CPU at 30 GiB
    dtype=torch.bfloat16,
    device_map="auto",
    low_cpu_mem_usage=True,
)
model.save_pretrained_merged(merge_name, tokenizer, save_method="merged_16bit")
edit {"cpu": "30GIB", 0: "5GIB"} for you enviroment. 5GIB is your GPU memory, cpu is your system memory.
Hi. I'm saving my model to GGUF after training. These are the utilization metrics:
As you can see, there's plenty of memory available. Nevertheless, when I run
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
it crashes with an OOM error:
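For what it's worth, the error text quoted above suggests moving the model to CPU for the saving step. A minimal sketch of that workaround (assuming save_pretrained_gguf accepts a CPU-resident model, which I haven't verified across Unsloth versions):

import torch

# Move the model off the GPU so the merge/export allocates in system RAM
# instead of VRAM, per the suggestion in the error message.
model = model.to("cpu")
torch.cuda.empty_cache()  # release PyTorch's cached VRAM

model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")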