Hi. I'm saving my model to GGUF after training. These are the utilization metrics:
155.3186 seconds used for training.
2.59 minutes used for training.
Peak reserved memory = 7.939 GB.
Peak reserved memory for training = 1.462 GB.
Peak reserved memory % of max memory = 36.127 %.
Peak reserved memory for training % of max memory = 6.653 %.
As you can see, there's plenty of memory available. Nevertheless, when I run model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m"), it crashes with an OOM error:
File ~/miniconda3/lib/python3.12/site-packages/unsloth/save.py:152, in _merge_lora(layer, name)
    150 else:
    151     dtype = W.dtype
--> 152 W = W.to(torch.float32).t()
    153 # W = W.t()
    155 if A is not None:
    156     # sAB = (A.t().to(torch.float32) @ (s * B.t().to(torch.float32)))
    157     # W += sAB
OutOfMemoryError: CUDA out of memory. Tried to allocate 56.00 MiB. GPU 0 has a total capacity of 21.98 GiB of which 53.12 MiB is free. Process 1841 has 6.76 GiB memory in use. Including non-PyTorch memory, this process has 15.15 GiB memory in use. Of the allocated memory 14.74 GiB is allocated by PyTorch, and 107.45 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
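In case it helps, here is what I've been considering trying based on the hint at the end of the error message itself: enabling expandable segments and freeing cached training allocations before the save, since the crash happens when _merge_lora upcasts a weight to float32. This is just a sketch of my own session, not a confirmed fix; model and tokenizer are the objects from the training run above.

```python
import os

# Per the PyTorch OOM message, this must be set before CUDA is initialized,
# i.e. before the first torch import in the process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

try:
    import gc
    import torch

    # Drop Python references and return cached blocks to the driver so the
    # float32 merge in unsloth/save.py has more headroom.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
except ImportError:
    pass  # torch unavailable; the env var alone takes effect once torch loads

# Then re-run the save from above:
# model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
```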