unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

OOM during saving to GGUF after training #1164

Open · Frank995 opened this issue 1 week ago

Frank995 commented 1 week ago

Hi. I'm saving my model to GGUF after training. These are the utilization metrics:

```
155.3186 seconds used for training.
2.59 minutes used for training.
Peak reserved memory = 7.939 GB.
Peak reserved memory for training = 1.462 GB.
Peak reserved memory % of max memory = 36.127 %.
Peak reserved memory for training % of max memory = 6.653 %.
```
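(For reference, figures like these match the memory-stats cell in the Unsloth example notebooks; roughly, a sketch of how such numbers are gathered with `torch.cuda.max_memory_reserved`, slightly simplified:)

```python
import torch

# Sketch of how the peak-memory figures above are typically gathered.
gpu_stats = torch.cuda.get_device_properties(0)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)

# Assumed captured with the same call *before* training started:
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)

# ... training happens here ...

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_for_training = round(used_memory - start_gpu_memory, 3)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_for_training} GB.")
print(f"Peak reserved memory % of max memory = {round(used_memory / max_memory * 100, 3)} %.")
```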

As you can see, there's plenty of memory available. Nevertheless, when I run `model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")`, it crashes with an OOM error:

```
File ~/miniconda3/lib/python3.12/site-packages/unsloth/save.py:152, in _merge_lora(layer, name)
    150 else:
    151     dtype = W.dtype
--> 152 W = W.to(torch.float32).t()
    153 # W = W.t()
    155 if A is not None:
    156     # sAB = (A.t().to(torch.float32) @ (s * B.t().to(torch.float32)))
    157     # W += sAB

OutOfMemoryError: CUDA out of memory. Tried to allocate 56.00 MiB. GPU 0 has a total capacity of 21.98 GiB of which 53.12 MiB is free. Process 1841 has 6.76 GiB memory in use. Including non-PyTorch memory, this process has 15.15 GiB memory in use. Of the allocated memory 14.74 GiB is allocated by PyTorch, and 107.45 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
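As the error text itself suggests, one cheap thing to try first is enabling expandable segments to rule out allocator fragmentation. A minimal sketch (the variable must be set before torch initialises CUDA, or exported in the shell before launching the script):

```python
import os

# Must be set before anything allocates on the GPU; equivalently:
#   export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after setting the env var so the allocator picks it up
```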
fajjos commented 1 week ago

If your GPU is out of memory, consider moving the model to the CPU for the saving step to free up GPU memory.
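A rough sketch of that idea, assuming `model`, `tokenizer`, and `trainer` are the objects from the fine-tuning run above. This is the shape of the suggestion, not a guaranteed fix: 4-bit bitsandbytes layers don't always move to the CPU cleanly, and Unsloth's save path may move tensors back to the GPU itself.

```python
import gc
import torch

# Drop training-time references and return cached blocks to the driver
# before the LoRA merge inside save_pretrained_gguf allocates fp32 copies.
del trainer
gc.collect()
torch.cuda.empty_cache()

model.to("cpu")  # the merge then works out of system RAM instead of VRAM
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
```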

fajjos commented 1 week ago

Or try using a less aggressive quantization method.
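For example, an 8-bit output instead of 4-bit (q8_0 is among the quantization methods Unsloth's saving code accepts). Worth noting that the traceback above fails in the LoRA merge, which runs before quantization, so this may not lower the GPU peak:

```python
# Hypothetical alternative: less aggressive 8-bit GGUF quantization.
model.save_pretrained_gguf("model", tokenizer, quantization_method="q8_0")
```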