rmihaylov / falcontune

Tune any FALCON in 4-bit
Apache License 2.0

OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 79.35 GiB total capacity; 77.18 GiB already allocated; 57.19 MiB free; 77.97 GiB reserved in total by PyTorch) #19

Open gpravi opened 1 year ago

gpravi commented 1 year ago

Ran into a CUDA OOM issue during fine-tuning:

File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 336, in _save_to_state_dict self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices) File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 96, in undo_layout outputs = torch.empty_like(tensor) # note: not using .index_copy because it was slower on cuda torch.cuda.OutOfMemoryError: CUDA out of memory.

Any ideas to fix this?
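For context, the traceback shows the allocation failing inside bitsandbytes' `undo_layout`, which builds a fresh GPU tensor (`torch.empty_like`) to de-transform the int8 weight before it can be saved. A minimal sketch of a first thing to try, assuming cached allocator blocks are contributing to the shortfall (not a confirmed fix):

```python
import gc
import torch

# Sketch of a pre-save mitigation: drop Python garbage and return cached
# blocks to the CUDA allocator so the temporary tensor allocated by
# undo_layout has more room to fit.
gc.collect()
torch.cuda.empty_cache()

# ...then trigger the checkpoint, e.g. trainer.save_model(output_dir)
# (trainer/output_dir are placeholder names for your own training setup)
```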

SoumitriKolavennu commented 1 year ago

This happened to me as well with the 40B model. The error only occurs when saving a checkpoint. I tried saving the model only once after all steps, instead of every 50 steps, and still got the error.
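Since the failure is specific to serializing the quantized base weights, one workaround sketch (not falcontune's own save path, and assuming a LoRA-style fine-tune where only the adapter parameters are trainable) is to checkpoint just the trainable parameters and skip `state_dict()` on the bitsandbytes modules entirely:

```python
import torch

def save_trainable_only(model, path):
    # Collect only parameters with requires_grad=True (the LoRA adapters
    # in a typical run). Iterating named_parameters() avoids calling
    # state_dict(), which is where bitsandbytes clones / undoes the int8
    # layout on GPU and triggers the OOM.
    trainable = {name: p.detach().cpu()
                 for name, p in model.named_parameters()
                 if p.requires_grad}
    torch.save(trainable, path)
```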

gpravi commented 1 year ago

Any luck?

angelovAlex commented 1 year ago

I am not sure if it is the same issue I had before, but please check this https://github.com/rhulha/lora/issues/1

gpravi commented 1 year ago

Thanks @angelovAlex. I applied the patch from that issue, but now I'm running into a similar OOM at a different line:

File "/root/.conda/envs/falcontune/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1815, in state_dict self._save_to_state_dict(destination, prefix, keep_vars) File "/root/.conda/envs/falcontune/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 330, in _save_to_state_dict weight_clone = self.weight.data.clone() ^^^^^^^^^^^^^^^^^^^^^^^^ torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 79.35 GiB total capacity; 75.73 GiB already allocated; 127.19 MiB free; 77.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF