rmihaylov / falcontune

Tune any FALCON in 4-bit
Apache License 2.0

Runtime Error: CUDA OUT OF MEMORY #31

Open amnasher opened 1 year ago

amnasher commented 1 year ago

I am trying to fine-tune the 7b-instruct-gptq model, but it fails with a CUDA out-of-memory error when I specify a cutoff length of 2048.

Parameters:

-------config-------
dataset='./Falcontune_data.json'
data_type='alpaca'
lora_out_dir='./falcon-7b-instruct-4bit-customodel/'
lora_apply_dir=None
weights='gptq_model-4bit-64g.safetensors'
target_modules=['query_key_value']

------training------
mbatch_size=1
batch_size=2
gradient_accumulation_steps=2
epochs=3
lr=0.0003
cutoff_len=2048
lora_r=8
lora_alpha=16
lora_dropout=0.05
val_set_size=0.2
gradient_checkpointing=False
gradient_checkpointing_ratio=1
warmup_steps=5
save_steps=50
save_total_limit=3
logging_steps=5
checkpoint=False
skip=False
world_size=1
ddp=False
device_map='auto'
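
For reference, the dump above corresponds roughly to a `falcontune finetune` invocation like the sketch below. This is a reconstruction from the parameter names in the dump, not a copy of the exact command used; in particular the `--model` value for the 7B instruct GPTQ checkpoint is an assumption and the flag names may differ slightly between versions.

```bash
# Sketch of a falcontune finetune invocation matching the config dump above.
# Flag names are inferred from the dump; the --model value is an assumption.
falcontune finetune \
    --model=falcon-7b-instruct-4bit \
    --weights=gptq_model-4bit-64g.safetensors \
    --dataset=./Falcontune_data.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-7b-instruct-4bit-customodel/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=3 \
    --lr=3e-4 \
    --cutoff_len=2048 \
    --lora_r=8 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=5 \
    --save_steps=50 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]'
```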

OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 14.75 GiB total capacity; 12.29 GiB already allocated; 476.81 MiB free; 13.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
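
A minimal mitigation sketch, assuming the run is launched from a shell: the error message itself suggests capping the allocator split size to reduce fragmentation, so that is the first knob to try before lowering `cutoff_len` or enabling the `gradient_checkpointing` option shown in the config above. The 128 MiB value below is illustrative, not a tested recommendation.

```bash
# Follow the hint in the OOM message: limit how large the caching allocator's
# split blocks can get, which can reduce fragmentation-related failures.
# 128 (MiB) is an illustrative value; tune it for your GPU.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```

That said, with 14.75 GiB total capacity, 12.29 GiB already allocated, and only ~477 MiB free, allocator tuning alone may not be enough; reducing `cutoff_len` (for example to 512) or turning on gradient checkpointing is likely to make the bigger difference.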