Running into CUDA OutOfMemoryError

I keep getting a CUDA out-of-memory error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.50 GiB. GPU 0 has a total capacity of 21.99 GiB of which 6.62 GiB is free. Including non-PyTorch memory, this process has 15.35 GiB memory in use. Of the allocated memory 6.15 GiB is allocated by PyTorch, and 8.71 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management ( https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
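The figures in this message can be cross-checked with a little arithmetic. PyTorch needs one 7.50 GiB block, the driver reports only 6.62 GiB free, and 8.71 GiB is held in PyTorch's cache but not in use. The helper below is a minimal sketch for extracting those numbers and deriving the totals; `parse_oom_message` and `summarize` are hypothetical names (not PyTorch APIs), and the regexes assume every figure is reported in GiB, as it is here.

```python
import re

# Hypothetical helper (not part of PyTorch): pulls the GiB figures out of a
# CUDA OOM message. Assumes every figure in the message is reported in GiB.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) GiB",
    "total": r"total capacity of ([\d.]+) GiB",
    "free": r"of which ([\d.]+) GiB is free",
    "process_in_use": r"this process has ([\d.]+) GiB memory in use",
    "allocated": r"([\d.]+) GiB is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) GiB is reserved by PyTorch but unallocated",
}


def parse_oom_message(msg: str) -> dict:
    """Return {name: GiB} for every figure found in the message."""
    figures = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, msg)
        if match:
            figures[name] = float(match.group(1))
    return figures


def summarize(f: dict) -> dict:
    """Derive the quantities that explain why the allocation failed."""
    reserved = f["allocated"] + f["reserved_unallocated"]
    return {
        # Memory held by PyTorch's caching allocator (in use + cached).
        "reserved": round(reserved, 2),
        # CUDA context and other non-PyTorch usage inside this process.
        "non_pytorch": round(f["process_in_use"] - reserved, 2),
        # Does the request fit in memory the driver still reports as free?
        "fits_in_free": f["requested"] <= f["free"],
        # Upper bound: would it fit if the cached-but-unused memory were
        # released and coalesced with the free memory?
        "fits_if_contiguous": f["requested"] <= f["free"] + f["reserved_unallocated"],
    }


MSG = (
    "Tried to allocate 7.50 GiB. GPU 0 has a total capacity of 21.99 GiB of which "
    "6.62 GiB is free. Including non-PyTorch memory, this process has 15.35 GiB "
    "memory in use. Of the allocated memory 6.15 GiB is allocated by PyTorch, and "
    "8.71 GiB is reserved by PyTorch but unallocated."
)

if __name__ == "__main__":
    print(summarize(parse_oom_message(MSG)))
```

For this message the sketch gives 14.86 GiB reserved by PyTorch and about 0.49 GiB of non-PyTorch usage. The request does not fit in the free memory but would fit if the cached memory were contiguous. That pattern suggests fragmentation rather than a model that is simply too large for the card, though it does not prove it.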

I have already reduced the batch size and placed `torch.cuda.empty_cache()` calls throughout my script, but that is still not enough.
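A likely reason `torch.cuda.empty_cache()` does not help is that it only returns cached segments that are entirely unused to the driver. The caching allocator already does the same thing on its own before raising an OOM error. Memory trapped in partially used segments, such as the 8.71 GiB "reserved but unallocated" above, stays reserved either way. The sketch below applies the error message's own suggestion and adds a helper to log the allocated/reserved gap at chosen points in the script. `cuda_memory_report` is a hypothetical name. `torch.cuda.memory_allocated` and `torch.cuda.memory_reserved` are standard PyTorch calls. Whether `expandable_segments` is enough for this particular workload is not guaranteed.

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when PyTorch's CUDA caching allocator is
# initialized, so it must be set before torch touches the GPU. Setting it
# before the first `import torch` is the simplest way to guarantee that.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

try:
    import torch
except ImportError:  # lets this sketch run on a machine without PyTorch
    torch = None

GIB = 1024 ** 3


def cuda_memory_report(device: int = 0) -> str:
    """Summarize allocated vs. reserved memory; a large gap hints at fragmentation."""
    if torch is None or not torch.cuda.is_available():
        return "CUDA not available"
    allocated = torch.cuda.memory_allocated(device) / GIB
    reserved = torch.cuda.memory_reserved(device) / GIB
    return (
        f"allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB, "
        f"reserved but unallocated={reserved - allocated:.2f} GiB"
    )


if __name__ == "__main__":
    print(cuda_memory_report())
```

Exporting the variable in the launching shell instead (`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python your_script.py`, where the script name is a placeholder) has the same effect and avoids ordering issues with imports.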

### I'm using:

`pip list | grep cuda`

- `nvidia-cuda-cupti-cu11` 11.8.87
- `nvidia-cuda-nvrtc-cu11` 11.8.89
- `nvidia-cuda-runtime-cu11` 11.8.89

`nvcc --version`

- nvcc: NVIDIA (R) Cuda compiler driver
- Copyright (c) 2005-2024 NVIDIA Corporation
- Built on Thu_Mar_28_02:18:24_PDT_2024
- Cuda compilation tools, release 12.4, V12.4.131
- Build cuda_12.4.r12.4/compiler.34097967_0

`pip list | grep torch`

- `pytorch-lightning` 2.1.2
- `pytorch-triton` 3.0.0+989adb9a29
- `torch` 2.2.1+cu118
- `torchaudio` 2.2.1+cu118
- `torchmetrics` 1.3.2
- `torchvision` 0.17.1+cu118
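Since the setup includes pytorch-lightning 2.1.2, two existing `Trainer` options are a common complement to a smaller batch. `precision="16-mixed"` runs most ops in 16-bit, which can substantially cut activation memory. `accumulate_grad_batches` lets a smaller per-step batch keep the previous effective batch size. The sketch below is an illustration under assumptions. `memory_saving_trainer_kwargs`, `build_trainer`, and the 64/16 batch sizes are hypothetical, and the DataLoader's `batch_size` would be set to the per-step value. Whether this avoids the 7.50 GiB allocation depends on what that tensor actually is.

```python
# Hypothetical helper: keep the effective batch size while lowering per-step
# memory, using the Trainer options `precision` and `accumulate_grad_batches`.
def memory_saving_trainer_kwargs(effective_batch: int, per_step_batch: int,
                                 precision: str = "16-mixed") -> dict:
    if effective_batch <= 0 or per_step_batch <= 0 or effective_batch % per_step_batch:
        raise ValueError("effective_batch must be a positive multiple of per_step_batch")
    return {
        # Mixed precision: autocast runs most ops in float16.
        "precision": precision,
        # Gradients are accumulated over this many batches per optimizer step.
        "accumulate_grad_batches": effective_batch // per_step_batch,
    }


def build_trainer(effective_batch: int = 64, per_step_batch: int = 16, **extra):
    """Construct a Trainer with the settings above (needs pytorch-lightning and a GPU)."""
    import pytorch_lightning as pl  # imported lazily so the helper above runs anywhere
    return pl.Trainer(**memory_saving_trainer_kwargs(effective_batch, per_step_batch), **extra)
```

Calling `build_trainer(max_epochs=10)` would then train with a per-step batch of 16 and one optimizer step every 4 batches.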


running through CUDA OutOfMemory error #1058