Modified the model loading path to load a personal dataset.
Running the program keeps failing with OOM (my setup: A10, 24 GB).
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 386.00 MiB. GPU 0 has a total capacty of 22.02 GiB of which 165.19 MiB is free. Including non-PyTorch memory, this process has 21.86 GiB memory in use. Of the allocated memory 21.13 GiB is allocated by PyTorch, and 458.98 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Epoch 0: 0%| | 0/64 [00:09<?, ?it/s]
bash train_lora_int4.sh -m train
Configuration file: train_pl.yaml
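One workaround suggested by the error message itself is setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF` to reduce allocator fragmentation (the log shows 458.98 MiB reserved but unallocated). A minimal sketch; the value 128 is an assumption, not a recommendation from the repo:

```shell
# Cap the allocator's split block size to reduce fragmentation
# (the 128 MiB value here is a guess; tune for your workload).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then re-run training, e.g.:
# bash train_lora_int4.sh -m train
echo "PYTORCH_CUDA_ALLOC_CONF=${PYTORCH_CUDA_ALLOC_CONF}"
```

Note this only mitigates fragmentation; if the model plus optimizer state genuinely exceeds 24 GB, you would still need to lower the batch size or sequence length in train_pl.yaml.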