Hello, when I train SFT and RM on a V100, both runs fail with an out-of-memory error. The full error message is:
OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 1; 31.75
GiB total capacity; 29.88 GiB already allocated; 11.75 MiB free; 29.98 GiB
reserved in total by PyTorch) If reserved memory is >> allocated memory try
setting max_split_size_mb to avoid fragmentation. See documentation for Memory
I have already reduced per_device_train_batch_size and per_device_eval_batch_size to 1, but it still reports that GPU memory is insufficient. Is there any way to solve this problem?
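As a quick sanity check on the numbers in the error, here is a minimal sketch that parses the message and compares reserved with allocated memory. The helper names `to_mib` and `parse_oom` are made up for illustration and are not part of PyTorch; the error text is copied from the report above.

```python
import re

# Error text copied from the report above (figures only; the tail of the
# original message is truncated in the report and is left out here).
error_text = (
    "CUDA out of memory. Tried to allocate 16.00 MiB (GPU 1; 31.75 GiB total "
    "capacity; 29.88 GiB already allocated; 11.75 MiB free; 29.98 GiB "
    "reserved in total by PyTorch)"
)


def to_mib(value: str, unit: str) -> float:
    """Convert a value given in MiB or GiB to MiB (hypothetical helper)."""
    return float(value) * (1024.0 if unit == "GiB" else 1.0)


def parse_oom(text: str) -> dict:
    """Extract the memory figures from the OOM message (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    return {k: to_mib(*re.search(p, text).groups()) for k, p in patterns.items()}


stats = parse_oom(error_text)
# Memory held by the caching allocator but not backing live tensors.
gap_mib = stats["reserved"] - stats["allocated"]
# Share of the card already occupied by live tensors.
allocated_fraction = stats["allocated"] / stats["total"]

print(f"reserved - allocated: {gap_mib:.1f} MiB")
print(f"allocated / total:    {allocated_fraction:.1%}")
```

Reserved memory is only about 0.1 GiB above allocated memory, so the `max_split_size_mb` hint (which targets the case where reserved is much larger than allocated) probably does not apply here; the card appears to be genuinely full even at batch size 1.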