Closed · SongDongKuk closed this issue 7 months ago
We recommend using nn.parallel.DistributedDataParallel, which manages memory more efficiently (see link).
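For reference, here is a minimal sketch of how a model can be wrapped with DistributedDataParallel when launched via torchrun; the function and variable names below are illustrative, not taken from this repo:

```python
# Minimal DDP sketch (illustrative only).
# Launch with something like: torchrun --nproc_per_node=auto your_script.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp() -> int:
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for each spawned process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank

def build_ddp_model(model: torch.nn.Module) -> DDP:
    local_rank = setup_ddp()
    model = model.to(local_rank)
    # Each process holds its own replica on its own GPU, so memory is
    # spread across devices instead of accumulating on GPU 0 as with nn.DataParallel.
    return DDP(model, device_ids=[local_rank])
```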
The evaluation code and script we provided are designed to utilize all available GPUs. As a first step, please make sure you have run the evaluation script correctly (especially the torchrun --nproc_per_node=auto part). If the issue persists with this script, consider setting --nproc_per_node manually to match the number of GPUs available.
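If the automatic setting does not pick up all your GPUs, a quick sanity check of how many devices are actually visible can help you choose the value to pass to --nproc_per_node; this snippet is purely illustrative:

```python
# Check how many GPUs the process can see before launching the evaluation script,
# e.g. to pick a manual value for --nproc_per_node.
import torch

if torch.cuda.is_available():
    print(f"Visible GPUs: {torch.cuda.device_count()}")
else:
    print("No CUDA devices visible; check your driver and CUDA_VISIBLE_DEVICES")
```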
Even when I use nn.DataParallel, the model gets allocated to only one specific GPU and keeps throwing CUDA out of memory :(
Could you please check whether there is a way around this?