Closed: dialuser closed this issue 1 year ago.
I'm using `batch_size=8`; is that too large?
For the multi-GPU issue, please try `rm -rf .torch_distributed_init` and then re-run the code with multiple GPUs.
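In case the context helps: with file-based rendezvous in `torch.distributed`, an interrupted or crashed run can leave this init file behind, and newly spawned workers then wait on the stale file indefinitely. A minimal sketch of the cleanup step (the filename is the one from this repo; nothing else is assumed):

```shell
# Delete the stale file-based rendezvous file left by an interrupted run.
# -f makes this a no-op when the file does not exist, so it is safe to
# run before every multi-GPU launch.
rm -rf .torch_distributed_init
```

After the file is gone, relaunch the multi-GPU run as usual.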
Yes, that batch size does seem too large; could you try a smaller one?
When I tried to run the first stage, the code hung at `torch.multiprocessing.spawn(fn=first_stage, args=(args, ), nprocs=args.n_gpus)`.
After switching to a single GPU, the code ran, but I kept getting an out-of-memory error even after reducing the channels from 384 to 48. The paper says the model can fit on a single card.
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 23.70 GiB total capacity; 20.26 GiB already allocated; 1.34 GiB free; 21.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
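The traceback itself hints at one mitigation: reserved memory (21.31 GiB) far exceeds allocated memory (20.26 GiB), which points to allocator fragmentation, and capping the split size can reclaim some of that gap. A hedged sketch of setting it via the environment variable the message names (512 MiB is just a starting value to tune, not a recommendation from this repo):

```shell
# Cap the size of cached allocator blocks to reduce fragmentation.
# Must be set before the process initializes CUDA, so export it in the
# shell that launches training. 512 is an assumed starting point.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```

Then relaunch training from the same shell. If it still runs out of memory, this only mitigates fragmentation; the actual allocation footprint would need to shrink (e.g. smaller batch or input size).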
Here's my model configuration: