xinyiyu / Nudiff

Diffusion-based Data Augmentation for Nuclei Image Segmentation

How to make it work on two devices? #3

Open railgun122 opened 8 months ago

railgun122 commented 8 months ago

Problem raised:

Module: OpenFabrics (openib)
Host: gpusystem

Another transport will be used instead, although this may result in lower performance.

NOTE: You can disable this warning by setting the MCA parameter btl_base_warn_component_unused to 0.
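The NOTE above concerns only the Open MPI transport warning, not the crash below; a minimal sketch of silencing it, assuming Open MPI's standard OMPI_MCA_ environment-variable prefix for MCA parameters (set before MPI is initialized):

    import os

    # Equivalent to setting the MCA parameter named in the NOTE; Open MPI reads
    # any MCA parameter from an OMPI_MCA_-prefixed environment variable.
    os.environ["OMPI_MCA_btl_base_warn_component_unused"] = "0"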

Logging to logs/monuseg/prop1.0_pretrain
Program executed via: scripts/image_syn/train.py \
    --data_dir monuseg/allpatch256x256_128/train \
    --viz_data_dir monuseg/allpatch256x256_128/train \
    --lr 1e-4 \
    --batch_size 1 \
    --attention_resolutions 32,16,8 \
    --diffusion_steps 1000 \
    --image_size 512 \
    --learn_sigma True \
    --noise_schedule linear \
    --num_channels 256 \
    --num_head_channels 64 \
    --num_res_blocks 2 \
    --resblock_updown True \
    --use_fp16 True \
    --use_scale_shift_norm True \
    --use_checkpoint False \
    --num_classes 2 \
    --class_cond True \
    --no_instance False \
    --max_iterations 15 \
    --save_interval 10000 \
    --viz_interval 10000 \
    --viz_batch_size 2

creating unet model and diffusion...
32,16,8 32,16,8 [16, 32, 64] [] [16, 32, 64]
Let's use 2 GPUs!
Unet model size: 2521.918MB
creating data loader...
Viz data len: 784
training...
Number of samples: 784
Diffusion loss: 1.0043
Traceback (most recent call last):
  File "scripts/image_syn/train.py", line 140, in <module>
    main()
  File "scripts/image_syn/train.py", line 67, in main
    CycleTrainLoop(
  File "/sda/hyunbin20240205/Nudiff-main/nudiff/image_syn/src/run_desc.py", line 118, in run_loop
    self.run_step(batch, cond).to("cuda:6")
  File "/sda/hyunbin20240205/Nudiff-main/nudiff/image_syn/src/run_desc.py", line 103, in run_step
    took_step = self.mp_trainer.optimize(self.opt)
  File "/sda/hyunbin20240205/Nudiff-main/nudiff/image_syn/utils/fp16_util.py", line 190, in optimize
    return self._optimize_fp16(opt)
  File "/sda/hyunbin20240205/Nudiff-main/nudiff/image_syn/utils/fp16_util.py", line 208, in _optimize_fp16
    opt.step()
  File "/home/edsr/anaconda3/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/edsr/anaconda3/lib/python3.8/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/edsr/anaconda3/lib/python3.8/site-packages/torch/optim/optimizer.py", line 23, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/home/edsr/anaconda3/lib/python3.8/site-packages/torch/optim/adam.py", line 234, in step
    adam(params_with_grad,
  File "/home/edsr/anaconda3/lib/python3.8/site-packages/torch/optim/adam.py", line 300, in adam
    func(params,
  File "/home/edsr/anaconda3/lib/python3.8/site-packages/torch/optim/adam.py", line 410, in _single_tensor_adam
    denom = (exp_avg_sq.sqrt() / bias_correction2_sqrt).add_(eps)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.46 GiB (GPU 0; 23.68 GiB total capacity; 19.95 GiB already allocated; 1.62 GiB free; 20.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
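For reference, the fragmentation hint at the end of the OOM message refers to PyTorch's caching-allocator option; a minimal sketch of setting it (the 128 MB split size is an illustrative value, not something from this run, and it must be in place before the first CUDA allocation):

    import os

    # Allocator option suggested by the OOM message; it only takes effect if it
    # is set before any CUDA tensor is created (or exported in the launching shell).
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch
    _ = torch.zeros(1, device="cuda")  # first allocation picks up the setting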

I am trying to make it work on my server, which has 6 GPUs. I have tried a few ways of adapting the dist_util file, but it did not work well. Can someone help me out, please?

Thank you
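In case it helps frame the question: the "Let's use 2 GPUs!" line suggests the training loop wraps the model in torch.nn.DataParallel. A minimal sketch of that pattern, with placeholder device indices and a stand-in module rather than Nudiff's actual UNet:

    import os

    # Expose only the GPUs the run should use; the indices are placeholders.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

    import torch
    import torch.nn as nn

    model = nn.Linear(512, 512)  # stand-in for the diffusion UNet
    if torch.cuda.device_count() > 1:
        print(f"Let's use {torch.cuda.device_count()} GPUs!")
        model = nn.DataParallel(model)  # replicates the module, splits each batch
    model = model.cuda()

    out = model(torch.randn(4, 512).cuda())  # batch dim of 4 is split across GPUs

If that assumption holds, note that DataParallel splits along the batch dimension, so with --batch_size 1 the whole step still runs on a single GPU and the per-GPU memory footprint behind the OOM above does not shrink.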