Open MeimShang opened 2 years ago
Same problem. Have you found the solution?
I have the same problem. If you are working on Windows, you can solve it by modifying the sync_params function (see the sketch below). I guess the problem is that Windows does not support distributed training.
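For reference, here is a minimal sketch of the kind of change meant above, assuming the error comes from `dist.broadcast` writing in-place into a leaf parameter that requires grad. Wrapping the broadcast in `torch.no_grad()` (or skipping it entirely when there is only one process, as on a single-GPU Windows setup) sidesteps autograd's in-place check; adapt it to your own copy of `improved_diffusion/dist_util.py`:

```python
import torch
import torch.distributed as dist


def sync_params(params):
    """Broadcast rank 0's parameter values to all other ranks.

    Workaround sketch: skip the broadcast when there is nothing to
    synchronize, and disable autograd around the in-place broadcast so a
    leaf parameter that requires grad can be overwritten.
    """
    # Nothing to do if distributed training is not running or there is
    # only a single process (e.g. single-GPU training on Windows).
    if not (dist.is_available() and dist.is_initialized()) or dist.get_world_size() == 1:
        return

    for p in params:
        with torch.no_grad():
            dist.broadcast(p, 0)
```

This is only a workaround under those assumptions, not an official fix from the repository maintainers.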
It's an error I get when I run image_train.py. My torch version is 1.7.1+cu110.

Traceback (most recent call last):
  File "scripts/image_train.py", line 83, in <module>
    main()
  File "scripts/image_train.py", line 41, in main
    TrainLoop(
  File "/mnt/e/w2l/code/diffusion/ddim/improved-diffusion-main/improved_diffusion/train_util.py", line 78, in __init__
    self._load_and_sync_parameters()
  File "/mnt/e/w2l/code/diffusion/ddim/improved-diffusion-main/improved_diffusion/train_util.py", line 127, in _load_and_sync_parameters
    dist_util.sync_params(self.model.parameters())
  File "/mnt/e/w2l/code/diffusion/ddim/improved-diffusion-main/improved_diffusion/dist_util.py", line 72, in sync_params
    dist.broadcast(p, 0)
  File "/home/mayme/anaconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 868, in broadcast
    work.wait()
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.