Open fido20160817 opened 1 year ago
Same question
I got the solution from https://github.com/openai/guided-diffusion/issues/23. Just delete the "if dist.get_rank() == 0" check in train_util.py when loading the checkpoint with multiple GPUs.
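For reference, here is a minimal sketch of what that change might look like, assuming a `_load_and_sync_parameters` method similar to the one in guided-diffusion's train_util.py (the exact names and structure are assumptions, so check your local copy):

```python
# Sketch only: loosely based on guided-diffusion's TrainLoop; adapt to your train_util.py.
def _load_and_sync_parameters(self):
    resume_checkpoint = self.resume_checkpoint
    if resume_checkpoint:
        # Before: this load was wrapped in `if dist.get_rank() == 0:`, so only
        # rank 0 entered dist_util.load_state_dict(). If that helper performs a
        # collective op (e.g. broadcasting the checkpoint bytes), the other ranks
        # never reach it and every process hangs.
        # After: drop the rank check so all ranks load the checkpoint together.
        self.model.load_state_dict(
            dist_util.load_state_dict(
                resume_checkpoint, map_location=dist_util.dev()
            )
        )
    # Parameters are still synchronized across ranks afterwards.
    dist_util.sync_params(self.model.parameters())
```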
So the problem seems to be the PyTorch version in your notebook's configuration. From the looks of it, Colab and Jupyter notebooks use 0.4.0. So I added the strict=False argument to load_state_dict().
model.load_state_dict(checkpoint, strict=False)
Answer from https://stackoverflow.com/a/54058284
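As a small, self-contained illustration of what strict=False does (the file name and model are hypothetical; the return value with missing/unexpected keys is available in recent PyTorch versions):

```python
import torch

# Hypothetical checkpoint path and model, for illustration only.
checkpoint = torch.load("model_ckpt.pt", map_location="cpu")

# With strict=False, keys that do not match the model are skipped instead of
# raising a RuntimeError; the return value reports what was ignored.
result = model.load_state_dict(checkpoint, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```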
When loading a checkpoint with multiple GPUs, it gets stuck in load_state_dict() in the provided dist_util.py, but it works fine with one GPU. Does anybody know what causes this?