openai / guided-diffusion

load_state_dict stuck! #68

Open fido20160817 opened 1 year ago

fido20160817 commented 1 year ago

When I load a checkpoint with multiple GPUs, the run gets stuck in load_state_dict() defined in dist_util.py. It works fine with one GPU. Does anybody know what causes this?

pmj110119 commented 1 year ago

Same question

Suimingzhe commented 1 year ago

I got the solution from https://github.com/openai/guided-diffusion/issues/23. Just delete the "if dist.get_rank() == 0" check in train_util.py, so that every rank loads the checkpoint when running on multiple GPUs.
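For anyone wondering why removing the rank check helps: as far as I can tell, dist_util.load_state_dict() is a collective call (rank 0 reads the file and broadcasts the bytes to the other ranks), so every rank has to enter it. If only rank 0 calls it, the broadcast waits forever for peers that never arrive. Below is a toy simulation of that pattern using plain Python threads, not the repo's actual code. The names WORLD_SIZE and run are made up for illustration, and the barrier with a timeout is only a stand-in for the real broadcast.

```python
import threading

WORLD_SIZE = 4  # hypothetical number of ranks, simulated here with threads


def run(guard_with_rank_check, timeout=0.5):
    """Simulate a checkpoint load and return a per-rank status dict.

    The barrier stands in for a collective such as a broadcast: it only
    completes once every rank has entered it.
    """
    barrier = threading.Barrier(WORLD_SIZE)
    status = {}

    def worker(rank):
        if guard_with_rank_check and rank != 0:
            # Mirrors "if dist.get_rank() == 0": this rank skips the load
            # and therefore never joins the collective.
            status[rank] = "skipped"
            return
        try:
            # A real collective would block forever; the timeout lets the
            # simulation report the hang instead of freezing.
            barrier.wait(timeout=timeout)
            status[rank] = "loaded"
        except threading.BrokenBarrierError:
            status[rank] = "hung"

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


guarded = run(guard_with_rank_check=True)     # only rank 0 enters the collective
unguarded = run(guard_with_rank_check=False)  # every rank enters the collective
print("with rank check:   ", guarded)
print("without rank check:", unguarded)
```

With the rank check, rank 0 is the only participant and times out. Without it, all ranks meet at the barrier and proceed, which matches the single-GPU case working fine (a world size of 1 has nobody to wait for).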

randomrushgirl commented 1 year ago

The problem seems to be the PyTorch version configured in your notebook. From the looks of it, Colab and Jupyter notebooks use 0.4.0. So I passed the strict=False argument to load_state_dict():

model.load_state_dict(checkpoint, strict=False)

Answer from https://stackoverflow.com/a/54058284
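A small sketch of what strict=False actually changes, assuming a recent PyTorch install. The toy nn.Linear model and the "extra" key are made up for illustration. Note that strict=False only relaxes key matching between the checkpoint and the model, so it addresses key-mismatch errors rather than a multi-GPU hang.

```python
import torch.nn as nn

model = nn.Linear(4, 2)  # toy model whose state dict has "weight" and "bias"

# Build a deliberately mismatched checkpoint: drop one key, add an unknown one.
checkpoint = {k: v.clone() for k, v in model.state_dict().items()}
del checkpoint["bias"]
checkpoint["extra"] = checkpoint["weight"].clone()  # hypothetical stray key

# The default (strict=True) raises RuntimeError on missing or unexpected keys.
try:
    model.load_state_dict(checkpoint)
    strict_failed = False
except RuntimeError:
    strict_failed = True

# strict=False loads the keys that match and reports the rest.
result = model.load_state_dict(checkpoint, strict=False)
print("strict load failed:", strict_failed)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```

Checking the returned missing_keys and unexpected_keys is worth doing, since strict=False will otherwise silently leave unmatched parameters at their initial values.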