lucidrains / lightweight-gan

Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two
MIT License
1.62k stars, 220 forks

Source state dict is empty when using --amp #103

Open jeremy-rutman opened 2 years ago

jeremy-rutman commented 2 years ago

I hit a 'source state dict is empty' error when I try to use --amp.

jeremy@jeremy-Blade:$ lightweight_gan --data data --aug-prob 0.25 --aug-types [translation,cutout,color]   --save-every 200 --name lw1  --amp --batch_size 16 --gradient_accumulate-every 4 
/home/jeremy/.local/lib/python3.8/site-packages/kornia/augmentation/augmentation.py:1830: DeprecationWarning: GaussianBlur is no longer maintained and will be removed from the future versions. Please use RandomGaussianBlur instead.
  warnings.warn(
continuing from previous epoch - 0
loading from version 0.20.4
/usr/lib/python3/dist-packages/apport/report.py:13: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import fnmatch, glob, traceback, errno, sys, atexit, locale, imp, stat
Traceback (most recent call last):
  File "/home/jeremy/.local/bin/lightweight_gan", line 8, in <module>
    sys.exit(main())
  File "/home/jeremy/.local/lib/python3.8/site-packages/lightweight_gan/cli.py", line 190, in main
    fire.Fire(train_from_folder)
  File "/home/jeremy/.local/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/jeremy/.local/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/jeremy/.local/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/jeremy/.local/lib/python3.8/site-packages/lightweight_gan/cli.py", line 181, in train_from_folder
    run_training(0, 1, model_args, data, load_from, new, num_train_steps, name, seed)
  File "/home/jeremy/.local/lib/python3.8/site-packages/lightweight_gan/cli.py", line 59, in run_training
    model.load(load_from)
  File "/home/jeremy/.local/lib/python3.8/site-packages/lightweight_gan/lightweight_gan.py", line 1474, in load
    self.G_scaler.load_state_dict(load_data['G_scaler'])
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 500, in load_state_dict
    raise RuntimeError("The source state dict is empty, possibly because it was saved "
RuntimeError: The source state dict is empty, possibly because it was saved from a disabled instance of GradScaler.

I'll see if I can dig in a bit and root it out.
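The last line of the traceback points at the cause. A `GradScaler` built with `enabled=False` saves an empty dict from `state_dict()`. That is presumably what happens when a run is started without `--amp`. An enabled scaler then refuses to load that empty dict. A disabled scaler silently ignores whatever it is given. The `ToyScaler` class below is a hypothetical, stripped-down model of that behaviour in plain Python, written only for illustration. It is not PyTorch's implementation.

```python
class ToyScaler:
    """Hypothetical stand-in mimicking how torch's GradScaler saves and restores state."""

    def __init__(self, enabled: bool, init_scale: float = 2.0 ** 16):
        self.enabled = enabled
        self.scale = init_scale

    def state_dict(self) -> dict:
        # A disabled scaler has nothing to save, so it returns an empty dict.
        return {"scale": self.scale} if self.enabled else {}

    def load_state_dict(self, state: dict) -> None:
        # A disabled scaler ignores whatever it is given.
        if not self.enabled:
            return
        # An enabled scaler cannot restore itself from an empty dict.
        if len(state) == 0:
            raise RuntimeError(
                "The source state dict is empty, possibly because it was saved "
                "from a disabled instance of GradScaler."
            )
        self.scale = state["scale"]


# Checkpoint written by a run started without --amp (scaler disabled)...
saved = ToyScaler(enabled=False).state_dict()

# ...then resumed with --amp: the enabled scaler rejects the empty dict.
try:
    ToyScaler(enabled=True).load_state_dict(saved)
except RuntimeError as err:
    print(err)
```

The reverse direction does not fail in this model. A run saved with AMP and resumed without it hands a non-empty dict to a disabled scaler, which returns early.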

msglm commented 2 years ago

I ran into this error when using --amp on an already-started project. I changed the --name parameter to something different, using the same data of course, and the problem went away. If you want to use --amp, it may be worth starting a new project under a different name.
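A new `--name` presumably works because the new run has no checkpoint to load. To keep an existing run instead, one option is to skip restoring scaler state whenever the saved dict is empty or missing. The AMP scaler then simply starts from its default scale. This is a minimal sketch, assuming you are willing to patch the `load` method named in the traceback. The helper `load_scaler_state_if_present` is hypothetical and is not an upstream lightweight-gan fix.

```python
from typing import Any, Mapping, Optional


def load_scaler_state_if_present(scaler: Any, state: Optional[Mapping[str, Any]]) -> bool:
    """Restore a GradScaler-like object only when the checkpoint holds real state.

    A disabled torch GradScaler saves {} from state_dict(), and an enabled one
    raises RuntimeError when asked to load {}. Skipping the empty case lets a
    run saved without AMP resume with AMP, with the scaler starting fresh.
    Returns True if state was loaded, False if it was skipped.
    """
    if not state:
        # Missing key (None) or empty dict: keep the freshly constructed scaler.
        return False
    scaler.load_state_dict(state)
    return True
```

Under this assumption, the failing line `self.G_scaler.load_state_dict(load_data['G_scaler'])` would become `load_scaler_state_if_present(self.G_scaler, load_data.get('G_scaler'))`. Any other scaler restored in the same method would need the same treatment.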