Change to configs/real_basicvsr/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds.py: the only modifications are printing the log every 10 iterations, saving a checkpoint every 50 iterations, and disabling validation and testing.
First, run:
python tools/train.py configs/real_basicvsr/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds.py
and stop it at iteration 110.
Then resume training:
python tools/train.py configs/real_basicvsr/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds.py --resume
Reproduces the problem - error message
08/21 10:03:10 - mmengine - INFO - Working directory: ./work_dirs/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds
08/21 10:03:10 - mmengine - INFO - Log directory: /test/mmagic/work_dirs/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds/20230821_100254
08/21 10:03:19 - mmengine - INFO - Add to optimizer 'generator' ({'type': 'Adam', 'lr': 0.0001, 'betas': (0.9, 0.99)}): 'generator'.
08/21 10:03:24 - mmengine - INFO - Auto resumed from the latest checkpoint ./work_dirs/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds/iter_100.pth.
Loads checkpoint by local backend from path: ./work_dirs/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds/iter_100.pth
08/21 10:03:24 - mmengine - INFO - Load checkpoint from ./work_dirs/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds/iter_100.pth
08/21 10:03:24 - mmengine - INFO - resumed epoch: 0, iter: 100
08/21 10:03:24 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
08/21 10:03:24 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
08/21 10:03:24 - mmengine - INFO - Checkpoints will be saved to ./work_dirs/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds.
================== parsed_losses_g: tensor(0.0444, device='cuda:0', grad_fn=)
Traceback (most recent call last):
  File "tools/train.py", line 114, in <module>
    main()
  File "tools/train.py", line 107, in main
    runner.train()
  File "/opt/conda/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1746, in train
    model = self.train_loop.run()  # type: ignore
  File "/opt/conda/lib/python3.8/site-packages/mmengine/runner/loops.py", line 278, in run
    self.run_iter(data_batch)
  File "/opt/conda/lib/python3.8/site-packages/mmengine/runner/loops.py", line 301, in run_iter
    outputs = self.runner.model.train_step(
  File "/test/mmagic/mmagic/models/editors/real_basicvsr/real_basicvsr.py", line 169, in train_step
    log_vars_d = self.g_step_with_optim(
  File "/test/mmagic/mmagic/models/editors/srgan/srgan.py", line 213, in g_step_with_optim
    g_optim_wrapper.update_params(parsed_losses_g)
  File "/opt/conda/lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 205, in update_params
    self.step(**step_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mmengine/optim/scheduler/param_scheduler.py", line 115, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 257, in step
    self.optimizer.step(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/optimizer.py", line 109, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/adam.py", line 157, in step
    adam(params_with_grad,
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/adam.py", line 213, in adam
    func(params,
  File "/opt/conda/lib/python3.8/site-packages/torch/optim/adam.py", line 255, in _single_tensor_adam
    assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."
AssertionError: If capturable=False, state_steps should not be CUDA tensors.
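The assertion means Adam found a `step` entry in its per-parameter state stored as a CUDA tensor while `capturable=False`. One likely cause, assumed here and not confirmed for this setup, is that `Optimizer.load_state_dict` in PyTorch 1.12.0 casts the restored `step` tensors to the parameters' device when the checkpoint is resumed. Upgrading to PyTorch 1.12.1 or later is the commonly suggested fix. As a stopgap, a minimal sketch that moves those counters back to CPU after the checkpoint has been loaded could look like the following. The helper name is hypothetical. Where to call it in an MMEngine run is also an assumption, for example on `runner.optim_wrapper['generator'].optimizer` once resuming has finished.

```python
def move_optimizer_steps_to_cpu(optimizer) -> int:
    """Move CUDA-resident ``step`` counters in an optimizer's state to CPU.

    Hypothetical helper for a ``torch.optim.Optimizer`` such as Adam, whose
    ``state`` maps each parameter to a dict that may contain ``step``.
    Returns the number of ``step`` entries that were moved.
    """
    moved = 0
    for param_state in optimizer.state.values():
        step = param_state.get("step")
        # Only tensors on a CUDA device are moved; plain ints (older PyTorch)
        # and CPU tensors are left untouched.
        if getattr(step, "is_cuda", False):
            param_state["step"] = step.cpu()
            moved += 1
    return moved
```

Calling it after resuming and before the next `optimizer.step()` should make each `step` a CPU tensor again, which is what `_single_tensor_adam` checks for when `capturable=False`.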
Prerequisite
Task
I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmagic
Environment
sys.platform: linux
Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.0+cu113
PyTorch compiling details: PyTorch built with:
TorchVision: 0.13.0+cu113
OpenCV: 4.5.1
MMEngine: 0.8.4
MMCV: 2.0.1
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMagic: 1.0.2dev0+unknown
Reproduces the problem - code sample
None
Additional information
No response