zsyOAOA / ResShift

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS@2023 Spotlight, TPAMI@2024)

Training fails #71

Open PavelBartenev opened 6 months ago

PavelBartenev commented 6 months ago

Dear Author,

I am trying to launch training as the README recommends:

CUDA_VISIBLE_DEVICES=0 torchrun --standalone --nproc_per_node=8 --nnodes=1 main.py --cfg_path configs/realsr_swinunet_realesrgan256.yaml --save_dir results_train

And I keep getting the following error:

Traceback (most recent call last):
  File "main.py", line 48, in <module>
    trainer.train()
  File "/workspace/ResShift/ResShift/trainer.py", line 297, in train
    self.init_logger() # setup logger: self.logger
  File "/workspace/ResShift/ResShift/trainer.py", line 101, in init_logger
    assert self.configs.resume
AssertionError

I don't want to resume training from a checkpoint. Could you please tell me how to get rid of this error?

zsyOAOA commented 6 months ago

Since you have only one GPU, please try launching with python instead of torchrun first.
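
For context, the command in the question exposes a single GPU (CUDA_VISIBLE_DEVICES=0) while asking torchrun for eight worker processes (--nproc_per_node=8). The sketch below is only an illustration of how a launch command could be chosen so that the worker count matches the visible GPUs. The helper names visible_gpu_count and build_launch_command are hypothetical and not part of ResShift, and the sketch assumes main.py accepts the same --cfg_path and --save_dir arguments when started with plain python, as the reply above implies.

```python
import shlex


def visible_gpu_count(cuda_visible_devices: str) -> int:
    """Count the device IDs listed in a CUDA_VISIBLE_DEVICES-style string."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def build_launch_command(cuda_visible_devices: str, cfg_path: str, save_dir: str) -> str:
    """Hypothetical helper: build a launch command sized to the visible GPUs."""
    n_gpus = visible_gpu_count(cuda_visible_devices)
    if n_gpus < 1:
        raise ValueError("no GPU listed in CUDA_VISIBLE_DEVICES")

    # Script arguments taken from the command quoted in the issue.
    script_args = ["main.py", "--cfg_path", cfg_path, "--save_dir", save_dir]

    if n_gpus == 1:
        # One GPU: start the script directly, without the distributed launcher.
        launcher = ["python"]
    else:
        # Several GPUs: one torchrun worker per visible device.
        launcher = ["torchrun", "--standalone", f"--nproc_per_node={n_gpus}", "--nnodes=1"]

    env = f"CUDA_VISIBLE_DEVICES={cuda_visible_devices}"
    return " ".join([env] + [shlex.quote(a) for a in launcher + script_args])


print(build_launch_command("0", "configs/realsr_swinunet_realesrgan256.yaml", "results_train"))
```

With a single visible device this prints a plain python invocation of main.py. Whether that alone clears the AssertionError is not confirmed in this thread.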