zsyOAOA / ResShift

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS@2023 Spotlight, TPAMI@2024)

Same training progress-bar issue: tensorboard is already set to True, so why is nothing displayed? #103

Closed 4C4247 closed 2 weeks ago

4C4247 commented 2 weeks ago

Loading model from: G:\anaconda\envs\resshift\lib\site-packages\lpips\weights\v0.1\vgg.pth
Number of images in train data set: 9032
Number of images in val data set: 32

Does this command-line output mean that training has already started?

4C4247 commented 2 weeks ago

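(Three screenshots attached; not preserved.) I have already set the logs to be saved very frequently, but nothing is being saved. I also could not find the parameter that controls how often the model is saved. Any help would be appreciated~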

zsyOAOA commented 2 weeks ago

> Loading model from: G:\anaconda\envs\resshift\lib\site-packages\lpips\weights\v0.1\vgg.pth
> Number of images in train data set: 9032
> Number of images in val data set: 32
>
> Does this command-line output mean that training has already started?

Yes, training has already started.

zsyOAOA commented 2 weeks ago

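> I have already set the logs to be saved very frequently, but nothing is being saved. I also could not find the parameter that controls how often the model is saved. Any help would be appreciated~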

save_freq controls how often the model is saved. You can first set the batch size and save_freq to smaller values to confirm that the code runs correctly.
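To make the effect of a smaller save_freq concrete, here is a minimal sketch of a step-based saving schedule. It assumes the trainer saves whenever the iteration count is a multiple of save_freq, which is a common pattern but is not confirmed by this thread; `due_steps` is a hypothetical helper, not a function from the ResShift code.

```python
def due_steps(total_iterations, freq):
    """Return the iterations at which a periodic action (e.g. saving) would fire.

    Assumption: the action fires when `step % freq == 0`, counting steps from 1.
    """
    return [step for step in range(1, total_iterations + 1) if step % freq == 0]


# Values reported later in this thread: iterations=113200, save_freq=100.
steps = due_steps(113200, 100)
print("first save at iteration", steps[0])
print("number of saves over the full run:", len(steps))
```

Under that assumption, nothing would be written before the first multiple of save_freq is reached, so a small value makes it quicker to confirm that saving works at all.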

4C4247 commented 2 weeks ago

OK, got it! Thank you~ I'll try again QAQ

4C4247 commented 2 weeks ago

Number of images in train data set: 9032
Number of images in val data set: 32
Elapsed time: 163.40s

Validation Metric: PSNR=14.23, LPIPS=0.8931...

Train: 000100/113200, Loss/MSE: t(1):4.0e-01/4.0e-01, t(8):6.5e-01/6.5e-01, t(15):1.1e+00/1.1e+00, lr:5.00e-05
Traceback (most recent call last):
  File "G:\Pythonprojects\ResShift-journal\main.py", line 48, in <module>
    trainer.train()
  File "G:\Pythonprojects\ResShift-journal\trainer.py", line 317, in train
    self.training_step(data)
  File "G:\Pythonprojects\ResShift-journal\trainer.py", line 773, in training_step
    self.log_step_train(losses, tt, micro_data, z_t, z0_pred.detach())
  File "G:\Pythonprojects\ResShift-journal\trainer.py", line 835, in log_step_train
    self.logging_metric(self.loss_mean, tag='Loss', phase=phase, add_global_step=True)
  File "G:\Pythonprojects\ResShift-journal\trainer.py", line 413, in logging_metric
    self.writer.add_scalars(tag, metrics, self.log_step[phase])
  File "G:\anaconda\envs\resshift\lib\site-packages\torch\utils\tensorboard\writer.py", line 435, in add_scalars
    fw.add_summary(scalar(main_tag, scalar_value), global_step, walltime)
  File "G:\anaconda\envs\resshift\lib\site-packages\torch\utils\tensorboard\summary.py", line 335, in scalar
    tensor.ndim == 0
AssertionError: Tensor should contain one element (0 dimensions). Was given size: 3 and 1 dimensions.

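I changed the parameters to batch: [32,4], iterations: 113200, warmup_iterations: 30, save_freq: 100, log_freq: [100,1000,100]. After training for a short while, the error above appeared. Is it because my input images are black-and-white (rubbing images)?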
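The assertion in the traceback is raised inside TensorBoard's scalar logging rather than in the data pipeline: `add_scalars` expects every value in the dictionary to be a single number, and here one value had 3 elements. The sketch below shows one possible way to expand such values before logging. It is an illustration only: `to_scalar_dict` is a hypothetical helper, the contents of `self.loss_mean` in trainer.py are not shown in this thread, and the example values are simply the three MSE numbers from the log line.

```python
def to_scalar_dict(metrics):
    """Expand each multi-element metric into one float per element.

    Works on plain numbers, lists/tuples, and objects exposing tolist()
    (e.g. tensors or arrays). Assumes values are at most 1-D.
    """
    flat = {}
    for name, value in metrics.items():
        if hasattr(value, "tolist"):
            value = value.tolist()  # tensor/array -> Python number or list
        if isinstance(value, (list, tuple)):
            for idx, item in enumerate(value):
                flat[f"{name}/{idx}"] = float(item)
        else:
            flat[name] = float(value)
    return flat


# Hypothetical metric with three elements, like the one the assertion rejected.
loss_mean = {"mse": [0.40, 0.65, 1.10]}
print(to_scalar_dict(loss_mean))
```

The flattened dictionary could then be passed as the second argument of `writer.add_scalars(tag, ..., step)`, so that each entry is logged as its own curve.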

4C4247 commented 2 weeks ago

I don't really know how to fix this. I first tried changing in_channel to 1, which produced the following error:
RuntimeError: Given groups=1, weight of size [160, 17, 3, 3], expected input[4, 6, 64, 64] to have 17 channels, but got 6 channels instead
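A frequently used alternative to editing the channel count is to leave the model configuration at 3 channels and replicate the single gray channel three times when loading each image. The sketch below illustrates the idea on plain nested lists so that it runs without extra libraries. It is an assumption about a possible workaround, not the fix used in this thread, and `gray_to_three_channels` is a hypothetical helper.

```python
def gray_to_three_channels(gray):
    """Turn an H x W grayscale image (nested lists) into a 3 x H x W image.

    Each of the three output channels is an independent copy of the input.
    """
    return [[row[:] for row in gray] for _ in range(3)]


gray = [[0, 128], [255, 64]]          # tiny 2 x 2 example image
rgb_like = gray_to_three_channels(gray)
print(len(rgb_like), len(rgb_like[0]), len(rgb_like[0][0]))
```

With Pillow, `Image.convert("RGB")` performs the same replication for an image opened in grayscale mode.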

4C4247 commented 2 weeks ago

> I don't really know how to fix this. I first tried changing in_channel to 1, which produced the following error:
> RuntimeError: Given groups=1, weight of size [160, 17, 3, 3], expected input[4, 6, 64, 64] to have 17 channels, but got 6 channels instead

Problem solved: