wenchen4321 / GDSR

The GitHub repository of GDSR

Given groups=1, weight of size 32 64 3 3, expected input[7, 3, 32, 32] to have 64 channels, but got 3 channels instead #1

Open dk40913 opened 3 years ago

dk40913 commented 3 years ago

Hi there, I am trying to reproduce your project and have run into an issue that I cannot resolve. Could you help me?

I am running pretraining for DualSR with python train.py -opt options/train/DualSR_pretrain.json. The JSON config is as follows:

  "name": "12_21_01_Dual_RRDB_128_GAN_x4_DIV2K",
  "use_tb_logger": true,
  "model": "DualSR_pretrain",
  "scale": 4,
  "gpu_ids": [0, 1, 2, 3],
  "datasets": {
    "train": {
      "name": "DIV2K",
      "mode": "LRHR",
      "dataroot_HR": "/media/e621/RAID5/e621/downloads/.tmp/DIV2K_train_HR_sub.lmdb",
      "dataroot_LR": "/media/e621/RAID5/e621/downloads/.tmp/DIV2K_train_LR_x4_sub.lmdb",
      "subset_file": null,
      "use_shuffle": true,
      "n_workers": 8,
      "batch_size": 28
      "HR_size": 128
      "use_flip": true,
      "use_rot": true
    "val": {
      "name": "val_set5_part",
      "mode": "LRHR",
      "dataroot_HR": "/home/e517herb/Desktop/GDSR/Set5/original",
      "dataroot_LR": "/home/e517herb/Desktop/GDSR/Set5/LRbicx4"
  "path": {
    "root": "/home/e517herb/Desktop/LCC_BasicSR-master"
  "network_G": {
    "which_model_G": "DualSR_RRDB"
    "norm_type": null,
    "mode": "CNA",
    "nf": 64,
    "nb_l_1": 10,
    "nb_l_2": 5,
    "nb_h_1": 5,
    "nb_h_2": 10,
    "nb_e": 2,
    "nb_m": 5,
    "in_nc": 3,
    "out_nc": 3,
    "gc": 32,
    "group": 1
    "network_D": {
    "which_model_D": "discriminator_vgg_128"
    , "norm_type": "batch"
    , "act_type": "leakyrelu"
    , "mode": "CNA"
    , "nf": 64
    , "in_nc": 3

  "train": {
    "lr_G": 5e-5,
    "lr_scheme": "MultiStepLR",
    "lr_steps": [
    "lr_gamma": 0.5,
    "pixel_criterion": "l1",
    "pixel_weight": 1.0,
    "val_freq": 1e3,
    "manual_seed": 0,
    "niter": 1e6,
    "feature_weight": 1
  "logger": {
    "print_freq": 200,
    "save_checkpoint_freq": 1e3

The output shows a channel-count mismatch at the input of a convolution layer:

20-11-10 00:57:08.525 - INFO: Model [DualSR_pretrain] is created.
20-11-10 00:57:08.525 - INFO: Start training from epoch: 0, iter: 0
Traceback (most recent call last):
  File "train.py", line 182, in <module>
  File "train.py", line 112, in main
  File "/home/e517herb/Desktop/GDSR/codes/models/DualSR_pretrain.py", line 68, in optimize_parameters
    self.fake_Low, self.fake_high, self.fake_mask = self.netG(self.var_L)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/e517herb/Desktop/GDSR/codes/models/modules/DualSR_RRDB.py", line 89, in forward
    x = self.body_ex(x) # after the shared module runs 2 RRDBs
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/e517herb/Desktop/GDSR/codes/models/modules/block.py", line 94, in forward
    output = x + self.sub(x)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/e517herb/Desktop/GDSR/codes/models/modules/block.py", line 240, in forward
    out = self.RDB1(x)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/e517herb/Desktop/GDSR/codes/models/modules/block.py", line 215, in forward
    x1 = self.conv1(x)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/e517herb/anaconda3/envs/GDSR/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 32 64 3 3, expected input[7, 3, 32, 32] to have 64 channels, but got 3 channels instead

Did I do something wrong? Thanks in advance!

wenchen4321 commented 3 years ago

Hello, I checked the code and found a bug in one of the code files. I am sorry about that; I have fixed it. I think you can run the code successfully now.
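
For readers who hit the same message, the shapes in it can be cross-checked against the posted config. The sketch below is only an illustration: parse_conv_error is a hypothetical helper, not part of GDSR or PyTorch, and the config values are copied by hand from the JSON above. It shows that input[7, 3, 32, 32] is one DataParallel replica's share of the batch (28 / 4 GPUs), cropped to 128 / 4 = 32 pixels, and still carrying in_nc = 3 channels, while the failing convolution maps nf = 64 channels to gc = 32. That pattern is consistent with the raw LR image reaching an RRDB block without first passing through a 3-to-64-channel feature convolution, though the thread does not say what the actual bug was.

```python
import re


def parse_conv_error(message):
    """Hypothetical helper: pull the weight and input shapes out of a
    PyTorch conv channel-mismatch message."""
    weight = re.search(r"weight of size ([\d\s,\[\]]+?), expected", message)
    inp = re.search(r"input\[([\d,\s]+)\]", message)
    if weight is None or inp is None:
        raise ValueError("not a recognised conv shape-mismatch message")
    weight_shape = [int(n) for n in re.findall(r"\d+", weight.group(1))]
    input_shape = [int(n) for n in re.findall(r"\d+", inp.group(1))]
    return weight_shape, input_shape


# Error text copied from the traceback in this issue.
MESSAGE = (
    "Given groups=1, weight of size 32 64 3 3, expected input[7, 3, 32, 32] "
    "to have 64 channels, but got 3 channels instead"
)

# Conv2d weights are laid out as [out_channels, in_channels / groups, kH, kW];
# inputs are laid out as [batch, channels, height, width].
weight_shape, input_shape = parse_conv_error(MESSAGE)
out_ch, in_ch, k_h, k_w = weight_shape
batch, channels, height, width = input_shape

# Values copied by hand from the config posted above.
batch_size, n_gpus = 28, 4      # "batch_size": 28, "gpu_ids": [0, 1, 2, 3]
hr_size, scale = 128, 4         # "HR_size": 128, "scale": 4
in_nc, nf, gc = 3, 64, 32       # "in_nc": 3, "nf": 64, "gc": 32

per_replica_batch = batch_size // n_gpus  # DataParallel splits along dim 0
lr_patch = hr_size // scale               # LR crop that pairs with the HR crop

print("per-replica batch:", per_replica_batch, "| reported:", batch)
print("LR patch size:", lr_patch, "| reported:", (height, width))
print("layer expects", in_ch, "channels (nf) and emits", out_ch, "(gc)")
print("tensor has", channels, "channels (in_nc)")
```

Because the batch and spatial dimensions match the config exactly, the data pipeline looks fine here; only the channel dimension disagrees, which points at the network's forward pass rather than at the dataset or the JSON options.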

dk40913 commented 3 years ago

@wenchen4321 Thank you for the quick reply and the bug fix. Training now starts, but the furthest I can get is epoch: 0, iter: 1000. Here is the output:
Interestingly enough, here is TensorBoard's output:
Thanks in advance!