Exception encountered when training cunet model in user mode

shurunxuan commented 3 years ago

Hi, I would like to train a cunet model on my own dataset; however, I always get the following exception:

/root/torch/install/bin/luajit: lib/ClippedWeightedHuberCriterion.lua:16: bad argument #1 to 'resizeAs' (torch.CudaTensor expected, got userdata)

I can, however, train a vgg_7 model with the same parameters and dataset. I've been stuck on this for days and can't find a solution. Could you help me with this?

Here's the full output log:

$ th train.lua -data_dir ./user_data -scale 1 -model cunet -method user -crop_size 100 -batch_size 64 -patches 64 -backend cudnn -save_history 1 -test /root/illustrations/test.png -name my_cunet_model -model_dir ./models/user
{
  grayscale : false
  thread : -1
  name : "my_cunet_model"
  loss : "huber"
  random_erasing_rect_min : 8
  use_transparent_png : false
  model : "cunet"
  random_erasing_rate : 0
  random_pairwise_negate_x_rate : 0
  resume_epoch : 1
  downsampling_filters :
    {
      1 : "Box"
      2 : "Lanczos"
      3 : "Sinc"
    }
  resume : ""
  crop_size : 100
  random_pairwise_rotate_min : -6
  random_blur_size : "3,5"
  random_pairwise_scale_max : 1.176
  random_blur_rate : 0
  random_pairwise_negate_rate : 0
  nr_rate : 0.65
  oracle_drop_rate : 0.5
  inner_epoch : 4
  invert_x : false
  epoch : 50
  update_criterion : "mse"
  pairwise_flip : true
  jpeg_chroma_subsampling_rate : 0.5
  image_list : "./user_data/image_list.txt"
  oracle_rate : 0.1
  active_cropping_tries : 10
  backend : "cudnn"
  random_pairwise_scale_min : 0.85
  active_cropping_rate : 0.5
  batch_size : 64
  random_unsharp_mask_rate : 0
  max_size : 256
  validation_crops : 200
  plot : false
  random_erasing_rect_max : 32
  resize_blur_max : 1.05
  gpu :
    {
      1 : 1
    }
  random_pairwise_scale_rate : 0
  random_pairwise_rotate_rate : 0
  random_color_noise_rate : 0
  validation_filename_split : false
  images : "./user_data/images.t7"
  model_file_best : "./models/user/my_cunet_model.t7"
  model_file : "./models/user/my_cunet_model.%d-%d.t7"
  resize_blur_min : 0.95
  padding_y_zero : false
  test : "/root/illustrations/test.png"
  learning_rate_decay : 3e-07
  method : "user"
  save_history : true
  color : "rgb"
  seed : 11
  pairwise_y_binary : false
  model_dir : "./models/user"
  style : "art"
  data_dir : "./user_data"
  noise_level : 1
  random_half_rate : 0
  validation_rate : 0.05
  random_overlay_rate : 0
  max_training_image_size : -1
  padding : 0
  padding_x_zero : false
  scale : 1
  random_pairwise_rotate_max : 6
  learning_rate : 0.00025
  random_blur_sigma_max : 1
  patches : 64
  random_erasing_n : 1
  random_blur_sigma_min : 0.5
}
# make validation-set
load .. 433
 [=============================== 20/22 ==============================>........]  ETA: 0ms | Step: 0ms
# 1
## resampling
 [======================================== 433/433 ====================================>]  Tot: 17s415ms | Step: 42ms
## update
/root/torch/install/bin/luajit: lib/ClippedWeightedHuberCriterion.lua:16: bad argument #1 to 'resizeAs' (torch.CudaTensor expected, got userdata)
stack traceback:
        [C]: in function 'resizeAs'
        lib/ClippedWeightedHuberCriterion.lua:16: in function 'forward'
        lib/minibatch_adam.lua:46: in function 'opfunc'
        /root/torch/install/share/lua/5.1/optim/adam.lua:37: in function 'adam'
        lib/minibatch_adam.lua:61: in function 'minibatch_adam'
        train.lua:640: in function 'train'
        train.lua:718: in main chunk
        [C]: in function 'dofile'
        /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x561f35bff9d0

nagadomi commented 3 years ago

cunet and upcunet have auxiliary loss outputs during training. The default -loss huber option (ClippedWeightedHuberCriterion) does not support the auxiliary loss format. Use the -loss aux_lbp or -loss aux_huber option instead. See https://github.com/nagadomi/waifu2x/blob/master/appendix/train_cunet_art.sh for an example of a cunet/upcunet training command.
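To make "auxiliary loss output" concrete, here is a minimal Python sketch. It is not waifu2x's Lua/Torch code: the names huber and aux_huber, the plain-list representation of outputs, and the weights are all hypothetical. It assumes that in training mode a cunet-like model returns a list of outputs (a main output plus an auxiliary one) instead of a single tensor. A criterion written for one output rejects that list, which is analogous to the resizeAs type error in the log above, while an auxiliary-aware criterion computes one loss per output and combines them with weights.

```python
# Hypothetical illustration only: this is NOT waifu2x's
# ClippedWeightedHuberCriterion, aux_huber, or aux_lbp implementation.


def huber(pred, target, delta=1.0):
    """Mean Huber loss between ONE output vector and the target vector."""
    # A single-output criterion cannot handle a list of outputs
    # (compare the resizeAs type error in the training log).
    if any(isinstance(p, list) for p in pred):
        raise TypeError("expected a single output, got a list of outputs")
    if not pred or len(pred) != len(target):
        raise ValueError("prediction must be non-empty and match target length")
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * d * d if d <= delta else delta * (d - 0.5 * delta)
    return total / len(pred)


def aux_huber(outputs, target, weights, delta=1.0):
    """Weighted sum of per-output Huber losses for [main, aux, ...] outputs."""
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per output")
    return sum(w * huber(o, target, delta) for o, w in zip(outputs, weights))


if __name__ == "__main__":
    target = [0.0, 1.0, 2.0]
    main_out = [0.0, 1.5, 2.0]  # hypothetical main output
    aux_out = [2.0, 1.0, 2.0]   # hypothetical auxiliary output
    training_outputs = [main_out, aux_out]

    try:
        huber(training_outputs, target)
    except TypeError as err:
        print("single-output criterion:", err)

    # Hypothetical weights: full weight on the main output, half on the aux one.
    print("aux-aware criterion:", aux_huber(training_outputs, target, [1.0, 0.5]))
```

In waifu2x itself, the fix is the one given in the reply: keep the original th train.lua command and select an auxiliary-aware criterion with -loss aux_lbp or -loss aux_huber, using appendix/train_cunet_art.sh as the reference for the remaining options.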

shurunxuan commented 3 years ago

Yes, that solves the problem. Thanks so much!

I think I need to look through the bundled scripts more carefully 😫

nagadomi/waifu2x issue #379: Exception encountered when training cunet model in user mode