junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Error: step() missing 1 required positional argument: 'metrics' when using plateau lr_policy #495

Closed. SorourMo closed this issue 5 years ago.

SorourMo commented 5 years ago

Hi, I get the following error during training:

Traceback (most recent call last):
  File "train.py", line 63, in <module>
    model.update_learning_rate()
  File "~\pytorch-CycleGAN-and-pix2pix-master\models\base_model.py", line 69, in update_learning_rate
    scheduler.step()

Here are the training options and the console output (PyTorch 0.4.1, Python 3.6):

----------------- Options ---------------
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: ./datasets/data [default: None]
dataset_mode: unaligned
display_env: main
display_freq: 400
display_id: 1
display_ncols: 4
display_port: 8097
display_server: http://localhost
display_winsize: 256
epoch_count: 1
fineSize: 192 [default: 256]
gpu_ids: 0
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
lambda_A: 10.0
lambda_B: 10.0
lambda_identity: 0.5
loadSize: 192 [default: 286]
lr: 0.0002
lr_decay_iters: 50
lr_policy: plateau [default: lambda]
max_dataset_size: inf
model: cycle_gan
nThreads: 4
n_layers_D: 3
name: cyclegan [default: experiment_name]
ndf: 64
ngf: 64
niter: 100
niter_decay: 100
no_dropout: True
no_flip: False
no_html: False
no_lsgan: False
norm: instance
output_nc: 3
phase: train
pool_size: 50
print_freq: 100
resize_or_crop: resize_and_crop
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
suffix:
update_html_freq: 1000
verbose: False
which_direction: AtoB
which_epoch: latest
which_model_netD: basic
which_model_netG: resnet_9blocks
----------------- End -------------------
dataset [UnalignedDataset] was created

#training images = 1000

initialize network with normal
initialize network with normal
initialize network with normal
initialize network with normal
model [CycleGANModel] was created
---------- Networks initialized -------------
[Network G_A] Total number of parameters : 11.378 M
[Network G_B] Total number of parameters : 11.378 M
[Network D_A] Total number of parameters : 2.765 M
[Network D_B] Total number of parameters : 2.765 M

create web directory ./checkpoints\cyclegan\web...
(epoch: 1, iters: 100, time: 0.310, data: 2.437) D_A: 1.223 G_A: 1.629 cycle_A: 1.119 idt_A: 1.228 D_B: 0.360 G_B: 0.525 cycle_B: 2.160 idt_B: 0.578
(epoch: 1, iters: 200, time: 0.297, data: 0.000) D_A: 0.303 G_A: 0.420 cycle_A: 1.724 idt_A: 1.402 D_B: 0.223 G_B: 0.287 cycle_B: 2.647 idt_B: 0.853
(epoch: 1, iters: 300, time: 0.312, data: 0.000) D_A: 0.333 G_A: 0.411 cycle_A: 1.656 idt_A: 0.907 D_B: 0.218 G_B: 0.325 cycle_B: 1.876 idt_B: 0.758
(epoch: 1, iters: 400, time: 0.563, data: 0.000) D_A: 0.301 G_A: 0.248 cycle_A: 3.143 idt_A: 0.695 D_B: 0.277 G_B: 0.437 cycle_B: 1.703 idt_B: 1.545
(epoch: 1, iters: 500, time: 0.344, data: 0.000) D_A: 0.280 G_A: 0.405 cycle_A: 1.780 idt_A: 1.205 D_B: 0.349 G_B: 0.490 cycle_B: 2.279 idt_B: 1.014
(epoch: 1, iters: 600, time: 0.307, data: 0.000) D_A: 0.176 G_A: 0.324 cycle_A: 1.865 idt_A: 1.127 D_B: 0.307 G_B: 0.292 cycle_B: 2.384 idt_B: 0.930
(epoch: 1, iters: 700, time: 0.328, data: 0.000) D_A: 0.197 G_A: 0.570 cycle_A: 1.368 idt_A: 0.695 D_B: 0.349 G_B: 0.628 cycle_B: 1.661 idt_B: 0.700
(epoch: 1, iters: 800, time: 0.590, data: 0.000) D_A: 0.402 G_A: 0.776 cycle_A: 1.747 idt_A: 0.559 D_B: 0.179 G_B: 0.274 cycle_B: 1.279 idt_B: 1.027
(epoch: 1, iters: 900, time: 0.376, data: 0.000) D_A: 0.326 G_A: 0.336 cycle_A: 1.710 idt_A: 1.523 D_B: 0.262 G_B: 0.386 cycle_B: 3.144 idt_B: 0.849
(epoch: 1, iters: 1000, time: 0.312, data: 0.000) D_A: 0.216 G_A: 0.481 cycle_A: 2.646 idt_A: 0.730 D_B: 0.413 G_B: 0.419 cycle_B: 1.580 idt_B: 1.530
End of epoch 1 / 200   Time Taken: 322 sec
Traceback (most recent call last):
  File "train.py", line 63, in <module>
    model.update_learning_rate()
  File "~\models\base_model.py", line 69, in update_learning_rate
    scheduler.step()
TypeError: step() missing 1 required positional argument: 'metrics'

Do I need to change the `update_learning_rate` function in `models\base_model.py` for this particular learning-rate policy? I haven't changed the default ReduceLROnPlateau parameters in the code: `lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.2, threshold=0.01, patience=5)`
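Unlike the `lambda` or `step` policies, ReduceLROnPlateau decides whether to decay the learning rate by watching a monitored quantity, which is why its `step()` takes a required `metrics` argument. The class below is a simplified pure-Python sketch of that rule for `mode='min'` with the default relative threshold, using the values quoted above. It is not PyTorch's implementation: `SimplePlateau` is a hypothetical name, and `cooldown`, `min_lr` and `eps` are deliberately left out.

```python
class SimplePlateau:
    """Hypothetical, simplified version of the rule ReduceLROnPlateau applies
    with mode='min' and the default relative threshold. Not PyTorch's code:
    cooldown, min_lr and eps are deliberately left out."""

    def __init__(self, lr, factor=0.2, threshold=0.01, patience=5):
        self.lr = lr
        self.factor = factor
        self.threshold = threshold
        self.patience = patience
        self.best = float("inf")  # best (lowest) metric seen so far
        self.num_bad_epochs = 0   # consecutive epochs without enough improvement

    def step(self, metrics):
        # `metrics` is required: the decision is driven entirely by this value.
        current = float(metrics)
        if current < self.best * (1 - self.threshold):
            # Improved by more than `threshold` (relative): remember it, reset the count.
            self.best = current
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            # More than `patience` bad epochs in a row: decay the learning rate.
            self.lr *= self.factor
            self.num_bad_epochs = 0
        return self.lr


sched = SimplePlateau(lr=0.0002)
# Hypothetical per-epoch validation losses: two improvements, then a plateau.
lrs = [sched.step(loss) for loss in [1.0, 0.9] + [0.895] * 6]
```

With these settings the learning rate stays at 0.0002 through five non-improving epochs and drops to 0.00004 on the sixth. Calling `sched.step()` with no argument fails with the same kind of `TypeError` as in the report, because there is no value to compare against the best one seen so far.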

Any help would be appreciated.

junyanz commented 5 years ago

I just added `self.metric` to the code. It is initialized to 0 in the model's `__init__` and passed to `scheduler.step()` in `update_learning_rate`. You need to assign your loss value to `self.metric`.
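The fix described above amounts to branching on the policy inside `update_learning_rate` and feeding the plateau scheduler a value. The sketch below uses hypothetical stand-in classes (`StepLikeScheduler`, `PlateauLikeScheduler`, `SketchModel`) instead of PyTorch so that it runs anywhere; in the repository the policy comes from the parsed `lr_policy` option and the schedulers from `torch.optim.lr_scheduler`, so treat this as an illustration of the pattern rather than the repo's code.

```python
class StepLikeScheduler:
    """Hypothetical stand-in for schedulers such as LambdaLR or StepLR: step() takes no metric."""

    def __init__(self):
        self.calls = 0

    def step(self):
        self.calls += 1


class PlateauLikeScheduler:
    """Hypothetical stand-in for ReduceLROnPlateau: step() requires the monitored value."""

    def __init__(self):
        self.seen = []

    def step(self, metrics):
        self.seen.append(float(metrics))


class SketchModel:
    """Hypothetical model following the self.metric pattern described above."""

    def __init__(self, lr_policy, schedulers):
        self.lr_policy = lr_policy
        self.schedulers = schedulers
        self.metric = 0  # only read when lr_policy == 'plateau'

    def update_learning_rate(self):
        # Called once at the end of every epoch.
        for scheduler in self.schedulers:
            if self.lr_policy == "plateau":
                scheduler.step(self.metric)  # plateau needs the monitored value
            else:
                scheduler.step()  # other policies only count epochs


model = SketchModel("plateau", [PlateauLikeScheduler()])
for val_loss in [1.2, 1.1, 1.15]:  # hypothetical per-epoch validation losses
    model.metric = val_loss  # assign before updating the learning rate
    model.update_learning_rate()
```

Whatever ends up in `metric` is what the scheduler monitors. With `mode='min'` it should be a quantity where lower is better, such as a validation loss averaged over the epoch; leaving it at the initial 0 would make every epoch after the first look like a plateau.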

SorourMo commented 5 years ago

Thanks. It worked once I assigned the validation loss to `self.metric`.