junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

pix2pix training doesn't go past "create web directory ./checkpoints\cartoonify_pix2pix\web..." #1594

Closed TehreemFarooqi closed 1 year ago

TehreemFarooqi commented 1 year ago

I am training a pix2pix model on my own dataset. I trained for 170 epochs at first and then for another 20 epochs. Now when I try to continue training, it loads the model and everything looks normal, but execution stops without giving any error. This has happened before, and random things like restarting the PC or Jupyter, or changing the values of n_epochs and n_epochs_decay, made it go away. Now it never trains and always stops after the step mentioned above.

Below is the output and my training command:

!python train.py --dataroot ../processed2 --continue_train --n_epochs 70 --n_epochs_decay 70 \
                --name cartoonify_pix2pix --model pix2pix --dataset_mode aligned \
                --batch_size 8 --load_size 512 --crop_size 256 --preprocess scale_width_and_crop \
                --display_id 0 --netG resnet_9blocks --epoch_count 201
----------------- Options ---------------
               batch_size: 8                                [default: 1]
                    beta1: 0.5                           
          checkpoints_dir: ./checkpoints                 
           continue_train: True                             [default: False]
                crop_size: 256                           
                 dataroot: ../processed2                    [default: None]
             dataset_mode: aligned                       
                direction: AtoB                          
              display_env: main                          
             display_freq: 400                           
               display_id: 0                                [default: 1]
            display_ncols: 4                             
             display_port: 8097                          
           display_server: http://localhost/              
          display_winsize: 256                           
                    epoch: latest                        
              epoch_count: 201                              [default: 1]
                 gan_mode: vanilla                       
                  gpu_ids: 0                             
                init_gain: 0.02                          
                init_type: normal                        
                 input_nc: 3                             
                  isTrain: True                             [default: None]
                lambda_L1: 100.0                         
                load_iter: 0                                [default: 0]
                load_size: 512                              [default: 286]
                       lr: 0.0002                        
           lr_decay_iters: 50                            
                lr_policy: linear                        
         max_dataset_size: inf                           
                    model: pix2pix                          [default: cycle_gan]
                 n_epochs: 50                               [default: 100]
           n_epochs_decay: 50                               [default: 100]
               n_layers_D: 3                             
                     name: cartoonify_pix2pix               [default: experiment_name]
                      ndf: 64                            
                     netD: basic                         
                     netG: resnet_9blocks                   [default: unet_256]
                      ngf: 64                            
               no_dropout: False                         
                  no_flip: False                         
                  no_html: False                         
                     norm: batch                         
              num_threads: 4                             
                output_nc: 3                             
                    phase: train                         
                pool_size: 0                             
               preprocess: scale_width_and_crop             [default: resize_and_crop]
               print_freq: 100                           
             save_by_iter: False                         
          save_epoch_freq: 5                             
         save_latest_freq: 5000                          
           serial_batches: False                         
                   suffix:                               
         update_html_freq: 1000                          
                use_wandb: False                         
                  verbose: False                         
       wandb_project_name: CycleGAN-and-pix2pix          
----------------- End -------------------
dataset [AlignedDataset] was created
The number of training images = 904
initialize network with normal
initialize network with normal
model [Pix2PixModel] was created
loading the model from ./checkpoints\cartoonify_pix2pix\latest_net_G.pth
loading the model from ./checkpoints\cartoonify_pix2pix\latest_net_D.pth
---------- Networks initialized -------------
[Network G] Total number of parameters : 11.383 M
[Network D] Total number of parameters : 2.769 M
-----------------------------------------------
create web directory ./checkpoints\cartoonify_pix2pix\web...

@taesungp any solution please?

VENNRICO commented 1 year ago

I have the same problem. Could you please tell me how you solved it?

we89 commented 3 months ago

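If you want to continue training, --n_epochs and --n_epochs_decay must be larger than the values used before, so that their sum is at least --epoch_count (201 here). Otherwise there are no epochs left to run and training stops right after setup without any error.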
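
To make the arithmetic concrete, here is a minimal sketch. It assumes the outer loop in train.py iterates over range(epoch_count, n_epochs + n_epochs_decay + 1), which is how I read the repository's training script. The helper name remaining_epochs is hypothetical and not part of the repo.

```python
def remaining_epochs(epoch_count, n_epochs, n_epochs_decay):
    """Epochs the training loop would visit, under the assumption that
    train.py loops over range(epoch_count, n_epochs + n_epochs_decay + 1).
    This helper is illustrative only; it is not part of the repository."""
    return range(epoch_count, n_epochs + n_epochs_decay + 1)


# Values from the printed options in the log above: the range is empty,
# so the loop body never runs and the script exits silently.
from_log = remaining_epochs(epoch_count=201, n_epochs=50, n_epochs_decay=50)
print(len(from_log))  # 0

# Values from the command line (70 + 70 = 140) are still below 201.
from_command = remaining_epochs(epoch_count=201, n_epochs=70, n_epochs_decay=70)
print(len(from_command))  # 0

# Example values chosen for illustration: 150 + 100 = 250 leaves epochs 201..250.
example = remaining_epochs(epoch_count=201, n_epochs=150, n_epochs_decay=100)
print(len(example), example[0], example[-1])  # 50 201 250
```

Under that assumption, both the values in the command (70 and 70) and the values in the printed options (50 and 50) give a total below 201, which would explain why the run ends right after "create web directory" with no error.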