Open akamob opened 2 years ago
Hello,
`--preprocess none` will disable cropping. Cropping can be beneficial when you don't have enough samples in your dataset, to prevent overfitting. If the goal is to pass the output images to pose estimation networks, perhaps you can increase the weight on the L1 loss. It will generate less diverse images, but they will be more faithful to the input data and have fewer artifacts.
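For context, the weighted objective this reply refers to can be sketched as follows. This is a simplified, framework-free sketch (plain Python instead of the repo's PyTorch code) of how `--lambda_L1` enters the pix2pix generator loss:

```python
def l1(pred, target):
    """Mean absolute error over flat lists of pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def pix2pix_g_loss(fake, real, gan_loss, lambda_L1=100.0):
    """pix2pix-style generator objective, sketched: the adversarial term plus
    lambda_L1 times the L1 distance to the paired ground truth.
    Raising lambda_L1 trades output diversity for fidelity to the pairs."""
    return gan_loss + lambda_L1 * l1(fake, real)
```

With the default `lambda_L1=100`, the L1 term dominates the adversarial term, which is why increasing it further pushes the generator toward faithful (if less diverse) reconstructions.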
Hi, @taesungp, thank you very much for your reply:)
Yes, I have paired data. But my depth images are distorted, so they may not correspond accurately to the same locations in the RGB images. This is why I use CycleGAN. Given my task, is pix2pix better than CycleGAN?
In pix2pix_model.py, I found:
parser.add_argument('--lambda_L1', type=float, default=100.0, help='weight for L1 loss')
And CycleGAN (cycle_gan_model.py) has:
lambda_A, lambda_B, default=10.0
lambda_identity, default=0.5
But I don't know how much to increase these weights; this issue increased the lambda weight from 10 to 20. Is there any rule to follow? In other words, I'm wondering what a reasonable range for these loss weights is.
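For reference, here is a simplified sketch (plain Python, GAN terms omitted) of how `lambda_A`, `lambda_B`, and `lambda_identity` combine in the CycleGAN generator loss, following the structure of `cycle_gan_model.py`:

```python
def l1(pred, target):
    """Mean absolute error over flat lists of pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def cyclegan_g_loss(real_A, rec_A, real_B, rec_B, idt_A, idt_B,
                    lambda_A=10.0, lambda_B=10.0, lambda_identity=0.5):
    """Cycle-consistency losses scaled by lambda_A / lambda_B, plus identity
    losses scaled by lambda_identity on top of the cycle weights."""
    loss = l1(rec_A, real_A) * lambda_A      # A -> B -> A reconstruction
    loss += l1(rec_B, real_B) * lambda_B     # B -> A -> B reconstruction
    loss += l1(idt_A, real_B) * lambda_B * lambda_identity
    loss += l1(idt_B, real_A) * lambda_A * lambda_identity
    return loss
```

Because the identity terms are multiplied by the cycle weights, raising `lambda_A`/`lambda_B` also strengthens the identity constraint unless `lambda_identity` is lowered to compensate.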
In addition, I can only train for 100 epochs at a time on Colab, so I use --continue_train
for the next run. Compared to training in one go, I'm not sure whether my training configuration is proper (in particular, the learning rate decay).
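For what it's worth, the repo's default `linear` lr policy keeps the learning rate constant for the first `n_epochs` and then decays it linearly over `n_epochs_decay`; when resuming with `--continue_train`, `--epoch_count` tells the scheduler where on this curve to restart. A simplified sketch of the schedule (not the repo's exact scheduler code):

```python
def linear_lr(epoch, n_epochs=100, n_epochs_decay=100, base_lr=0.0002):
    """Learning rate at a given (1-based) epoch under a linear-decay policy:
    constant at base_lr for the first n_epochs, then decaying linearly
    toward zero over the following n_epochs_decay epochs."""
    factor = 1.0 - max(0, epoch - n_epochs) / float(n_epochs_decay + 1)
    return base_lr * factor
```

Under this schedule, resuming with `--epoch_count 180` (as in the command above) restarts decay at roughly 21% of the base lr, so split Colab sessions should roughly match one long run as long as `--epoch_count` is set to the correct next epoch each time.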
May I have your suggestions? Any help is much appreciated!
Hi @taesungp, the following are my recent experiment results. My dataset has 9411 depth images and 9081 RGB images; all images are 256x256 and cover three actions: forward/backward, waving hands, and forward bend.
I use this command to train CycleGAN on Colab: !python train.py --dataroot ./datasets/yicyclepix_0322 --name yivlp2rgbhuman --model cycle_gan --n_epochs 100 --n_epochs_decay 100 --epoch_count 180 --continue_train --lambda_A 25 --lambda_B 25 --batch_size 3 --preprocess crop --load_size 256 --crop_size 224 --display_id -1
Training options:
----------------- Options ---------------
batch_size: 3 [default: 1]
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: True [default: False]
crop_size: 224 [default: 256]
dataroot: ./datasets/yicyclepix_0322 [default: None]
dataset_mode: unaligned
direction: AtoB
display_env: main
display_freq: 400
display_id: -1 [default: 1]
display_ncols: 4
display_port: 8097
display_server: http://localhost
display_winsize: 256
epoch: latest
epoch_count: 180 [default: 1]
gan_mode: lsgan
gpu_ids: 0
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
lambda_A: 25.0 [default: 10.0]
lambda_B: 25.0 [default: 10.0]
lambda_identity: 0.5
load_iter: 0 [default: 0]
load_size: 256 [default: 286]
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: inf
model: cycle_gan
n_epochs: 100
n_epochs_decay: 100
n_layers_D: 3
name: yivlp2rgbhuman [default: experiment_name]
ndf: 64
netD: basic
netG: resnet_9blocks
ngf: 64
no_dropout: True
no_flip: False
no_html: False
norm: instance
num_threads: 4
output_nc: 3
phase: train
pool_size: 50
preprocess: crop [default: resize_and_crop]
print_freq: 100
save_by_iter: False
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
suffix:
update_html_freq: 1000
use_wandb: False
verbose: False
----------------- End -------------------
After 200 epochs of training, I plotted the losses. These videos are the test results. For forward/backward: Input, Output. For waving hands: Input, Output. For forward bend: Input, Output.
I add these flags when testing my CycleGAN model: --batch_size 3 --preprocess crop --load_size 256 --crop_size 224 --no_dropout. As these videos show, the results are not good. Could you give me some suggestions?
Besides, I tried another training configuration: I changed --load_size to 286 and --crop_size to 256, and added --netG resnet_6blocks: !python train.py --dataroot ./datasets/yicyclepix_0322 --name yivlp2rgbhuman --model cycle_gan --n_epochs 100 --n_epochs_decay 100 --epoch_count 89 --continue_train --lambda_A 25 --lambda_B 25 --batch_size 3 --netG resnet_6blocks --preprocess crop --load_size 286 --crop_size 256 --display_id -1
Training options:
----------------- Options ---------------
batch_size: 3 [default: 1]
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: True [default: False]
crop_size: 256
dataroot: ./datasets/yicyclepix_0322 [default: None]
dataset_mode: unaligned
direction: AtoB
display_env: main
display_freq: 400
display_id: -1 [default: 1]
display_ncols: 4
display_port: 8097
display_server: http://localhost
display_winsize: 256
epoch: latest
epoch_count: 89 [default: 1]
gan_mode: lsgan
gpu_ids: 0
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
lambda_A: 25.0 [default: 10.0]
lambda_B: 25.0 [default: 10.0]
lambda_identity: 0.5
load_iter: 0 [default: 0]
load_size: 286
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: inf
model: cycle_gan
n_epochs: 100
n_epochs_decay: 100
n_layers_D: 3
name: yivlp2rgbhuman [default: experiment_name]
ndf: 64
netD: basic
netG: resnet_6blocks [default: resnet_9blocks]
ngf: 64
no_dropout: True
no_flip: False
no_html: False
norm: instance
num_threads: 4
output_nc: 3
phase: train
pool_size: 50
preprocess: crop [default: resize_and_crop]
print_freq: 100
save_by_iter: False
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
suffix:
update_html_freq: 1000
use_wandb: False
verbose: False
----------------- End -------------------
Although I have only trained this CycleGAN to 133 epochs so far, the images generated during training suggest that the results may still be bad:
May I have your suggestions? Any help is much appreciated :)
Hi, Thank you for the awesome work.
I have a VLP-16 lidar and I want to use it for activity recognition. I want to treat point clouds as RGB images so that I can use OpenPose or other pose estimation networks. First, I convert my point clouds into depth images; the following are some examples:
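The point-cloud-to-depth-image step described above is commonly done as a spherical range projection. A self-contained sketch with assumed sensor parameters (16 rings, a ±15° vertical FOV as on a VLP-16, 1° horizontal bins, 30 m max range; the function name and all constants are illustrative, not from the original post):

```python
import numpy as np

def depth_image(points, h=16, w=360, max_range=30.0):
    """Project lidar points (N, 3) into an (h, w) spherical range image.
    Rows index elevation (ring), columns index azimuth; pixel intensity
    encodes normalized range (nearer points are darker)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)                                   # [-pi, pi]
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))
    col = ((azimuth + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
    fov_up, fov_down = np.radians(15.0), np.radians(-15.0)       # VLP-16 FOV
    row = ((fov_up - elevation) / (fov_up - fov_down) * (h - 1)).astype(int)
    row = np.clip(row, 0, h - 1)
    img = np.zeros((h, w), dtype=np.float32)
    img[row, col] = np.clip(r / max_range, 0.0, 1.0) * 255.0
    return img
```

The 16-row image can then be resized (e.g. to 256x256) before being fed to CycleGAN, as with the examples above.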
I hope CycleGAN can translate these depth images (with a specific action) into corresponding RGB images; namely, if my depth images consist of walking, kicking, writing, climbing, etc., I hope CycleGAN can not only perform image-to-image translation but also preserve the correct action. Is this possible?
The link below shows my current results, and the following video is the input:
https://user-images.githubusercontent.com/49118957/156760287-cd8af0cf-06e7-41e7-9a13-477c365d3c09.mp4
Now my depth images only contain the front view of walking, and there is only one person in the scene. After 30 epochs of training (I only have Colab for training), CycleGAN outputs the above video. I use this command for training: python train.py --dataroot ./datasets/yivlp2yihuman --name yivlp2rgbhuman --model cycle_gan --n_epochs 15 --n_epochs_decay 15
I have some questions to ask: should I add
--preprocess none
on the command line? May I have your suggestions? Any help is much appreciated :)