junyanz / pytorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Synthetic to real distribution on industry objects #943

Open fransonf opened 4 years ago

fransonf commented 4 years ago

Hello. I'm looking into making renderings of industry-relevant objects (from 3D models) look more realistic using CycleGAN. For now I'm using T-Less (http://cmp.felk.cvut.cz/t-less/) since it provides both images of the objects in the real distribution and the corresponding STL files, which I can use for rendering. I also put a black background on the rendered images so that all training images in the two domains have a black background. Do you think having a black background on all images hurts the learning of the generator in some way?

The latest model I have trained generates the correct position of the input object but not the correct orientation. Do you have any experience with this? I attach a picture of how it looks. We use roughly 1500 images in each domain during training.

Image of real object: 0336

Rendering of the 3D model (input to CycleGAN): train361_real

Output from CycleGAN: train361_fake

junyanz commented 4 years ago

You probably need to have the correct orientation for your 3D model. You should also train your model on cropped objects.

fransonf commented 4 years ago

Thanks for the reply!

What do you mean when saying that I need the correct orientation for the 3D model? What will cropping the training data add when training the model?

junyanz commented 4 years ago

The object pose (angle) should be consistent between the real and 3D objects. For example, if your 3D objects are always frontal and your real objects always face left, it is hard for CycleGAN to learn the mapping. Cropping helps the model focus on the object of interest and also potentially increases the effective resolution of the results.

fransonf commented 4 years ago

I get the idea, thanks. Is it the --crop_size flag that I should use for this? My training images are 256x256 - what would be an appropriate crop size?

junyanz commented 4 years ago

You need to crop the object before running our program. Our program only does random cropping.
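For example, with the black backgrounds you describe, a pre-cropping step along these lines could work (a rough sketch; the threshold, margin, output size, and paths are arbitrary placeholders, not anything from this repo):

    import numpy as np
    from PIL import Image

    def crop_to_object(in_path, out_path, thresh=10, margin=16, out_size=256):
        """Crop a black-background image tightly around the object, then resize."""
        img = Image.open(in_path).convert("RGB")
        arr = np.asarray(img)
        mask = arr.max(axis=2) > thresh            # pixels brighter than the background
        ys, xs = np.nonzero(mask)
        if xs.size == 0:                           # nothing found; keep the image as-is
            img.resize((out_size, out_size)).save(out_path)
            return
        x0, y0 = max(xs.min() - margin, 0), max(ys.min() - margin, 0)
        x1 = min(xs.max() + margin, arr.shape[1] - 1)
        y1 = min(ys.max() + margin, arr.shape[0] - 1)
        img.crop((x0, y0, x1 + 1, y1 + 1)).resize((out_size, out_size)).save(out_path)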

fransonf commented 4 years ago

I followed your advice and got slightly better results, but they still need improvement.

  1. "The object pose (angle) should be consistent between real and 3D objects" - just to clarify; should all objects in domain A and domain B have the same pose? Or do you mean that all (most) poses should be present in the two domains?

  2. In the first ~50 epochs the pose of real_A and fake_B is consistent, but after a while the pose can get lost or distorted, as shown below.

The loss plot looks good overall, i.e. the generator and discriminator losses bounce around in the range ~0.1-0.8. We still have roughly 1500 images in each domain, with random poses.

Epoch 45: epoch045_real_A epoch045_fake_B

Epoch 184: epoch184_real_A image

Here is the train_opt.txt as well:

    ----------------- Options ---------------
    batch_size: 4 [default: 1]
    beta1: 0.5
    checkpoints_dir: ./checkpoints
    continue_train: True [default: False]
    crop_size: 200 [default: 256]
    dataroot: ./datasets/tless1437 [default: None]
    dataset_mode: unaligned
    direction: AtoB
    display_env: main
    display_freq: 400
    display_id: 1
    display_ncols: 4
    display_port: 8097
    display_server: http://localhost
    display_winsize: 256
    epoch: latest
    epoch_count: 40 [default: 1]
    gan_mode: lsgan
    gpu_ids: 0
    init_gain: 0.02
    init_type: normal
    input_nc: 3
    isTrain: True [default: None]
    lambda_A: 10.0
    lambda_B: 10.0
    lambda_identity: 0.5
    load_iter: 0 [default: 0]
    load_size: 256 [default: 286]
    lr: 0.0002
    lr_decay_iters: 50
    lr_policy: linear
    max_dataset_size: inf
    model: cycle_gan
    n_epochs: 100
    n_epochs_decay: 100
    n_layers_D: 2 [default: 3]
    name: 1437A_1296B_small [default: experiment_name]
    ndf: 64
    netD: n_layers [default: basic]
    netG: resnet_6blocks [default: resnet_9blocks]
    ngf: 64
    no_dropout: True
    no_flip: False
    no_html: False
    norm: instance
    num_threads: 4
    output_nc: 3
    phase: train
    pool_size: 50
    preprocess: resize_and_crop
    print_freq: 100
    save_by_iter: False
    save_epoch_freq: 10
    save_latest_freq: 5000
    serial_batches: False
    suffix:
    update_html_freq: 1000
    verbose: False
    ----------------- End -------------------
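If I understand the preprocessing correctly, preprocess: resize_and_crop with load_size 256 and crop_size 200 amounts to roughly the following (a torchvision approximation for illustration, not the repo's actual transform code):

    # Rough sketch of what preprocess=resize_and_crop does with the options above:
    # resize to load_size, then take a random crop_size x crop_size patch.
    import torchvision.transforms as T

    transform = T.Compose([
        T.Resize((256, 256)),        # load_size
        T.RandomCrop(200),           # crop_size
        T.RandomHorizontalFlip(),    # active because no_flip is False
        T.ToTensor(),
        T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])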

junyanz commented 4 years ago

The distribution of object poses needs to be similar across domains A and B. Check out the "Viewpoint estimation" section in the VON paper for more details.

fransonf commented 4 years ago

We discovered a mistake in our own data augmentation step that caused some training images in domain B to become distorted, which was the reason for the bad results shown in my latest comment. We fixed it and redid the run. Around half of the outputs from CycleGAN are amazing, while some are less good.

Input: train1041_real train1030_real Output: train1041_fake train1030_fake

We want to train a pose estimation network on synthetic training data and investigate whether the pose results improve if the training data is first passed through CycleGAN to make it more realistic. For this to succeed we need more consistent output from CycleGAN.
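Concretely, the pipeline we have in mind looks roughly like this (a sketch only; the results folder layout, the *_fake_B.png naming, and the paths are assumptions about how test.py saves its outputs, so adjust to your own run):

    import glob, shutil, subprocess
    from pathlib import Path

    # 1) Translate the synthetic renders (domain A) with the trained CycleGAN model.
    subprocess.run([
        "python", "test.py",
        "--dataroot", "./datasets/tless1437",
        "--name", "1437A_1296B_small",
        "--model", "cycle_gan",
        "--num_test", "1500",   # cover the whole synthetic set
    ], check=True)

    # 2) Collect only the translated (synthetic -> realistic) images for the pose network.
    dst = Path("./pose_training_data")
    dst.mkdir(exist_ok=True)
    for f in glob.glob("./results/1437A_1296B_small/test_latest/images/*_fake_B.png"):
        shutil.copy(f, dst)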

Since you have more experience training CycleGAN, I would really like to know your thoughts on this. Is it inevitable to get some "bad" outputs from CycleGAN, or can the consistency be improved with more training data and/or a deeper generator/discriminator architecture? We're currently running --netG resnet_6blocks and --netD n_layers with --n_layers_D 3 on 256x256 images.

Thanks a lot for your help!

junyanz commented 4 years ago

Have you tried the default --netG and --netD? The results can be improved by using different generator/discriminator architectures, although it is hard to get 100% of the outputs looking good.

fransonf commented 4 years ago

Yes, we've tried the default --netG and --netD, but the distorted/bubbly output is still quite common. We have also tried doubling the amount of training data in domain B and training with different --load_size and --crop_size values. Are there any other settings you can recommend?

Is it possible that we're training for too long or too short? We've never trained beyond 200 epochs. In the first epochs there is minimal distortion, but the texture is quite bad. It seems like there is a trade-off between texture quality and object distortion the longer you train.

junyanz commented 4 years ago

You can try a smaller --crop_size. Yes, there is a trade-off between texture and distortion, as the program is not aware of the difference between necessary and unnecessary changes.