fransonf opened this issue 4 years ago
You probably need to have the correct orientation for your 3D model. You should also train your model on cropped objects.
Thanks for the reply!
What do you mean when saying that I need the correct orientation for the 3D model? What will cropping the training data add when training the model?
The object pose (angle) should be consistent between real and 3D objects. For example, if your 3D objects are always frontal and your real objects always face left, it is hard for CycleGAN to learn the mapping. Cropping will help the model focus on the object of interest, and also potentially increase the resolution of the results.
I get the idea, thanks. Is it the --crop_size flag that I should use for this? My training images are 256x256 - what would be an appropriate crop size?
You need to crop the object before running our program. Our program only does random cropping.
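A minimal sketch of such a pre-cropping step, assuming you have a bounding box per object (the `crop_to_object` helper and its `(x0, y0, x1, y1)` box format are made up for illustration and are not part of the CycleGAN codebase):

```python
import numpy as np

def crop_to_object(image, box, pad=10):
    """Crop an H x W x C image array to a bounding box plus padding.

    `box` is (x0, y0, x1, y1) in pixel coordinates. This helper is
    illustrative; any bounding-box source (renderer, annotations,
    a detector) would work.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    x0 = max(0, x0 - pad)
    y0 = max(0, y0 - pad)
    x1 = min(w, x1 + pad)
    y1 = min(h, y1 + pad)
    return image[y0:y1, x0:x1]

# Example: crop a 256x256 image to a box around the object.
img = np.zeros((256, 256, 3), dtype=np.uint8)
crop = crop_to_object(img, (60, 80, 180, 200), pad=10)
print(crop.shape)  # (140, 140, 3)
```

Cropping offline like this (rather than relying on the training script's random crop) keeps the object centered and large in every training image.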
I followed your advice and got slightly better results, but they still need improvement.
"The object pose (angle) should be consistent between real and 3D objects" - just to clarify; should all objects in domain A and domain B have the same pose? Or do you mean that all (most) poses should be present in the two domains?
In the first ~50 epochs the pose of real_A and fake_B is consistent, but after a while the pose can get lost or distorted, as shown below.
The loss plot looks good overall, i.e. the generator/discriminator losses jump around in the interval ~0.1-0.8. We still have roughly 1500 images in each domain with random poses.
Epoch 45
Epoch 184
Here is the train_opt.txt as well.
----------------- Options ---------------
batch_size: 4 [default: 1]
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: True [default: False]
crop_size: 200 [default: 256]
dataroot: ./datasets/tless1437 [default: None]
dataset_mode: unaligned
direction: AtoB
display_env: main
display_freq: 400
display_id: 1
display_ncols: 4
display_port: 8097
display_server: http://localhost
display_winsize: 256
epoch: latest
epoch_count: 40 [default: 1]
gan_mode: lsgan
gpu_ids: 0
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
lambda_A: 10.0
lambda_B: 10.0
lambda_identity: 0.5
load_iter: 0 [default: 0]
load_size: 256 [default: 286]
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: inf
model: cycle_gan
n_epochs: 100
n_epochs_decay: 100
n_layers_D: 2 [default: 3]
name: 1437A_1296B_small [default: experiment_name]
ndf: 64
netD: n_layers [default: basic]
netG: resnet_6blocks [default: resnet_9blocks]
ngf: 64
no_dropout: True
no_flip: False
no_html: False
norm: instance
num_threads: 4
output_nc: 3
phase: train
pool_size: 50
preprocess: resize_and_crop
print_freq: 100
save_by_iter: False
save_epoch_freq: 10
save_latest_freq: 5000
serial_batches: False
suffix:
update_html_freq: 1000
verbose: False
----------------- End -------------------
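For reference, the non-default options in the dump above would correspond to a launch command roughly like the following (a sketch assuming the standard `train.py` entry point of the CycleGAN codebase; defaults are omitted):

```shell
python train.py \
  --dataroot ./datasets/tless1437 \
  --name 1437A_1296B_small \
  --model cycle_gan \
  --batch_size 4 \
  --crop_size 200 \
  --netG resnet_6blocks \
  --netD n_layers \
  --n_layers_D 2 \
  --continue_train \
  --epoch_count 40
```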
The distribution of object poses needs to be similar across domains A and B. Check out the "Viewpoint estimation" section in the VON paper for more details.
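One rough way to sanity-check this is to compare pose histograms across the two domains. The sketch below assumes you can extract an azimuth angle per image (e.g. from your rendering script for the synthetic domain and from dataset annotations for the real one); the function name and the L1-distance metric are illustrative choices, not part of any CycleGAN tooling:

```python
import numpy as np

def pose_histogram_distance(angles_a, angles_b, bins=12):
    """L1 distance between normalized azimuth histograms of two domains.

    angles_* are in degrees in [0, 360). The result is 0 for identical
    distributions and approaches 2 for completely disjoint ones.
    """
    edges = np.linspace(0.0, 360.0, bins + 1)
    hist_a, _ = np.histogram(angles_a, bins=edges)
    hist_b, _ = np.histogram(angles_b, bins=edges)
    hist_a = hist_a / hist_a.sum()
    hist_b = hist_b / hist_b.sum()
    return float(np.abs(hist_a - hist_b).sum())

rng = np.random.default_rng(0)
uniform_a = rng.uniform(0, 360, 1500)   # random poses, as in the thread
uniform_b = rng.uniform(0, 360, 1500)
frontal_b = rng.uniform(0, 30, 1500)    # only near-frontal renderings

print(pose_histogram_distance(uniform_a, uniform_b))  # small: domains match
print(pose_histogram_distance(uniform_a, frontal_b))  # near 2: domains mismatch
```

A large distance is a hint that the renderer's pose sampling should be adjusted to match the real images before training.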
We discovered a mistake in our own data augmentation step that caused some training images in domain B to become distorted, which was the reason for the bad result shown in my latest comment. We fixed it and re-did the run. Around half of the outputs from CycleGAN are amazing, while some are less good.
Input: Output:
We want to train a pose estimation network with synthetic training data and want to investigate if the pose-results can be improved on if the training data is first passed through CycleGAN to make it more realistic. For this to succeed we need a more consistent output from CycleGAN.
Since you have more experience training CycleGAN, I would really like to know your thoughts on this. Is it inevitable to get some "bad" outputs from CycleGAN, or can the consistency be improved with more training data and/or deeper generator/discriminator architectures? We're currently running --netG resnet_6blocks and --netD n_layers with --n_layers_D 3 on 256x256 images.
Thanks a lot for your help!
Have you tried the default --netG and --netD? The results can be improved using different generator/discriminator architectures, although it is hard to get 100% of them looking good.
Yes, we've tried the default --netG and --netD, but the problem with distorted/bubbly output is still quite common. We have also tried doubling the amount of training data in domain B and training with different --load_size and --crop_size values. Are there any other settings you can recommend?
Is it possible that we're training for too long or too short a time? We've never trained beyond 200 epochs. In the first epochs there is minimal distortion, but the texture is quite bad. It seems there is a trade-off between texture quality and object distortion the longer you train.
You can try a smaller --crop_size. Yes, there is a trade-off between texture and distortion, as the program is not aware of the difference between necessary and unnecessary changes.
Hello. I'm looking at making renderings of industry-relevant objects (from 3D models) look more realistic using CycleGAN. For now I'm using T-LESS http://cmp.felk.cvut.cz/t-less/, since it provides both images of each object in the real distribution and a corresponding STL file that I can use for rendering. Right now I also put a black background on the rendered images, so that all training images in the two domains have a black background. Do you think having a black background on all images hurts the learning of the generator in some way?
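The background-blackening step described above could look like the following sketch, assuming a binary foreground mask per image (where the mask comes from, e.g. the renderer's alpha channel or a segmentation, is up to you; the helper name is made up):

```python
import numpy as np

def black_background(image, mask):
    """Zero out background pixels of an H x W x C image.

    `mask` is an H x W boolean array where True marks object pixels.
    """
    out = image.copy()
    out[~mask] = 0
    return out

# Toy example: a gray 4x4 image with a 2x2 object in the middle.
img = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
result = black_background(img, mask)
print(result[0, 0], result[1, 1])  # [0 0 0] [200 200 200]
```

Applying the same masking to both domains at least keeps the background statistics identical, so the generator only has to learn the object appearance mapping.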
The latest model I have trained can generate the correct position of the input object but not the correct orientation. Do you have any experience with this? I attach a picture of how it looks. We use roughly 1500 images in each domain during training.
Image of real object:
Rendering of 3D-model (input to CycleGAN):
Output from CycleGAN: