tensorfork / tensorfork


tpu: tpu-v3-256-euw4a-54; run: shawn-bigrun61-chaos256; description: BigGAN 256x256 on many datasets; logdir: gs://darnbooru-euw4a/runs/bigrun61/ #27

Closed shawwn closed 4 years ago

shawwn commented 4 years ago

Run script + branch: https://github.com/shawwn/compare_gan/blob/2020-05-09/dynamicvars/run_bigrun61.sh

dataset.name = "danbooru_256"
options.datasets = "gs://darnbooru-euw4a/datasets/danbooru2019-s/danbooru2019-s-0*,gs://darnbooru-euw4a/datasets/danbooru2019-s/danbooru2019-s-0*,gs://darnbooru-euw4a/datasets/imagenet/train-0*,gs://darnbooru-euw4a/datasets/flickr3m/flickr3m-0*,gs://darnbooru-euw4a/datasets/ffhq1024/ffhq1024-0*,gs://darnbooru-euw4a/datasets/portraits/portraits-0*,gs://darnbooru-euw4a/datasets/ffhq1024/ffhq1024-0*,gs://darnbooru-euw4a/datasets/portraits/portraits-0*"
#options.transpose_input = True # for performance
options.random_labels = True
options.num_classes = 1000
train_imagenet_transform.crop_method = "random"
options.z_dim = 140
resnet_biggan.Generator.ch = 128
resnet_biggan.Discriminator.ch = 128
resnet_biggan.Generator.blocks_with_attention = "128"
resnet_biggan.Discriminator.blocks_with_attention = "128"

options.architecture = "resnet_biggan_arch"
ModularGAN.conditional = True
options.batch_size = 2048
options.gan_class = @ModularGAN
options.lamba = 1
options.training_steps = 250000
weights.initializer = "orthogonal"
spectral_norm.singular_value = "auto"

# Generator
G.batch_norm_fn = @conditional_batch_norm
G.spectral_norm = True
ModularGAN.g_use_ema = True
resnet_biggan.Generator.hierarchical_z = True
resnet_biggan.Generator.embed_y = True
standardize_batch.decay = 0.9
standardize_batch.epsilon = 1e-5
standardize_batch.use_moving_averages = False
standardize_batch.use_cross_replica_mean = None

# Discriminator
options.disc_iters = 2
D.spectral_norm = True
resnet_biggan.Discriminator.project_y = True

# Loss and optimizer
loss.fn = @hinge
penalty.fn = @no_penalty
ModularGAN.g_lr = 0.0000666
ModularGAN.d_lr = 0.0005
ModularGAN.g_lr_mul = 1.0
ModularGAN.d_lr_mul = 1.0
ModularGAN.g_optimizer_fn = @tf.train.AdamOptimizer
ModularGAN.d_optimizer_fn = @tf.train.AdamOptimizer
tf.train.AdamOptimizer.beta1 = 0.0
tf.train.AdamOptimizer.beta2 = 0.999

z.distribution_fn = @tf.random.normal
eval_z.distribution_fn = @tf.random.normal
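For reference, `loss.fn = @hinge` selects the standard hinge GAN loss. A minimal NumPy sketch of that objective (function names are illustrative, not compare_gan's API):

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    # Discriminator hinge loss: push scores on real images above +1
    # and scores on fake images below -1.
    return np.mean(np.maximum(0.0, 1.0 - d_real) +
                   np.maximum(0.0, 1.0 + d_fake))

def g_hinge_loss(d_fake):
    # Generator hinge loss: raise the discriminator's score on fakes.
    return -np.mean(d_fake)
```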

run_config.experimental_host_call_every_n_steps = 50
TpuSummaries.save_image_steps = 50
run_config.iterations_per_loop = 500
run_config.save_checkpoints_steps = 250

options.d_flood = -128.0
options.g_flood = -128.0
options.d_stop_g_above = 128.0
options.g_stop_d_above = 128.0
options.d_stop_d_below = -128.0
options.g_stop_g_below = -128.0

# Try out new options
ModularGAN.experimental_joint_gen_for_disc = True
ModularGAN.experimental_force_graph_unroll = True

options.d_stop_d_below = 0.20
#options.g_stop_g_below = 0.05
#options.d_stop_g_above = 1.00
options.g_stop_d_above = 1.50
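The `d_flood`/`g_flood` options presumably implement loss flooding in the sense of Ishida et al. 2020: the loss is reflected around a floor b, so gradients reverse sign once the raw loss drops below b. With b = -128 the transform is a no-op for any realistic loss value, i.e. flooding is effectively disabled in this run. A sketch under that assumption (`flood_loss` is a hypothetical helper, not compare_gan's API):

```python
def flood_loss(loss, b):
    # Flooding: |L - b| + b. Below the flood level b the gradient flips
    # sign, so optimization hovers near b instead of driving L to zero.
    # With b = -128 this returns L unchanged for any plausible GAN loss.
    return abs(loss - b) + b
```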

shawwn commented 4 years ago

Left: probability the discriminator thinks real images are real. Right: probability the discriminator thinks fake images are real.

[image]

Same info as above, but as losses rather than probabilities.

[image]

Losses for D and G:

[image]

What the losses for D and G would have been without stop loss.

Note that logging only occurs every 50 iterations, so this graph understates the effect; even so, you can clearly see a large effect early in training on G's graph (G stops whenever D's loss rises above 1.5), and a noticeable effect on D's graph (D stops whenever its own loss drops below 0.20).

[image]
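The stop-loss gating described above can be sketched as a simple per-step gate (helper and argument names are hypothetical, not the actual compare_gan implementation; thresholds taken from the gin bindings in the run script):

```python
def apply_updates(d_loss, d_stop_d_below=0.20, g_stop_d_above=1.50):
    # Returns (update_d, update_g): whether each network trains this step.
    update_d = d_loss >= d_stop_d_below  # freeze D when it gets too strong
    update_g = d_loss <= g_stop_d_above  # freeze G while D is struggling
    return update_d, update_g
```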

gwern commented 4 years ago

ModularGAN.g_lr_mul = 0.25
ModularGAN.d_lr_mul = 0.25