skyflynil / stylegan2

StyleGAN2 - Official TensorFlow Implementation with practical improvements
http://arxiv.org/abs/1912.04958

Lazy path regularization for G fails #5

Closed aiXander closed 4 years ago

aiXander commented 4 years ago

Hey, I absolutely love the functionality in this repo! Thanks so much!

However, I'm getting an error when running this on my local RTX 2080 Ti machine. All the training ops run fine except for G_reg_op.

The weird thing is that I just tested the code from the official StyleGAN2 repo, and there everything seems to work perfectly (including G_reg_op).

To summarize, the core of the error seems to originate in the custom CUDA functions for upsampling.

Here's the full traceback:

``
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
####################################################
TFREC dir: datasets/fractal_tf_rec
['fractal_tf_rec-r07.tfrecords']
####################################################
Dataset shape = [3, 1024, 1024]
Dynamic range = [0, 255]
Label size = 0
Constructing networks...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.

G                               Params    OutputShape          WeightShape     
---                             ---       ---                  ---             
latents_in                      -         (?, 512)             -               
labels_in                       -         (?, 0)               -               
lod                             -         ()                   -               
dlatent_avg                     -         (512,)               -               
G_mapping/latents_in            -         (?, 512)             -               
G_mapping/labels_in             -         (?, 0)               -               
G_mapping/Normalize             -         (?, 512)             -               
G_mapping/Dense0                262656    (?, 512)             (512, 512)      
G_mapping/Dense1                262656    (?, 512)             (512, 512)      
G_mapping/Dense2                262656    (?, 512)             (512, 512)      
G_mapping/Dense3                262656    (?, 512)             (512, 512)      
G_mapping/Dense4                262656    (?, 512)             (512, 512)      
G_mapping/Dense5                262656    (?, 512)             (512, 512)      
G_mapping/Dense6                262656    (?, 512)             (512, 512)      
G_mapping/Dense7                262656    (?, 512)             (512, 512)      
G_mapping/Broadcast             -         (?, 16, 512)         -               
G_mapping/dlatents_out          -         (?, 16, 512)         -               
Truncation/Lerp                 -         (?, 16, 512)         -               
G_synthesis/dlatents_in         -         (?, 16, 512)         -               
G_synthesis/8x8/Const           32768     (?, 512, 8, 8)       (1, 512, 8, 8)  
G_synthesis/8x8/Conv            2622465   (?, 512, 8, 8)       (3, 3, 512, 512)
G_synthesis/8x8/ToRGB           264195    (?, 3, 8, 8)         (1, 1, 512, 3)  
G_synthesis/16x16/Conv0_up      2622465   (?, 512, 16, 16)     (3, 3, 512, 512)
G_synthesis/16x16/Conv1         2622465   (?, 512, 16, 16)     (3, 3, 512, 512)
G_synthesis/16x16/Upsample      -         (?, 3, 16, 16)       -               
G_synthesis/16x16/ToRGB         264195    (?, 3, 16, 16)       (1, 1, 512, 3)  
G_synthesis/32x32/Conv0_up      2622465   (?, 512, 32, 32)     (3, 3, 512, 512)
G_synthesis/32x32/Conv1         2622465   (?, 512, 32, 32)     (3, 3, 512, 512)
G_synthesis/32x32/Upsample      -         (?, 3, 32, 32)       -               
G_synthesis/32x32/ToRGB         264195    (?, 3, 32, 32)       (1, 1, 512, 3)  
G_synthesis/64x64/Conv0_up      2622465   (?, 512, 64, 64)     (3, 3, 512, 512)
G_synthesis/64x64/Conv1         2622465   (?, 512, 64, 64)     (3, 3, 512, 512)
G_synthesis/64x64/Upsample      -         (?, 3, 64, 64)       -               
G_synthesis/64x64/ToRGB         264195    (?, 3, 64, 64)       (1, 1, 512, 3)  
G_synthesis/128x128/Conv0_up    1442561   (?, 256, 128, 128)   (3, 3, 512, 256)
G_synthesis/128x128/Conv1       721409    (?, 256, 128, 128)   (3, 3, 256, 256)
G_synthesis/128x128/Upsample    -         (?, 3, 128, 128)     -               
G_synthesis/128x128/ToRGB       132099    (?, 3, 128, 128)     (1, 1, 256, 3)  
G_synthesis/256x256/Conv0_up    426369    (?, 128, 256, 256)   (3, 3, 256, 128)
G_synthesis/256x256/Conv1       213249    (?, 128, 256, 256)   (3, 3, 128, 128)
G_synthesis/256x256/Upsample    -         (?, 3, 256, 256)     -               
G_synthesis/256x256/ToRGB       66051     (?, 3, 256, 256)     (1, 1, 128, 3)  
G_synthesis/512x512/Conv0_up    139457    (?, 64, 512, 512)    (3, 3, 128, 64) 
G_synthesis/512x512/Conv1       69761     (?, 64, 512, 512)    (3, 3, 64, 64)  
G_synthesis/512x512/Upsample    -         (?, 3, 512, 512)     -               
G_synthesis/512x512/ToRGB       33027     (?, 3, 512, 512)     (1, 1, 64, 3)   
G_synthesis/1024x1024/Conv0_up  51297     (?, 32, 1024, 1024)  (3, 3, 64, 32)  
G_synthesis/1024x1024/Conv1     25665     (?, 32, 1024, 1024)  (3, 3, 32, 32)  
G_synthesis/1024x1024/Upsample  -         (?, 3, 1024, 1024)   -               
G_synthesis/1024x1024/ToRGB     16515     (?, 3, 1024, 1024)   (1, 1, 32, 3)   
G_synthesis/images_out          -         (?, 3, 1024, 1024)   -               
G_synthesis/noise0              -         (1, 1, 8, 8)         -               
G_synthesis/noise1              -         (1, 1, 16, 16)       -               
G_synthesis/noise2              -         (1, 1, 16, 16)       -               
G_synthesis/noise3              -         (1, 1, 32, 32)       -               
G_synthesis/noise4              -         (1, 1, 32, 32)       -               
G_synthesis/noise5              -         (1, 1, 64, 64)       -               
G_synthesis/noise6              -         (1, 1, 64, 64)       -               
G_synthesis/noise7              -         (1, 1, 128, 128)     -               
G_synthesis/noise8              -         (1, 1, 128, 128)     -               
G_synthesis/noise9              -         (1, 1, 256, 256)     -               
G_synthesis/noise10             -         (1, 1, 256, 256)     -               
G_synthesis/noise11             -         (1, 1, 512, 512)     -               
G_synthesis/noise12             -         (1, 1, 512, 512)     -               
G_synthesis/noise13             -         (1, 1, 1024, 1024)   -               
G_synthesis/noise14             -         (1, 1, 1024, 1024)   -               
images_out                      -         (?, 3, 1024, 1024)   -               
---                             ---       ---                  ---             
Total                           24885511                                       

D                     Params    OutputShape          WeightShape     
---                   ---       ---                  ---             
images_in             -         (?, 3, 1024, 1024)   -               
labels_in             -         (?, 0)               -               
1024x1024/FromRGB     128       (?, 32, 1024, 1024)  (1, 1, 3, 32)   
1024x1024/Conv0       9248      (?, 32, 1024, 1024)  (3, 3, 32, 32)  
1024x1024/Conv1_down  18496     (?, 64, 512, 512)    (3, 3, 32, 64)  
1024x1024/Skip        2048      (?, 64, 512, 512)    (1, 1, 32, 64)  
512x512/Conv0         36928     (?, 64, 512, 512)    (3, 3, 64, 64)  
512x512/Conv1_down    73856     (?, 128, 256, 256)   (3, 3, 64, 128) 
512x512/Skip          8192      (?, 128, 256, 256)   (1, 1, 64, 128) 
256x256/Conv0         147584    (?, 128, 256, 256)   (3, 3, 128, 128)
256x256/Conv1_down    295168    (?, 256, 128, 128)   (3, 3, 128, 256)
256x256/Skip          32768     (?, 256, 128, 128)   (1, 1, 128, 256)
128x128/Conv0         590080    (?, 256, 128, 128)   (3, 3, 256, 256)
128x128/Conv1_down    1180160   (?, 512, 64, 64)     (3, 3, 256, 512)
128x128/Skip          131072    (?, 512, 64, 64)     (1, 1, 256, 512)
64x64/Conv0           2359808   (?, 512, 64, 64)     (3, 3, 512, 512)
64x64/Conv1_down      2359808   (?, 512, 32, 32)     (3, 3, 512, 512)
64x64/Skip            262144    (?, 512, 32, 32)     (1, 1, 512, 512)
32x32/Conv0           2359808   (?, 512, 32, 32)     (3, 3, 512, 512)
32x32/Conv1_down      2359808   (?, 512, 16, 16)     (3, 3, 512, 512)
32x32/Skip            262144    (?, 512, 16, 16)     (1, 1, 512, 512)
16x16/Conv0           2359808   (?, 512, 16, 16)     (3, 3, 512, 512)
16x16/Conv1_down      2359808   (?, 512, 8, 8)       (3, 3, 512, 512)
16x16/Skip            262144    (?, 512, 8, 8)       (1, 1, 512, 512)
8x8/MinibatchStddev   -         (?, 513, 8, 8)       -               
8x8/Conv              2364416   (?, 512, 8, 8)       (3, 3, 513, 512)
8x8/Dense0            16777728  (?, 512)             (32768, 512)    
Output                513       (?, 1)               (512, 1)        
scores_out            -         (?, 1)               -               
---                   ---       ---                  ---             
Total                 36613665                                       

Building TensorFlow graph...
Initializing logs...
Training for 20000 kimg...

Running with: minibatch_size_in: 64.00 -- minibatch_gpu_in: 1.00

Running with gradient accumulation
Trying G_reg_op...

Traceback (most recent call last):
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: cudaErrorInvalidConfiguration
     [[{{node GPU0/G_loss/PathReg/G/G_synthesis/16x16/Upsample/UpFirDn2D}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_training.py", line 216, in <module>
    main()
  File "run_training.py", line 211, in main
    run(**vars(args))
  File "run_training.py", line 136, in run
    dnnlib.submit_run(**kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/training_loop.py", line 347, in training_loop
    tflib.run(G_reg_op, feed_dict)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/tfutil.py", line 31, in run
    return tf.get_default_session().run(*args, **kwargs)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cudaErrorInvalidConfiguration
     [[node GPU0/G_loss/PathReg/G/G_synthesis/16x16/Upsample/UpFirDn2D (defined at <string>:110) ]]

Errors may have originated from an input operation.
Input Source operations connected to node GPU0/G_loss/PathReg/G/G_synthesis/16x16/Upsample/UpFirDn2D:
 GPU0/G_loss/PathReg/G/G_synthesis/16x16/Upsample/Reshape (defined at /home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/ops/upfirdn_2d.py:358) 
 GPU0/G_loss/PathReg/G/G_synthesis/16x16/Upsample/Const (defined at /home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/ops/upfirdn_2d.py:123)

Original stack trace for 'GPU0/G_loss/PathReg/G/G_synthesis/16x16/Upsample/UpFirDn2D':
  File "run_training.py", line 216, in <module>
    main()
  File "run_training.py", line 211, in main
    run(**vars(args))
  File "run_training.py", line 136, in run
    dnnlib.submit_run(**kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/training_loop.py", line 256, in training_loop
    G_loss, G_reg = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_gpu_in, **G_loss_args)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/util.py", line 256, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/loss.py", line 164, in G_logistic_ns_pathreg
    fake_images_out, fake_dlatents_out = G.get_output_for(pl_latents, pl_labels, is_training=True, return_dlatents=True)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/network.py", line 221, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/networks_stylegan2.py", line 238, in G_main
    images_out = components.synthesis.get_output_for(dlatents, is_training=is_training, force_clean_graph=is_template_graph, **kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/network.py", line 221, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/networks_stylegan2.py", line 544, in G_synthesis_stylegan2
    y = upsample(y)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/training/networks_stylegan2.py", line 499, in upsample
    return upsample_2d(y, k=resample_kernel)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/ops/upfirdn_2d.py", line 198, in upsample_2d
    return _simple_upfirdn_2d(x, k, up=factor, pad0=(p+1)//2+factor-1, pad1=p//2, data_format=data_format, impl=impl)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/ops/upfirdn_2d.py", line 359, in _simple_upfirdn_2d
    y = upfirdn_2d(y, k, upx=up, upy=up, downx=down, downy=down, padx0=pad0, padx1=pad1, pady0=pad0, pady1=pad1, impl=impl)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/ops/upfirdn_2d.py", line 62, in upfirdn_2d
    return impl_dict[impl](x=x, k=k, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/ops/upfirdn_2d.py", line 140, in _upfirdn_2d_cuda
    return func(x)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 162, in decorated
    return _graph_mode_decorator(f, *args, **kwargs)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/ops/custom_gradient.py", line 183, in _graph_mode_decorator
    result, grad_fn = f(*args)
  File "/home/rednax/Desktop/music_vr/StyleGAN_training/stylegan2/dnnlib/tflib/ops/upfirdn_2d.py", line 132, in func
    y = _get_plugin().up_fir_dn2d(x=x, k=kc, upx=upx, upy=upy, downx=downx, downy=downy, padx0=padx0, padx1=padx1, pady0=pady0, pady1=pady1)
  File "<string>", line 110, in up_fir_dn2d
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/rednax/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()
``
aiXander commented 4 years ago

OK, it turns out this error is actually caused by setting sched.minibatch_gpu_base = 1 in run_training.py. At first I was getting OOM errors, so I lowered minibatch_gpu_base, but if you set it to 1 you suddenly get this weird and uninformative error.
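
A possible explanation, offered as an assumption rather than something confirmed in this thread: the failing node sits under G_loss/PathReg, and the traceback shows it being built from pl_latents in G_logistic_ns_pathreg. If that loss shrinks the per-GPU minibatch for the regularizer by integer division (here assumed to be a pl_minibatch_shrink of 2), a per-GPU minibatch of 1 becomes a batch of 0. A CUDA kernel launched with a zero-sized grid is one common way to end up with cudaErrorInvalidConfiguration. The sketch below only illustrates the arithmetic; both function names are hypothetical and are not part of the repo.

```python
# Sketch only: the function names are hypothetical, and the default
# pl_minibatch_shrink=2 is an assumption about the loss configuration.

def pathreg_minibatch(minibatch_gpu, pl_minibatch_shrink=2):
    """Per-GPU batch size for the path length regularizer (integer division)."""
    return minibatch_gpu // pl_minibatch_shrink


def clamped_pathreg_minibatch(minibatch_gpu, pl_minibatch_shrink=2):
    """Same computation, but never smaller than one sample."""
    return max(1, minibatch_gpu // pl_minibatch_shrink)


if __name__ == "__main__":
    # Compare the plain and the clamped result for a few per-GPU batch sizes.
    for minibatch_gpu in (4, 2, 1):
        print(minibatch_gpu,
              pathreg_minibatch(minibatch_gpu),
              clamped_pathreg_minibatch(minibatch_gpu))
```

If this is indeed the cause, keeping sched.minibatch_gpu_base at 2 or higher, or clamping the shrunken batch to at least 1, should avoid the empty batch. Either change is untested here.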