skyflynil / stylegan2

StyleGAN2 - Official TensorFlow Implementation with practical improvements
http://arxiv.org/abs/1912.04958

Memory Error after 2nd tick #15

Open rmbwalsh opened 4 years ago

rmbwalsh commented 4 years ago

Hi, congrats on this code! I'm training a model on a 768x1280 dataset with this command:

nohup python run_training.py --num-gpus=1 --data-dir=./dataset --config=config-f --dataset=stainedglass1 --mirror-augment=true --metric=none --total-kimg=20000 --min-h=5 --min-w=3 --res-log2=8
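With this fork's non-square flags, the output size appears to be the base grid from `--min-h`/`--min-w` scaled by `2**res_log2`. That is an assumption, but it matches the (3, 1280, 768) image shape in the error further down. A minimal sketch of the arithmetic (`output_resolution` is a hypothetical helper, not part of the repo):

```python
from typing import Tuple


def output_resolution(min_h: int, min_w: int, res_log2: int) -> Tuple[int, int]:
    """Return (height, width), assuming each base dimension is scaled by 2**res_log2."""
    scale = 2 ** res_log2
    return min_h * scale, min_w * scale


# Flags from the command above: --min-h=5 --min-w=3 --res-log2=8
height, width = output_resolution(min_h=5, min_w=3, res_log2=8)
print(height, width)  # 1280 768
```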

When running this fork, I get this error after the second tick:

Traceback (most recent call last):
  File "run_training.py", line 218, in <module>
    main()
  File "run_training.py", line 213, in main
    run(**vars(args))
  File "run_training.py", line 136, in run
    dnnlib.submit_run(**kwargs)
  File "/home/rmbwalsh/stylegan-skyflynil/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/home/rmbwalsh/stylegan-skyflynil/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/home/rmbwalsh/stylegan-skyflynil/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/home/rmbwalsh/stylegan-skyflynil/stylegan2/training/training_loop.py", line 349, in training_loop
    grid_fakes = Gs.run(grid_latents, grid_labels, is_validation=True, minibatch_size=sched.minibatch_gpu)
  File "/home/rmbwalsh/stylegan-skyflynil/stylegan2/dnnlib/tflib/network.py", line 433, in run
    out_arrays = [np.empty([num_items] + expr.shape.as_list()[1:], expr.dtype.name) for expr in out_expr]
  File "/home/rmbwalsh/stylegan-skyflynil/stylegan2/dnnlib/tflib/network.py", line 433, in <listcomp>
    out_arrays = [np.empty([num_items] + expr.shape.as_list()[1:], expr.dtype.name) for expr in out_expr]
MemoryError

I also got this error on another attempt:

MemoryError: Unable to allocate 450. MiB for an array with shape (40, 3, 1280, 768) and data type float32
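The 450 MiB figure follows directly from the reported shape: 40 x 3 x 1280 x 768 float32 values at 4 bytes each. Judging from the traceback, `network.py` pre-allocates a host-side NumPy array (`np.empty([num_items] + ...)`) for every output of `Gs.run` before any minibatch runs, so this buffer lives in CPU RAM regardless of `minibatch_size`. A sketch of the arithmetic using only the standard library (`array_mib` is a hypothetical helper):

```python
import math

FLOAT32_BYTES = 4  # size of one float32 element in bytes


def array_mib(shape, itemsize=FLOAT32_BYTES):
    """Memory for a dense array of `shape`, in MiB (2**20 bytes)."""
    n_bytes = math.prod(shape) * itemsize
    return n_bytes / 2**20


# Shape from the second MemoryError, laid out as (N, C, H, W)
print(array_mib((40, 3, 1280, 768)))  # 450.0
```

Because the size scales linearly with the first dimension, any larger grid of generated images needs proportionally more host RAM.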

Has anyone run into something similar?

Oranging1 commented 4 years ago

I've run into the same problem. Have you solved it?

rmbwalsh commented 4 years ago

Yes. I was running this model on a Google Cloud GPU instance, and it needs the CPU RAM to be quite a bit larger than the GPU memory. My GPU had 16 GB of memory, so I changed my virtual machine's configuration to have 30 GB of CPU RAM. After that, the model ran fine.
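The fix above comes down to giving the VM enough host RAM for the arrays NumPy allocates outside the GPU. A rough pre-flight check along those lines is sketched below. `host_ram_ok` and its 1.5x default ratio are hypothetical choices for illustration, not something the repo or this thread specifies, and reading total RAM through `os.sysconf` works on Linux (such as a Google Cloud VM) but is not available on every OS.

```python
import os
from typing import Optional

GIB = 2**30


def total_host_ram_bytes() -> Optional[int]:
    """Total physical RAM via POSIX sysconf; returns None where that is unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def host_ram_ok(host_ram_bytes: int, gpu_mem_bytes: int, ratio: float = 1.5) -> bool:
    """Hypothetical rule of thumb: host RAM should be at least `ratio` times GPU memory."""
    return host_ram_bytes >= ratio * gpu_mem_bytes


# The configuration that worked in this thread: 16 GB GPU, 30 GB of VM RAM
print(host_ram_ok(30 * GIB, 16 * GIB))  # True

ram = total_host_ram_bytes()
if ram is not None and not host_ram_ok(ram, 16 * GIB):
    print("Warning: host RAM may be too small for this GPU; consider a larger VM.")
```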