harvitronix / neural-network-genetic-algorithm

Evolving a neural network with a genetic algorithm.
https://medium.com/@harvitronix/lets-evolve-a-neural-network-with-a-genetic-algorithm-code-included-8809bece164
MIT License

ResourceExhaustedError when running main.py #4

Closed: ucsky closed this issue 6 years ago

ucsky commented 6 years ago

I got the following error when trying to run main.py on a TITAN X (Pascal).

2017-12-06 18:55:58.886236: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[3072,768]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3072,768]
     [[Node: mul_168 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](mul_162/x, Variable_75/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 113, in <module>
    main()
  File "main.py", line 110, in main
    generate(generations, population, nn_param_choices, dataset)
  File "main.py", line 62, in generate
    train_networks(networks, dataset)
  File "main.py", line 23, in train_networks
    network.train(dataset)
  File "/home/bob/test/neural-network-genetic-algorithm/network.py", line 48, in train
    self.accuracy = train_and_score(self.network, dataset)
  File "/home/bob/test/neural-network-genetic-algorithm/train.py", line 120, in train_and_score
    callbacks=[early_stopper])
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 870, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1507, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1156, in _fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2269, in __call__
    **self.session_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3072,768]
     [[Node: mul_168 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](mul_162/x, Variable_75/read)]]

Caused by op 'mul_168', defined at:
  File "main.py", line 113, in <module>
    main()
  File "main.py", line 110, in main
    generate(generations, population, nn_param_choices, dataset)
  File "main.py", line 62, in generate
    train_networks(networks, dataset)
  File "main.py", line 23, in train_networks
    network.train(dataset)
  File "/home/bob/test/neural-network-genetic-algorithm/network.py", line 48, in train
    self.accuracy = train_and_score(self.network, dataset)
  File "/home/bob/test/neural-network-genetic-algorithm/train.py", line 120, in train_and_score
    callbacks=[early_stopper])
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 870, in fit
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1490, in fit
    self._make_train_function()
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1014, in _make_train_function
    self.total_loss)
  File "/usr/local/lib/python3.5/dist-packages/keras/optimizers.py", line 364, in get_updates
    new_d_a = self.rho * d_a + (1 - self.rho) * K.square(update)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 754, in _run_op
    return getattr(ops.Tensor, operator)(a._AsTensor(), *args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 910, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 1117, in _mul_dispatch
    return gen_math_ops._mul(x, y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 2726, in _mul
    "Mul", x=x, y=y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[3072,768]
     [[Node: mul_168 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](mul_162/x, Variable_75/read)]]

Any idea on how to fix it?

jsalatas commented 6 years ago

Seems like an out-of-memory error on the GPU.
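
If you want to confirm it really is GPU memory (rather than system RAM), you can stop TensorFlow from reserving the whole card up front. This is only a diagnostic sketch for the TF 1.x / standalone-Keras stack shown in the traceback, not a fix for a network that is genuinely too large for the card:

```python
# Put this near the top of train.py (or main.py), before any model is built.
# With allow_growth, TensorFlow allocates GPU memory as it is needed instead
# of grabbing it all at once, so OOM failures reflect the model's real footprint.
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))
```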

tuchao1996 commented 6 years ago

Did you solve this problem? I ran into it as well. Could you tell me how you fixed it? Thanks.

harvitronix commented 6 years ago

Yeah, this is caused by running out of GPU memory. Update the parameters in your search space to make the largest network smaller. In main.py, change the max `nb_neurons` to 512, or the max `nb_layers` to 2 or 3.
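
For example, the search space in main.py could be trimmed along these lines (the value lists below are illustrative; adjust whatever your copy of main.py actually defines):

```python
# main.py -- a smaller search space so the largest candidate network
# still fits in GPU memory. The exact lists here are only an example.
nn_param_choices = {
    'nb_neurons': [64, 128, 256, 512],   # cap layer width at 512
    'nb_layers': [1, 2, 3],              # cap depth at 3
    'activation': ['relu', 'elu', 'tanh', 'sigmoid'],
    'optimizer': ['rmsprop', 'adam', 'sgd', 'adagrad',
                  'adadelta', 'adamax', 'nadam'],
}
```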