tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Got error when trying to implement multi_gpu from Keras #531

Open Saladino93 opened 5 years ago

Saladino93 commented 5 years ago

Hi all,

I have the following network:

```python
import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, Bidirectional, TimeDistributed
from tensorflow.keras.utils import to_categorical, multi_gpu_model

import tensorflow_probability as tfp

.....

neural_net = tf.keras.Sequential([
    LSTM(num_units, activation=tf.nn.tanh, input_shape=input_shape, return_sequences=True),
    Dropout(0.2),
    Bidirectional(LSTM(num_units, activation=tf.nn.tanh)),
    Dropout(0.2),
    tfp.layers.DenseFlipout(84, activation=tf.nn.relu),
    tfp.layers.DenseFlipout(n_classes_binary, activation=tf.nn.softmax)
])
```

and I want to implement multi-GPU support.

If I follow the TF recommendation to place the model's weights on the CPU, with

`multi_neural_net = tf.keras.utils.multi_gpu_model(neural_net, gpus=gpu_n, cpu_relocation=True)`

I get

```
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
in
----> 1 neural_net = tf.keras.utils.multi_gpu_model(neural_net, gpus=2, cpu_relocation=True)

~/.local/lib/python3.5/site-packages/tensorflow/python/keras/utils/multi_gpu_utils.py in multi_gpu_model(model, gpus, cpu_merge, cpu_relocation)
    210   from tensorflow.python.keras.models import clone_model  # pylint: disable=g-import-not-at-top
    211   with ops.device('/cpu:0'):
--> 212     model = clone_model(model)
    213
    214   all_outputs = []

~/.local/lib/python3.5/site-packages/tensorflow/python/keras/models.py in clone_model(model, input_tensors)
    267   """
    268   if isinstance(model, Sequential):
--> 269     return _clone_sequential_model(model, input_tensors=input_tensors)
    270   else:
    271     return _clone_functional_model(model, input_tensors=input_tensors)

~/.local/lib/python3.5/site-packages/tensorflow/python/keras/models.py in _clone_sequential_model(model, input_tensors)
    212   if input_tensors is None:
    213     layers = [clone(layer) for layer in model._layers]
--> 214     return Sequential(layers=layers, name=model.name)
    215   else:
    216     # If input tensors are provided, the original model's InputLayer is

~/.local/lib/python3.5/site-packages/tensorflow/python/training/checkpointable/base.py in _method_wrapper(self, *args, **kwargs)
    440     self._setattr_tracking = False  # pylint: disable=protected-access
    441     try:
--> 442       method(self, *args, **kwargs)
    443     finally:
    444       self._setattr_tracking = previous_value  # pylint: disable=protected-access

~/.local/lib/python3.5/site-packages/tensorflow/python/keras/engine/sequential.py in __init__(self, layers, name)
    107     if layers:
    108       for layer in layers:
--> 109         self.add(layer)
    110
    111   @property

~/.local/lib/python3.5/site-packages/tensorflow/python/training/checkpointable/base.py in _method_wrapper(self, *args, **kwargs)
    440     self._setattr_tracking = False  # pylint: disable=protected-access
    441     try:
--> 442       method(self, *args, **kwargs)
    443     finally:
    444       self._setattr_tracking = previous_value  # pylint: disable=protected-access

~/.local/lib/python3.5/site-packages/tensorflow/python/keras/engine/sequential.py in add(self, layer)
    178       # If the model is being built continuously on top of an input layer:
    179       # refresh its output.
--> 180       output_tensor = layer(self.outputs[0])
    181       if isinstance(output_tensor, list):
    182         raise TypeError('All layers in a Sequential model '

~/.local/lib/python3.5/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    536     if not self.built:
    537       # Build layer if applicable (if the `build` method has been overridden).
--> 538       self._maybe_build(inputs)
    539       # We must set self.built since user defined build functions are not
    540       # constrained to set self.built.

~/.local/lib/python3.5/site-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
   1601     # Only call `build` if the user has manually overridden the build method.
   1602     if not hasattr(self.build, '_is_default'):
-> 1603       self.build(input_shapes)
   1604
   1605   def __setattr__(self, name, value):

~/.local/lib/python3.5/site-packages/tensorflow_probability/python/layers/dense_variational.py in build(self, input_shape)
    141     self.kernel_posterior = self.kernel_posterior_fn(
    142         dtype, [in_size, self.units], 'kernel_posterior',
--> 143         self.trainable, self.add_variable)
    144
    145     if self.kernel_prior_fn is None:

~/.local/lib/python3.5/site-packages/tensorflow/python/keras/utils/generic_utils.py in _fn(dtype, shape, name, trainable, add_variable_fn)
    188       dist = tfd.Deterministic(loc=loc)
    189     else:
--> 190       dist = tfd.Normal(loc=loc, scale=scale)
    191     batch_ndims = tf.size(dist.batch_shape_tensor())
    192     return tfd.Independent(dist, reinterpreted_batch_ndims=batch_ndims)

NameError: name 'tfd' is not defined
```

But if I use `multi_neural_net = tf.keras.utils.multi_gpu_model(neural_net, gpus=gpu_n, cpu_relocation=False)` then there are no errors (although I do not know whether multi-GPU is actually working properly). Is this a problem with TFP, or is it related to my code? For the training part I am following the [bayesian_neural_network](https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/bayesian_neural_network.py) example.
brianwa84 commented 4 years ago

Sorry for the lack of a reply. tf.distribute.Strategy also offers a solution for multiple GPUs, which might be worth a shot.
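
As a rough sketch of what that could look like for this model (assuming a TF 2.x-style `tf.distribute.MirroredStrategy`; the sizes below are placeholders, not the values from the original code):

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Placeholder hyperparameters, standing in for whatever the original model uses.
num_units = 64
input_shape = (20, 10)   # (timesteps, features)
n_classes_binary = 2

# Build (and compile) the model inside a MirroredStrategy scope so that its
# variables are mirrored across all visible GPUs.
strategy = tf.distribute.MirroredStrategy()
print('Replicas in sync:', strategy.num_replicas_in_sync)

with strategy.scope():
    neural_net = tf.keras.Sequential([
        tf.keras.layers.LSTM(num_units, activation=tf.nn.tanh,
                             input_shape=input_shape, return_sequences=True),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(num_units, activation=tf.nn.tanh)),
        tf.keras.layers.Dropout(0.2),
        tfp.layers.DenseFlipout(84, activation=tf.nn.relu),
        tfp.layers.DenseFlipout(n_classes_binary, activation=tf.nn.softmax),
    ])
    neural_net.compile(optimizer='adam', loss='categorical_crossentropy')

# neural_net.fit(...) then splits each batch across the replicas automatically.
```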

There is a setting under tf.config that prints out device assignments as ops get scheduled. That could be useful for proving to yourself that it's using all the GPUs.
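
If the setting meant here is the device-placement logging flag, it would look roughly like this (in recent TF versions it lives under `tf.debugging` rather than `tf.config`, so the exact location is an assumption on my part):

```python
import tensorflow as tf

# TF 2.x: log each op's device assignment as it gets scheduled.
tf.debugging.set_log_device_placement(True)

# TF 1.x equivalent: request the same logging through the session config.
# sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
```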