keras-team / keras

Deep Learning for humans
http://keras.io/

Understanding Multi-GPU Training in Keras #3258

Closed pengpaiSH closed 7 years ago

pengpaiSH commented 8 years ago

@fchollet has provided an excellent blog post about using Keras as a simplified interface for TensorFlow. At the end of that post, it introduces a way to use multiple GPUs to train a model.

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras import backend as K

sess = tf.Session()
K.set_session(sess)  # register this session with Keras

with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 784))

    # shared model living on CPU:0
    # it won't actually be run during training; it acts as an op template
    # and as a repository for shared variables
    model = Sequential()
    model.add(Dense(32, activation='relu', input_dim=784))
    model.add(Dense(10, activation='softmax'))

# replica 0
with tf.device('/gpu:0'):
    output_0 = model(x)  # all ops in the replica will live on GPU:0

# replica 1
with tf.device('/gpu:1'):
    output_1 = model(x)  # all ops in the replica will live on GPU:1

# merge outputs on CPU
with tf.device('/cpu:0'):
    preds = 0.5 * (output_0 + output_1)

# we only run the `preds` tensor, so that only the two
# replicas on GPU get run (plus the merge op on CPU)
output_value = sess.run([preds], feed_dict={x: data})

What confuses me is why we have to average the model outputs on cpu:0. My understanding of using multiple GPUs is that Keras could automatically train the model by computing gradients and updating weights more quickly than with only one replica; in other words, convergence should be faster. Please correct me if I am wrong. And if Keras can handle such automatic multi-GPU training, what is the simplest way to implement it (perhaps in just a few lines of code)?

tboquet commented 8 years ago

This part of the blog post is about model parallelism; data parallelism is discussed in other issues:

#3174, #3240

You could find a concise description here.
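
To make the distinction concrete, a data-parallel version of the same toy graph would give each GPU a different slice of the batch and concatenate the results, instead of feeding the full batch to both replicas and averaging. A minimal sketch (my own illustration, not from the linked description; it assumes the TensorFlow backend and TF 1.x argument order for tf.split/tf.concat):

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 784))

    model = Sequential()
    model.add(Dense(32, activation='relu', input_dim=784))
    model.add(Dense(10, activation='softmax'))

    # each GPU will see half of every batch
    x_0, x_1 = tf.split(x, 2, axis=0)

with tf.device('/gpu:0'):
    output_0 = model(x_0)

with tf.device('/gpu:1'):
    output_1 = model(x_1)

with tf.device('/cpu:0'):
    preds = tf.concat([output_0, output_1], axis=0)  # reassemble the batch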

pengpaiSH commented 8 years ago

@tboquet Thank you for the materials, which made the two parallelism approaches in ConvNets clear to me. So, currently, data parallelism in Keras (I mean via the simplified API) is still a work in progress, right?

tboquet commented 8 years ago

Right! You can use model parallelism with TensorFlow, but there is no unified Keras API to do this. For data parallelism, you could take a look at https://github.com/mila-udem/platoon if you want inspiration to develop your own solution.
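
If you do want to roll your own data-parallel training on top of the TensorFlow backend, the usual recipe (the one TensorFlow's CIFAR-10 multi-GPU tutorial follows) is one "tower" per GPU plus gradient averaging on the CPU. A rough sketch, assuming TF 1.x argument order, Keras 1-style imports as in the blog post, and a batch size divisible by the number of GPUs:

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras.objectives import categorical_crossentropy  # keras.losses in Keras 2

n_gpus = 2

with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 784))
    y = tf.placeholder(tf.float32, shape=(None, 10))

    # shared weights live on the CPU
    model = Sequential()
    model.add(Dense(32, activation='relu', input_dim=784))
    model.add(Dense(10, activation='softmax'))

    opt = tf.train.GradientDescentOptimizer(0.01)

    # each tower gets its own contiguous slice of the batch
    x_parts = tf.split(x, n_gpus, axis=0)
    y_parts = tf.split(y, n_gpus, axis=0)

tower_grads = []
for g in range(n_gpus):
    with tf.device('/gpu:%d' % g):
        preds_g = model(x_parts[g])
        loss_g = tf.reduce_mean(categorical_crossentropy(y_parts[g], preds_g))
        tower_grads.append(opt.compute_gradients(loss_g))

with tf.device('/cpu:0'):
    # average each variable's gradient over the towers, then apply one update
    averaged = []
    for pairs in zip(*tower_grads):  # one (grad, var) pair per tower
        grads = [grad for grad, _ in pairs]
        averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), pairs[0][1]))
    train_op = opt.apply_gradients(averaged)

Each sess.run(train_op, feed_dict={x: batch_x, y: batch_y}) then runs both towers in parallel and applies a single averaged update to the shared weights.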

rollingstone commented 8 years ago

@fchollet Thanks for the multi-GPU example in TensorFlow.

I have a PC with two Titan X GPUs and tried the following code:

with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, shape=(None, 784))

    model = Sequential()
    model.add(Dense(32, activation='relu', input_dim=784))
    model.add(Dense(10, activation='softmax'))

with tf.device('/gpu:0'):
    output_0 = model(x)  # all ops in the replica will live on GPU:0

with tf.device('/gpu:1'):
    output_1 = model(x)  # all ops in the replica will live on GPU:1

with tf.device('/cpu:0'):
    preds = 0.5 * (output_0 + output_1)

output_value = sess.run([preds], feed_dict={x: data})

However, if I print

print output_0
print output_1

the output gives

Tensor("Softmax_28:0", shape=(?, 10), dtype=float32, device=/device:GPU:0) Tensor("Softmax_28:0", shape=(?, 10), dtype=float32, device=/device:GPU:0)

It seems that only the first device scope is active and only one GPU is used.

Obviously, I am missing something. Any help would be appreciated.
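
One thing that may help to diagnose this is to let TensorFlow report where each op actually runs, rather than relying on the device attribute printed on the output tensor. A sketch, assuming the TF backend:

import tensorflow as tf
from keras import backend as K

# log_device_placement prints the device every op is assigned to when the
# graph runs; allow_soft_placement falls back to another device instead of
# raising an error when a requested placement cannot be honoured.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
sess = tf.Session(config=config)
K.set_session(sess)

# build the two replicas as in the snippet above, then run them; the console
# log shows whether the second replica's ops really landed on GPU:1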

tetmin commented 8 years ago

It's not clear how to actually run the above examples on multiple GPUs. Do we call preds.fit in Keras instead of the usual model.fit?
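
In the blog post's approach, preds is a plain TensorFlow tensor rather than a Keras model, so there is no preds.fit: training is done by adding a loss and an optimizer to the graph and running the train op in the session. A rough sketch (x, preds and sess come from the snippet above; labels and training_batches are hypothetical placeholders for your targets and batch generator):

import tensorflow as tf
from keras.objectives import categorical_crossentropy  # keras.losses in Keras 2

labels = tf.placeholder(tf.float32, shape=(None, 10))
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# plain TensorFlow training loop; no model.fit involved
for batch_x, batch_y in training_batches:
    sess.run(train_step, feed_dict={x: batch_x, labels: batch_y})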

Nqabz commented 8 years ago

@rollingstone: I am seeing the same behaviour with that example. My devices appear to be executing sequentially.

Any pointers on why this is happening?

pengpaiSH commented 8 years ago

@fchollet and other Keras fans: does Keras support data parallelism (with TensorFlow as the backend) right now? I have one machine with 4 GPUs and would like to use data parallelism to make convergence faster, i.e. so that batch_size can be set larger.
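
In the meantime, one workaround that keeps everything at the Keras level is to wrap an existing single-GPU model so that each GPU processes a slice of every batch and the outputs are concatenated back on the CPU; the wrapped model can then be compiled and trained with the usual fit. A rough sketch (make_parallel and slice_batch are illustrative helpers, not Keras API; it assumes a Keras 2-style functional API, a single-input/single-output model, and a batch size divisible by the number of GPUs):

import tensorflow as tf
from keras.layers import Input, Lambda, concatenate
from keras.models import Model

def slice_batch(x, n_gpus, part):
    """Return the `part`-th of `n_gpus` contiguous slices of the batch."""
    size = tf.shape(x)[0] // n_gpus
    return x[part * size:(part + 1) * size]

def make_parallel(model, n_gpus):
    """Run `model` on `n_gpus` GPUs, each seeing 1/n_gpus of every batch."""
    x = Input(shape=model.input_shape[1:])
    towers = []
    for g in range(n_gpus):
        with tf.device('/gpu:%d' % g):
            piece = Lambda(slice_batch,
                           arguments={'n_gpus': n_gpus, 'part': g})(x)
            towers.append(model(piece))
    with tf.device('/cpu:0'):
        merged = concatenate(towers, axis=0)  # reassemble the full batch
    return Model(inputs=x, outputs=merged)

The wrapper compiles and fits like any other model (e.g. make_parallel(model, 4).compile(...) then .fit(..., batch_size=512)); building the original model under with tf.device('/cpu:0') keeps the shared weights on the CPU, as in the blog snippet.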

jonilaserson commented 8 years ago

ping

kaigang commented 7 years ago

@rollingstone I'm experiencing the same problem. Any update?

pengpaiSH commented 7 years ago

@kaigang I am also waiting for an update!

mongoose54 commented 7 years ago

Any updates?

pengpaiSH commented 7 years ago

@mongoose54 No further updates. @fchollet has already confirmed that fit_distributed won't appear in the next version of Keras. However, the good news is that TensorFlow officially supports Keras starting with version 1.2. This video shows how to use tf.keras to train a VQA model, and it claims that distributed training is no longer a concern.
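
For anyone reading this later: recent tf.keras releases (TensorFlow 2.x, well after this thread) expose data parallelism directly through tf.distribute.MirroredStrategy, so none of the manual device placement above is needed. A minimal sketch, where x_train and y_train stand for your own data:

import tensorflow as tf

# MirroredStrategy replicates the model on all visible GPUs and averages
# gradients across them automatically
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')

# fit() splits each global batch across the replicas
model.fit(x_train, y_train, batch_size=256, epochs=5)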

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.