keras-team / keras

Deep Learning for humans
http://keras.io/

Slow BatchNormalization layers (CPU tested) #1309

Closed Zebreu closed 7 years ago

Zebreu commented 8 years ago

Hi,

I added BatchNormalization layers to my model and it suddenly took much more time to train.

It takes 561 seconds for one epoch with them: Epoch 1/1 4096/4096 [==============================] - 561s - loss: 0.0946 - acc: 0.9006

This is if I comment out the BatchNorm layers (186 seconds): Epoch 1/1 4096/4096 [==============================] - 186s - loss: 4.5043 - acc: 0.5933

I posted on the user group and someone told me that adding Dropout slows it down further, but even without Dropout it is much slower (above 500 seconds as well).

Is it a CPU-related issue? Or is such a slowdown expected? Below is my model.

    import keras
    from keras.models import Sequential
    from keras.layers.convolutional import Convolution2D, MaxPooling2D
    from keras.layers.core import Dense, Dropout, Flatten
    from keras.layers.advanced_activations import PReLU
    from keras.layers.normalization import BatchNormalization

    model = Sequential()
    # Convolutional blocks, each followed by PReLU and BatchNormalization
    model.add(Convolution2D(32, 11, 11, subsample=(4,4), input_shape=(3,227,227)))
    model.add(PReLU())
    model.add(BatchNormalization())

    model.add(Convolution2D(64, 5, 5, subsample=(2,2)))
    model.add(PReLU())
    model.add(BatchNormalization())

    model.add(Convolution2D(64, 3, 3))
    model.add(PReLU())
    #model.add(BatchNormalization())
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    # Fully connected classifier, again with BatchNormalization after each PReLU
    model.add(Flatten())
    model.add(BatchNormalization())
    model.add(Dense(400))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Dropout(0.25))

    model.add(Dense(400))
    model.add(PReLU())
    model.add(BatchNormalization())
    model.add(Dropout(0.25))

    model.add(Dense(3, activation='softmax'))
    optimizer = keras.optimizers.RMSprop(lr=0.0005, rho=0.9, epsilon=1e-6)

    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
keunwoochoi commented 8 years ago

Not sure if it's relevant but should BN be applied before activations?
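
For reference, a minimal sketch (not from the thread; it reproduces only the first conv block of the model above) of what placing BatchNormalization before the activation would look like:

    # Sketch: BatchNormalization placed before the activation, as suggested above.
    # Only the first conv block of the original model is shown; the old-style
    # Keras API from this thread (Convolution2D, subsample) is assumed.
    from keras.models import Sequential
    from keras.layers.convolutional import Convolution2D
    from keras.layers.advanced_activations import PReLU
    from keras.layers.normalization import BatchNormalization

    model = Sequential()
    model.add(Convolution2D(32, 11, 11, subsample=(4,4), input_shape=(3,227,227)))
    model.add(BatchNormalization(axis=1))  # normalize the conv output first...
    model.add(PReLU())                     # ...then apply the activation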

kmul00 commented 8 years ago

Hi,

Any update on this ?

The BatchNorm in Keras is still tremendously slow, and not just on CPU but on GPU as well. Is there any way to alleviate the problem?

And the overhead increases linearly with batch size, so even if my GPU is under-utilized, I cannot increase my batch size, since that would make it even slower.
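
A minimal timing sketch (hypothetical, not from this thread) for checking how the per-batch cost grows with batch size, with and without a BatchNormalization layer:

    # Hypothetical benchmark: time train_on_batch with and without
    # BatchNormalization for several batch sizes, on random data.
    import time
    import numpy as np
    from keras.models import Sequential
    from keras.layers.core import Dense
    from keras.layers.normalization import BatchNormalization

    def build_model(use_bn):
        model = Sequential()
        model.add(Dense(400, input_dim=784, activation='relu'))
        if use_bn:
            model.add(BatchNormalization())
        model.add(Dense(10, activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
        return model

    for batch_size in (32, 128, 512):
        x = np.random.rand(batch_size, 784).astype('float32')
        y = np.eye(10, dtype='float32')[np.random.randint(0, 10, batch_size)]
        for use_bn in (False, True):
            model = build_model(use_bn)
            model.train_on_batch(x, y)  # warm-up (includes compilation)
            start = time.time()
            for _ in range(20):
                model.train_on_batch(x, y)
            print('batch_size=%d, bn=%s: %.4f s/batch'
                  % (batch_size, use_bn, (time.time() - start) / 20))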

iaroslav-ai commented 8 years ago

+1, I observe the same thing.

nouiz commented 8 years ago

In Theano, we just merged a wrapper for the cuDNN batch norm that must be used manually:

http://deeplearning.net/software/theano_versions/dev/library/sandbox/cuda/dnn.html#batch-normalization

It speeds up both compilation and execution time.

So if you or someone else modifies Keras to use it, that would be great.
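
For anyone wanting to try the wrapper directly, a rough usage sketch (the module path, argument names, and return values are assumptions based on the linked docs and should be verified against your Theano version):

    # Hypothetical sketch of calling Theano's cuDNN batch-norm wrapper manually.
    # Function name/signature assumed from the linked docs; verify before use.
    import numpy as np
    import theano
    import theano.tensor as T
    from theano.sandbox.cuda import dnn

    x = T.tensor4('x')  # (batch, channels, height, width)
    gamma = theano.shared(np.ones((1, 64, 1, 1), dtype='float32'))
    beta = theano.shared(np.zeros((1, 64, 1, 1), dtype='float32'))

    # 'spatial' mode normalizes over batch, height and width for each channel.
    out, mean, invstd = dnn.dnn_batch_normalization_train(
        x, gamma, beta, mode='spatial', epsilon=1e-4)

    f = theano.function([x], out)
    print(f(np.random.rand(8, 64, 32, 32).astype('float32')).shape)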


EderSantana commented 8 years ago

Nice @nouiz !!!

@fchollet @farizrahman4u TensorFlow also has batch norm by default. Should we benchmark this and create a batch_norm operator in the backend? Are both implementations compatible?

farizrahman4u commented 8 years ago

If it's faster, then we definitely should.

ozankabak commented 8 years ago

I just observed an almost 10x slowdown when using batch normalization on GPUs. I am using the Theano backend, CUDA 8 RC and cuDNN. I am training a VGG-style CNN on CIFAR-10 that is around 20 layers deep.

mdering commented 8 years ago

Has anyone attempted this? I would like to use it in my implementation (though it looks like the Theano cuDNN BN doesn't support 5-dimensional data yet).

dk1013 commented 8 years ago

I have encountered the same problem with the Theano backend + GPU (980 Ti). At least two times slower than the same model without BatchNormalization.

nouiz commented 8 years ago

Batch norm has a computation cost. Be sure to use cuDNN 5.1, in particular on Pascal GPUs.

I don't know whether Keras has finished wrapping the Theano cuDNN batch norm or not. Make sure that it uses it.
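
A quick way to check (a sketch; the module path is for the old CUDA backend and may differ across Theano versions) whether Theano actually sees cuDNN and which version it picked up:

    # Sketch: confirm the Theano CUDA backend detects cuDNN and report its version.
    # Module path (theano.sandbox.cuda.dnn) is the old backend; adjust if needed.
    import theano
    from theano.sandbox.cuda import dnn

    print(theano.config.device)   # should report the GPU, e.g. 'gpu0'
    print(dnn.dnn_available())    # True if cuDNN can be used
    print(dnn.version())          # e.g. 5103 for cuDNN 5.1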


farizrahman4u commented 8 years ago

Question: Is batchnorm actually worth all this trouble?

EderSantana commented 8 years ago

@farizrahman4u I really wish it was not... but at least for GANs, it is mandatory.

fchollet commented 8 years ago

Yes, Keras wraps the cuDNN batch norm when it is available on the Theano side.

You could replace BN with gradient normalization or activation normalization.
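
One possible reading of "gradient normalization" in Keras (an assumption about what is meant, not a confirmed recommendation) is to clip gradients to a maximum L2 norm via the optimizer's clipnorm argument:

    # Assumption: interpreting "gradient normalization" as gradient norm clipping,
    # which Keras optimizers support via the clipnorm argument.
    from keras.optimizers import RMSprop

    optimizer = RMSprop(lr=0.0005, rho=0.9, epsilon=1e-6, clipnorm=1.0)
    # `model` here is the Sequential model from the original post.
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)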


dk1013 commented 8 years ago

Update: BatchNormalization takes an "axis" parameter (default to 0). This may explain the speed difference between Theano and TensorFlow.

############ old #############

@nouiz I have updated my cuDNN from v5005 to v5103 but observed no performance gain. Is cuDNN important for BatchNormalization? I also tested the same model with the TensorFlow backend (after adjusting image_dim_ordering conventions etc.) and found TensorFlow runs way faster than Theano.

Edit 1: Using a 980 Ti, haven't tried Pascal yet.
Edit 2: To make it quantitative, one epoch takes about 500s with TensorFlow and 2000s with Theano.
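
As a side note on that "axis" parameter, a minimal sketch (assuming a Keras version whose BatchNormalization accepts axis) of setting it explicitly for the channels-first input used in the original model:

    # Sketch: with Theano dim ordering (channels first, input_shape=(3, 227, 227)),
    # the feature axis is 1, so pass axis=1 instead of relying on the default.
    from keras.layers.normalization import BatchNormalization

    bn = BatchNormalization(axis=1)  # per-channel statistics for (batch, C, H, W)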

jiqiujia commented 7 years ago

I wonder if there is any progress on this issue?

nouiz commented 7 years ago

Yes. Update Theano to the dev version, and a "recent enough" Keras will use the new Theano interface that makes this fast on the Theano side. So it works on the CPU and on the GPU, with and without cuDNN. If cuDNN is there, it will be faster.

I'm not sure what "recent enough" means for Keras. For Theano, the master branch or Theano 0.9beta1 is recent enough.
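
A trivial check (a sketch) of which versions are installed, to see whether they are "recent enough" to pick up the new interface:

    # Print the installed Theano and Keras versions.
    import theano
    import keras

    print(theano.__version__)   # e.g. '0.9.0beta1' or a dev version string
    print(keras.__version__)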
