keras-team / keras

Deep Learning for humans
http://keras.io/

Why is there a 'batch_size' option in model.evaluate()? #3027

Closed amityaffliction closed 8 years ago

amityaffliction commented 8 years ago

Hi, looking at the example from keras.io, 'Getting started: 30 seconds to Keras':

    classes = model.predict_classes(X_test, batch_size=32)
    proba = model.predict_proba(X_test, batch_size=32)

Why is there a batch_size argument?

Is it for validation-set evaluation during the training process?

mbollmann commented 8 years ago

For the same reason it exists everywhere else: so the data (in this case, X_test) is fed into the model in batches of the given size. What exactly is unclear to you?
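To illustrate (just a rough sketch of the idea, not the actual Keras internals): with batch_size=32, the input is sliced into chunks of 32 samples, each chunk is run through the model, and the outputs are concatenated.

```python
import numpy as np

def predict_in_batches(model, x, batch_size=32):
    """Roughly what predict does under the hood: run the model on one
    slice of the input at a time and stitch the outputs back together."""
    outputs = []
    for start in range(0, len(x), batch_size):
        outputs.append(model.predict_on_batch(x[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)
```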

jskDr commented 8 years ago

Probably like Seung, I was confused by the batch_size requirement at prediction time, because no such requirement exists for batch-type machine learning methods such as MLR, SVM, and GP; it is only needed for stochastic-gradient-based methods such as single-layer ANNs, shallow NNs, and DNNs.

In line with the question: is there any simple way to guess a good batch size? I am mainly wondering about computation speed at prediction time, where no variation in the results is expected from the batch size. Would the answer be 10%, 20%, or all of the test data? The aim of such a guess would be a reasonable default value for batch_size, so that general users do not need to think about it when predicting on new data with a model an expert has trained.

-James

amityaffliction commented 8 years ago

When testing a normal dense MLP or a deep conv net, there is no need for batch_size, is there?

The model just generates softmax probabilities for the input images in a deep conv net.

The batch_size argument first gave me the impression that testing only happens within a batch, so it would not work if there were fewer samples than batch_size. But as far as I know, the classifier is just matrix multiplication and must be able to handle its inputs regardless of how many samples are fed to it.

I'm new to Keras, and other DL packages like MatConvNet didn't require anything like batch_size at test time. I know it is needed at training time for SGD mini-batch learning, but I just want to know why we have to explicitly pass a batch_size argument to the evaluate function. Does changing the batch_size argument affect the result of evaluate?


For example, here is what I thought it should be:

    Trained_model = SomeTrainingProcess()
    Y = Trained_model(X)

(Dimensions: X = (None, 784), Y = (None, 10).) So Trained_model can predict any number of 784-dimensional vector instances.

But Keras requires the batch_size field. For the evaluate function, the doc says:

    evaluate(self, x, y, batch_size=32, verbose=1, sample_weight=None)

    batch_size: integer. Number of samples per gradient update.

which doesn't make sense, because the evaluate function doesn't involve any gradient update.

Please, someone help me understand... I'm haunted by it; I think Keras will appear in my dreams T_T

mbollmann commented 8 years ago

Okay, I can see now where your confusion is coming from...

For one, some models in Keras require a fixed batch size even during prediction/evaluation. This is true for stateful RNNs, for example, since they keep information from one batch to the next (and the position of a sample within a batch matters!). According to the docs, it's also the case for RNNs with dropout when using the TensorFlow backend, presumably for technical reasons in the implementation.

If this doesn't apply to your model, it's true that setting a batch_size isn't technically needed. I believe it can affect the runtime performance of your model though, since it's basically controlling how much data is fed to your GPU at a time (= in one batch).

This is how I understand things at least.
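To make the stateful case concrete, here is a minimal sketch (the layer sizes, timesteps, and feature count are arbitrary placeholders): the batch size is baked into the model via batch_input_shape, so prediction and evaluation must use that exact batch size too.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size = 32

model = Sequential()
# batch_input_shape = (batch_size, timesteps, features) fixes the batch size
# as part of the model itself, which a stateful RNN requires.
model.add(LSTM(64, batch_input_shape=(batch_size, 10, 8), stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# Because the LSTM carries state across batches, predict() and evaluate()
# must be called with the same fixed batch size:
# preds = model.predict(x_test, batch_size=batch_size)
```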

philipperemy commented 8 years ago

Have a look at this if you're not convinced: http://philipperemy.github.io/keras-stateful-lstm/

veltzerdoron commented 8 years ago

The batch size matters both for the training results (it has a strong effect on the network's memorization (overfitting) vs. generalization trade-off) and for the optimized code generated before training starts (the large amount of time it takes the first time you call fit). Statefulness, for instance, operates across batches, and that same optimized code must then be used for testing; I think using a different batch size will cause an exception, though I'm not sure I ever checked this. So there you go. In any case, you wouldn't want your model to use a different configuration for testing and training; it might affect the results in a very unpredictable manner.

wjaskowski commented 8 years ago

I was also confused by the batch_size here, but more by the fact that the default value (32) was making my model evaluate very slowly. I might be wrong, but, at least for a feed-forward NN on a GPU, the larger the batch_size the better for evaluation purposes, so I expected it to default to the size of the whole dataset. At the least, an option like batch_size=None could be handy.
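In the meantime, a simple workaround (a sketch, assuming the whole test set actually fits in GPU memory) is to pass the dataset size explicitly:

```python
# Evaluate the whole test set as a single batch; only safe if it fits in memory.
score = model.evaluate(x_test, y_test, batch_size=len(x_test), verbose=0)
```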

mbollmann commented 8 years ago

@veltzerdoron I use different batch sizes for training and testing all the time, and since I don't use stateful models, the results are predictably the same no matter what the batch size during testing is.

@wjaskowski I think that depends on the size of your model and your data. The GPU memory has to be large enough to process the full batch at the same time, so defaulting to the size of the dataset might actually just crash when working with very large datasets.
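A quick way to check this for yourself (a sketch, assuming a non-stateful model and a NumPy array x_test):

```python
import numpy as np

preds_small = model.predict(x_test, batch_size=16)
preds_large = model.predict(x_test, batch_size=256)

# For a stateless model the batch size only changes how the data is chunked,
# not the math, so the outputs should agree up to floating-point precision.
print(np.allclose(preds_small, preds_large, atol=1e-5))
```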

tboggs commented 7 years ago

@wjaskowski @mbollmann I don't know whether the GPU would ultimately crash but a larger batch size is not always better. I recently ran some image classification performance tests to measure throughput (images/sec) vs. batch size. Starting with a batch size of one, increasing the batch size increased image throughput until a batch size of about 32, at which point throughput suddenly dropped by ~25%. Throughput then increased with increasing batch size until a batch size of 64, at which point it dropped again, then increased (though in a much less monotonic manner). Same situation again at batch size of 96. I'm just guessing here but it seems that there was some buffering being done and data was being moved in chunks of 32 images.

The basic take-away for my model/GPU combination was to use a batch size of 32 images, which provided the greatest throughput and relatively low variability.
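For anyone who wants to repeat this kind of measurement on their own model/GPU combination, here is a rough timing sketch (the sweet spot will differ from setup to setup):

```python
import time

def images_per_second(model, x, batch_size, repeats=3):
    """Time model.predict at a given batch size and report throughput."""
    model.predict(x, batch_size=batch_size)  # warm-up run
    start = time.time()
    for _ in range(repeats):
        model.predict(x, batch_size=batch_size)
    elapsed = (time.time() - start) / repeats
    return len(x) / elapsed

for bs in (1, 8, 16, 32, 64, 96, 128):
    print(bs, round(images_per_second(model, x_test, bs), 1))
```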

saintthor commented 5 years ago

With a non-RNN model, running evaluate with different batch_size values may give different results.

How can I get a simple mean loss value?
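One way to get a number that doesn't depend on the batching at all (a sketch, assuming an MSE loss and a y_test with the same shape as the predictions; swap in your own loss) is to compute the mean per-sample loss yourself from the predictions:

```python
import numpy as np

# batch_size here only affects speed/memory, not the predictions themselves.
preds = model.predict(x_test, batch_size=32)

# Mean squared error averaged over all samples, independent of how the
# data was batched during prediction.
mean_loss = np.mean(np.square(preds - y_test))
print(mean_loss)
```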

malharjajoo commented 4 years ago

This is definitely something that confused me as well, and the fact that people can't explain it in a simple manner makes it worse.