pythonlessons / mltu

Machine Learning Training Utilities (for TensorFlow and PyTorch)
MIT License

Dropout with Batch Normalization Disharmony #41

Closed sondosaabed closed 6 months ago

sondosaabed commented 6 months ago

I have read somewhere that using Dropout and Batch Normalization together leads to worse performance. I noticed that your code does this. What is your opinion and experience on this?

pythonlessons commented 6 months ago

Using Dropout and Batch Normalization together in a neural network is a common practice and is not inherently problematic. In fact, these techniques serve different purposes and can complement each other in improving the training and generalization performance of a model.

Using both Dropout and Batch Normalization together is often beneficial because they address different problems: Dropout reduces overfitting by randomly deactivating units during training, while Batch Normalization stabilizes and accelerates training by normalizing layer inputs.

However, the effectiveness of these techniques can depend on the specific characteristics of the data and the architecture of your neural network. It's always a good idea to experiment with different combinations and hyperparameters to find the optimal configuration for your specific task.

In some cases, using both Dropout and Batch Normalization might not provide significant benefits, or could even be counterproductive. This is why it's common practice to experiment with different configurations and evaluate the model on your specific use case and dataset.
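To make the comparison concrete, here is a minimal Keras sketch of a convolutional block that uses both techniques (the layer sizes, input shape, and dropout rate are illustrative, not the mltu defaults). Setting `dropout_rate=0.0` makes it easy to run the with/without comparison described in this thread:

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_block(x, filters, dropout_rate=0.2):
    # BatchNormalization normalizes activation statistics to stabilize training;
    # Dropout randomly zeroes activations to reduce overfitting.
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(dropout_rate)(x)  # set dropout_rate=0.0 to compare without Dropout
    return x

inputs = keras.Input(shape=(32, 128, 1))  # hypothetical grayscale input shape
x = conv_block(inputs, 32)
x = conv_block(x, 64)
model = keras.Model(inputs, x)
model.summary()
```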

It would be interesting to see your source, but it probably says something similar.

sondosaabed commented 6 months ago

Thank you for your answer and explanation!

In our case, we were training a model on handwriting images and noticed that it performed better without Dropout; the Dropout layers had a negative effect on training.

At the same time, I was taking the DataCamp course "Image Modeling with Keras", and when they introduced regularization and batch normalization, they discussed this issue. https://github.com/sondosaabed/Image-Modeling-with-Keras/ (Theory/Chapter 4).


So when I came across this in the course, I searched for it and found this paper, "Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift", though I haven't finished reading it yet: https://arxiv.org/abs/1801.05134

That's why I was curious to hear your answer based on your experience; these were the only resources I had looked at on the matter.

pythonlessons commented 6 months ago

Nice paper, thanks. It focuses on the case where Dropout layers are applied before Batch Normalization. To my knowledge, Dropout is usually placed after BN, but I may be wrong. If you got better results without Dropout, that's great. That's why I said that each case might be different.
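For anyone reading along, here is a minimal sketch of the two orderings being contrasted (function names and the dropout rate are illustrative). The paper's argument is that Dropout shifts the variance of its input between training and inference, so a BN layer placed after it sees mismatched statistics at test time:

```python
from tensorflow.keras import layers

def dropout_then_bn(x, rate=0.2):
    # The ordering the paper warns about: Dropout *before* BatchNormalization.
    # Dropout changes the variance of its output between train and test time,
    # so the moments BN accumulated during training no longer match at inference.
    x = layers.Dropout(rate)(x)
    x = layers.BatchNormalization()(x)
    return x

def bn_then_dropout(x, rate=0.2):
    # The ordering mentioned above: Dropout *after* BN, which avoids feeding
    # variance-shifted activations into the BN layer.
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(rate)(x)
    return x
```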