rasbt / machine-learning-book

Code Repository for Machine Learning with PyTorch and Scikit-Learn
https://sebastianraschka.com/books/#machine-learning-with-pytorch-and-scikit-learn
MIT License

Chapter 14, Dropout after pooling layers in smile classification resulted in worse accuracy #191

Closed: alvinng4 closed this issue 3 months ago

alvinng4 commented 3 months ago

In Chapter 14: Classifying Images with Deep Convolutional Neural Networks (Part 2/2), dropout layers are applied after the first two pooling layers with p=0.5:

>>> model
Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (dropout1): Dropout(p=0.5, inplace=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (dropout2): Dropout(p=0.5, inplace=False)
  (conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu3): ReLU()
  (pool3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv4): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu4): ReLU()
  (pool4): AvgPool2d(kernel_size=8, stride=8, padding=0)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)

However, I have read online that the dropout probability for convolutional layers should not be that high. In fact, after lowering p from 0.5 to 0.15, I achieved much better results (a sketch of the change follows the log below):

Epoch 1 accuracy: 0.5138 val_accuracy: 0.5176
...
Epoch 15 accuracy: 0.8767 val_accuracy: 0.8960
...
Epoch 30 accuracy: 0.9054 val_accuracy: 0.9104

Test accuracy: 0.9075
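
For reference, a minimal sketch of the change, assuming only the two Dropout probabilities differ from the model printed above (built with add_module here; the chapter may construct it differently):

import torch.nn as nn

# Same architecture as the printed model, with the dropout
# probability lowered from 0.5 to 0.15 in both Dropout layers.
model = nn.Sequential()
model.add_module('conv1', nn.Conv2d(3, 32, kernel_size=3, padding=1))
model.add_module('relu1', nn.ReLU())
model.add_module('pool1', nn.MaxPool2d(kernel_size=2))
model.add_module('dropout1', nn.Dropout(p=0.15))  # was p=0.5
model.add_module('conv2', nn.Conv2d(32, 64, kernel_size=3, padding=1))
model.add_module('relu2', nn.ReLU())
model.add_module('pool2', nn.MaxPool2d(kernel_size=2))
model.add_module('dropout2', nn.Dropout(p=0.15))  # was p=0.5
model.add_module('conv3', nn.Conv2d(64, 128, kernel_size=3, padding=1))
model.add_module('relu3', nn.ReLU())
model.add_module('pool3', nn.MaxPool2d(kernel_size=2))
model.add_module('conv4', nn.Conv2d(128, 256, kernel_size=3, padding=1))
model.add_module('relu4', nn.ReLU())
model.add_module('pool4', nn.AvgPool2d(kernel_size=8))
model.add_module('flatten', nn.Flatten())
model.add_module('fc', nn.Linear(256, 1))
model.add_module('sigmoid', nn.Sigmoid())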
alvinng4 commented 3 months ago

After some testing, I found that using BatchNorm2d and Dropout2d(p=0.2) yields the best test results: it reached 0.91 accuracy in only 15 epochs.
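
A rough sketch of what one such block could look like, assuming BatchNorm2d sits between the convolution and its activation and the channel-wise Dropout2d follows the pooling layer (the exact placement used above is an assumption):

import torch.nn as nn

# Hypothetical first block: BatchNorm2d after the convolution,
# Dropout2d(p=0.2) after pooling. Dropout2d zeroes entire feature
# maps rather than individual activations.
block1 = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout2d(p=0.2),
)

Because neighboring activations within a feature map are strongly correlated, dropping whole channels (Dropout2d) tends to regularize convolutional layers more effectively than elementwise Dropout.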

rasbt commented 3 months ago

Thanks for sharing. And yes, I agree, 0.5 is a bit high. I think I used it because it's the "classic" value from the original dropout paper, but for convolutional layers you usually don't need to (or shouldn't) go much higher than 0.1-0.3. I'll make a note to update this in case there's ever a 2nd edition one day. Thanks!