jiecaoyu / pytorch-nin-cifar10

pytorch implementation of network-in-network model on cifar10

Test accuracy stays the same #1

Closed hyojinie closed 6 years ago

hyojinie commented 6 years ago

Hello, I ran original.py according to the instructions, but the test accuracy and loss stay the same across 10 epochs of training. Do you know what the reason could be?

Train Epoch: 1 [0/50000 (0%)] Loss: 2.303388 LR: 0.1
Train Epoch: 1 [12800/50000 (26%)] Loss: 2.289754 LR: 0.1
Train Epoch: 1 [25600/50000 (51%)] Loss: 2.233992 LR: 0.1
Train Epoch: 1 [38400/50000 (77%)] Loss: 2.302584 LR: 0.1

Test set: Average loss: 2.9473, Accuracy: 1000/10000 (10.00%)

Train Epoch: 2 [0/50000 (0%)] Loss: 2.302585 LR: 0.1
Train Epoch: 2 [12800/50000 (26%)] Loss: 2.302585 LR: 0.1
Train Epoch: 2 [25600/50000 (51%)] Loss: 2.302585 LR: 0.1
Train Epoch: 2 [38400/50000 (77%)] Loss: 2.302585 LR: 0.1

Test set: Average loss: 2.9473, Accuracy: 1000/10000 (10.00%)

Train Epoch: 3 [0/50000 (0%)] Loss: 2.302585 LR: 0.1
Train Epoch: 3 [12800/50000 (26%)] Loss: 2.302585 LR: 0.1
Train Epoch: 3 [25600/50000 (51%)] Loss: 2.302585 LR: 0.1
Train Epoch: 3 [38400/50000 (77%)] Loss: 2.302585 LR: 0.1

Test set: Average loss: 2.9473, Accuracy: 1000/10000 (10.00%)

Train Epoch: 4 [0/50000 (0%)] Loss: 2.302585 LR: 0.1
Train Epoch: 4 [12800/50000 (26%)] Loss: 2.302585 LR: 0.1
Train Epoch: 4 [25600/50000 (51%)] Loss: 2.302585 LR: 0.1
Train Epoch: 4 [38400/50000 (77%)] Loss: 2.302585 LR: 0.1

hyojinie commented 6 years ago

To follow up on this, I ran it again in exactly the same way, just to give it another shot, and this time it started working. Have you experienced this before, or do you know why it happens?

jiecaoyu commented 6 years ago

@hyojinie Hi, can you try a lower learning rate, such as 0.01? 0.1 sometimes works and sometimes does not; it depends on the initialization.

You can easily change the learning rate by changing the line here.
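For reference, a cross-entropy loss stuck at 2.302585 (≈ ln 10) means the network is predicting an essentially uniform distribution over the 10 CIFAR-10 classes, i.e. training has diverged. A minimal sketch of what the optimizer setup with a lower initial learning rate might look like (the stand-in model and the momentum/weight-decay values here are assumptions, not necessarily what original.py uses):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Conv2d(3, 192, kernel_size=5, padding=2)  # stand-in for the NIN model

# A lower initial learning rate (0.01, or 0.03) is less likely to diverge
# with an unlucky initialization than the default 0.1.
optimizer = optim.SGD(model.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=1e-4)
```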

jiecaoyu commented 6 years ago

0.03 might be better if the training converges successfully.

hyojinie commented 6 years ago

Thanks a lot for your response. I also suspect it is due to the initialization. Would lowering the learning rate still help when the initialization is not favorable? My friends suggest using batch normalization to make training less sensitive to the initialization; I might try that. Thanks a lot for the help.
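A rough sketch of what adding batch normalization to one NIN-style conv block could look like (this is a hypothetical block for illustration, not the repo's actual model definition):

```python
import torch.nn as nn

# Hypothetical NIN-style block with BatchNorm2d inserted after each conv,
# which makes training much less sensitive to the weight initialization.
block = nn.Sequential(
    nn.Conv2d(3, 192, kernel_size=5, padding=2),
    nn.BatchNorm2d(192),
    nn.ReLU(inplace=True),
    nn.Conv2d(192, 160, kernel_size=1),
    nn.BatchNorm2d(160),
    nn.ReLU(inplace=True),
    nn.Conv2d(160, 96, kernel_size=1),
    nn.BatchNorm2d(96),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
```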

jiecaoyu commented 6 years ago

@hyojinie Yes, batch normalization should help a lot, and the accuracy can be increased to around 91% (I cannot remember exactly, but it should be around 91.3%). For an unfavorable initialization without batch normalization, a typical strategy is to "warm up" the training with a lower learning rate and then switch to a higher one.
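One way to sketch that warm-up strategy (the epoch thresholds and rates below are illustrative assumptions, not the repo's settings):

```python
def adjust_learning_rate(optimizer, epoch):
    """Warm up with a small learning rate, then switch to a higher one."""
    if epoch < 5:        # warm-up phase with a safe, small rate
        lr = 0.01
    elif epoch < 80:     # main phase at the higher rate
        lr = 0.1
    else:                # final decay phase
        lr = 0.01
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr
```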

hyojinie commented 6 years ago

Cool. Thanks a lot for the helpful insights.

hyojinie commented 6 years ago

I wonder if the reported 89.64% was achieved with a learning rate of 0.03? I tried it twice with batch normalization, but the accuracy came out around 88% and 88.90%.