Have you used the Adam optimizer, with a learning rate of 1e-3 for the first 100 epochs, 1e-4 for the next 100, and 1e-5 for the last 50 epochs?
That is how I trained the DenseNet model for CIFAR 10. Also, you need the far larger DenseNet-28-8 or DenseNet-28-10 for CIFAR 100; the default DenseNet will not do well.
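For reference, that stepped schedule looks roughly like this with a Keras LearningRateScheduler. This is only a sketch: the model construction is omitted, and the batch size, epoch count, and data handling are assumptions rather than the exact settings of the repo's scripts.

```python
# Rough sketch of the stepped Adam schedule described above (1e-3 -> 1e-4 -> 1e-5).
# `model` is assumed to be a DenseNet built elsewhere (e.g. with the repo's densenet.py).
from keras.callbacks import LearningRateScheduler
from keras.datasets import cifar10
from keras.optimizers import Adam
from keras.utils import np_utils

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)

def step_lr(epoch):
    # 1e-3 for the first 100 epochs, 1e-4 for the next 100, 1e-5 afterwards.
    if epoch < 100:
        return 1e-3
    elif epoch < 200:
        return 1e-4
    return 1e-5

# model = ...  # DenseNet from densenet.py (construction omitted here)
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=1e-3),
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, epochs=250,
          validation_data=(x_test, y_test),
          callbacks=[LearningRateScheduler(step_lr)])
```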
I used the default cifar100.py and didn't change anything.
What result did you get for CIFAR 100?
Ah, that CIFAR 100 script is outdated. Look at the CIFAR 10 script to see the callbacks that you need to add (ReduceLROnPlateau, LearningRateScheduler, etc.).
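For example, something along these lines, passed to model.fit (a rough sketch only; the monitor, factor, and weight path are illustrative and may not match the cifar10 script exactly):

```python
# Sketch of the kind of callbacks meant above; argument values are illustrative.
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

# Drop the learning rate when validation accuracy stops improving.
lr_reducer = ReduceLROnPlateau(monitor='val_acc', factor=0.1,
                               patience=10, min_lr=1e-5)

# Keep only the best weights seen so far (the file name is just an example).
model_checkpoint = ModelCheckpoint('DenseNet-CIFAR100.h5', monitor='val_acc',
                                   save_best_only=True, save_weights_only=True)

callbacks = [lr_reducer, model_checkpoint]
# then: model.fit(..., callbacks=callbacks) or model.fit_generator(..., callbacks=callbacks)
```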
I haven't trained on CIFAR 100, I believe. Wonder why; maybe I didn't have time due to another project or something. If you are able to successfully train the model, could you submit it here so I can add it to the releases?
Ah I remember why I didn't train it. My GPU cannot load the DenseNet-BC-100-12 model, so I didn't try to train it.
Is adding those CIFAR 10 callbacks to cifar100.py the only change needed?
Or can you complete the cifar100.py code? Then I will train it.
Compare the cifar10 and cifar100 scripts and see where changes are needed. I'm a little busy with finals.
Added callbacks=[lr_reducer, model_checkpoint] from cifar10 and got 0.7261.
=======================================
Epoch 190/200
2165s - loss: 0.5228 - acc: 0.9895 - val_loss: 1.7049 - val_acc: 0.7253
Epoch 191/200
2164s - loss: 0.5208 - acc: 0.9900 - val_loss: 1.7034 - val_acc: 0.7256
Epoch 192/200
2164s - loss: 0.5216 - acc: 0.9899 - val_loss: 1.7067 - val_acc: 0.7257
Epoch 193/200
2164s - loss: 0.5223 - acc: 0.9890 - val_loss: 1.7052 - val_acc: 0.7252
Epoch 194/200
2163s - loss: 0.5214 - acc: 0.9896 - val_loss: 1.7045 - val_acc: 0.7262
Epoch 195/200
2164s - loss: 0.5239 - acc: 0.9886 - val_loss: 1.7045 - val_acc: 0.7262
Epoch 196/200
2164s - loss: 0.5214 - acc: 0.9896 - val_loss: 1.7051 - val_acc: 0.7256
Epoch 197/200
2165s - loss: 0.5232 - acc: 0.9889 - val_loss: 1.7047 - val_acc: 0.7266
Epoch 198/200
2165s - loss: 0.5215 - acc: 0.9894 - val_loss: 1.7037 - val_acc: 0.7255
Epoch 199/200
2164s - loss: 0.5220 - acc: 0.9896 - val_loss: 1.7046 - val_acc: 0.7251
Epoch 200/200
2164s - loss: 0.5216 - acc: 0.9899 - val_loss: 1.7045 - val_acc: 0.7261
Accuracy : 72.61
Error : 27.39
=======================================
With early stopping:
Epoch 91/200
2164s - loss: 0.6211 - acc: 0.9761 - val_loss: 1.7136 - val_acc: 0.7199
Epoch 92/200
2163s - loss: 0.6190 - acc: 0.9760 - val_loss: 1.7148 - val_acc: 0.7211
Epoch 93/200
2164s - loss: 0.6190 - acc: 0.9771 - val_loss: 1.7131 - val_acc: 0.7199
Epoch 94/200
2164s - loss: 0.6193 - acc: 0.9774 - val_loss: 1.7146 - val_acc: 0.7206
Epoch 95/200
2164s - loss: 0.6170 - acc: 0.9777 - val_loss: 1.7155 - val_acc: 0.7203
Epoch 96/200
2163s - loss: 0.6184 - acc: 0.9776 - val_loss: 1.7152 - val_acc: 0.7204
Epoch 97/200
2164s - loss: 0.6168 - acc: 0.9777 - val_loss: 1.7153 - val_acc: 0.7200
Epoch 98/200
2164s - loss: 0.6169 - acc: 0.9775 - val_loss: 1.7154 - val_acc: 0.7210
Epoch 99/200
2163s - loss: 0.6161 - acc: 0.9777 - val_loss: 1.7138 - val_acc: 0.7192
Epoch 100/200
2164s - loss: 0.6178 - acc: 0.9767 - val_loss: 1.7149 - val_acc: 0.7203
Epoch 101/200
2163s - loss: 0.6160 - acc: 0.9773 - val_loss: 1.7145 - val_acc: 0.7215
Accuracy : 72.15
Error : 27.85
CIFAR 10 weights are already provided; they get into the high 90s. CIFAR 100 gets lower, since the state of the art is around 80 or so.
With this implementation:
https://github.com/yasunorikudo/chainer-DenseNet
I got ~80%. Since there is an ~8% difference for the same algorithm, I wonder if there could be a potential bug in the implementation, or in Keras? Do you think this is possible?
I don't think the problem is with Keras or the model code. Look at their preprocessing steps.
They use mean-std scaling; I used min-max scaling to [0, 1]. They used random crops and horizontal flips; I disabled horizontal flips and did not use random crops. Instead I applied random rotations and width/height shifts.
Edit: other differences: their code follows the original paper's preprocessing steps, whereas I am using the simpler preprocessing that is directly available in Keras. These differences can cause drastic differences in performance for DenseNets. I think random crops were only implemented recently, and I never used horizontal flips because training takes far longer with them (although they give better results as well).
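To make that concrete, here is a rough side-by-side of the two preprocessing styles (illustrative only: the rotation range, the shift amounts used as a stand-in for 4-pixel random crops, and the other constants are assumptions, not the exact settings of either repo):

```python
# Illustrative comparison of the two preprocessing styles discussed above;
# neither block is copied from either repository.
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator

(x_train, y_train), (x_test, y_test) = cifar100.load_data(label_mode='fine')
x_train = x_train.astype('float32')

# Style used here: min-max scaling to [0, 1], plus random rotations and shifts.
x_train_minmax = x_train / 255.0
gen_simple = ImageDataGenerator(rotation_range=15,
                                width_shift_range=0.1,
                                height_shift_range=0.1)

# Style used by the chainer code (and the paper): per-channel mean-std
# normalisation, random crops and horizontal flips. Shifts are used below as
# a rough stand-in for random crops.
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
std = x_train.std(axis=(0, 1, 2), keepdims=True)
x_train_meanstd = (x_train - mean) / (std + 1e-7)
gen_paper = ImageDataGenerator(width_shift_range=4.0 / 32,
                               height_shift_range=4.0 / 32,
                               horizontal_flip=True)
```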
Last 10 epochs:
Epoch 190/200
2113s - loss: 1.0719 - acc: 0.8601 - val_loss: 2.2472 - val_acc: 0.6207
Epoch 191/200
2113s - loss: 1.0691 - acc: 0.8607 - val_loss: 2.1733 - val_acc: 0.6445
Epoch 192/200
2114s - loss: 1.0706 - acc: 0.8597 - val_loss: 2.1769 - val_acc: 0.6439
Epoch 193/200
2113s - loss: 1.0750 - acc: 0.8585 - val_loss: 2.2456 - val_acc: 0.6286
Epoch 194/200
2113s - loss: 1.0639 - acc: 0.8622 - val_loss: 2.2660 - val_acc: 0.6455
Epoch 195/200
2113s - loss: 1.0679 - acc: 0.8607 - val_loss: 2.1948 - val_acc: 0.6376
Epoch 196/200
2114s - loss: 1.0676 - acc: 0.8609 - val_loss: 2.1855 - val_acc: 0.6522
Epoch 197/200
2113s - loss: 1.0652 - acc: 0.8618 - val_loss: 2.4428 - val_acc: 0.6053
Epoch 198/200
2114s - loss: 1.0675 - acc: 0.8603 - val_loss: 2.2936 - val_acc: 0.6236
Epoch 199/200
2114s - loss: 1.0685 - acc: 0.8589 - val_loss: 2.1497 - val_acc: 0.6450
Epoch 200/200
2113s - loss: 1.0635 - acc: 0.8626 - val_loss: 2.2698 - val_acc: 0.6251
Accuracy : 62.51
Error : 37.49