titu1994 / DenseNet

DenseNet implementation in Keras
MIT License

cifar100 Accuracy : 62.51? #12

Closed mw66 closed 7 years ago

mw66 commented 7 years ago

Last 10 epochs:

Epoch 190/200 2113s - loss: 1.0719 - acc: 0.8601 - val_loss: 2.2472 - val_acc: 0.6207
Epoch 191/200 2113s - loss: 1.0691 - acc: 0.8607 - val_loss: 2.1733 - val_acc: 0.6445
Epoch 192/200 2114s - loss: 1.0706 - acc: 0.8597 - val_loss: 2.1769 - val_acc: 0.6439
Epoch 193/200 2113s - loss: 1.0750 - acc: 0.8585 - val_loss: 2.2456 - val_acc: 0.6286
Epoch 194/200 2113s - loss: 1.0639 - acc: 0.8622 - val_loss: 2.2660 - val_acc: 0.6455
Epoch 195/200 2113s - loss: 1.0679 - acc: 0.8607 - val_loss: 2.1948 - val_acc: 0.6376
Epoch 196/200 2114s - loss: 1.0676 - acc: 0.8609 - val_loss: 2.1855 - val_acc: 0.6522
Epoch 197/200 2113s - loss: 1.0652 - acc: 0.8618 - val_loss: 2.4428 - val_acc: 0.6053
Epoch 198/200 2114s - loss: 1.0675 - acc: 0.8603 - val_loss: 2.2936 - val_acc: 0.6236
Epoch 199/200 2114s - loss: 1.0685 - acc: 0.8589 - val_loss: 2.1497 - val_acc: 0.6450
Epoch 200/200 2113s - loss: 1.0635 - acc: 0.8626 - val_loss: 2.2698 - val_acc: 0.6251
Accuracy : 62.51 Error : 37.49

titu1994 commented 7 years ago

Have you used the Adam optimizer, with a learning rate of 1e-3 for the first 100 epochs, 1e-4 for the next 100, and 1e-5 for the last 50 epochs?

That is how I trained the DenseNet model for CIFAR 10. Also, you need the far larger DenseNet-28-8 or DenseNet-28-10 for CIFAR 100. The default DenseNet will not do well.
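
For reference, that step schedule can be expressed as a Keras LearningRateScheduler along these lines (a minimal sketch based on the description above; the epoch boundaries, total epoch count, and fit arguments are assumptions, not copied from the repo's scripts):

```python
from keras.callbacks import LearningRateScheduler
from keras.optimizers import Adam

def lr_schedule(epoch):
    # Step schedule described above: 1e-3, then 1e-4, then 1e-5.
    if epoch < 100:
        return 1e-3
    elif epoch < 200:
        return 1e-4
    return 1e-5

lr_scheduler = LearningRateScheduler(lr_schedule)

# model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=1e-3),
#               metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=250, callbacks=[lr_scheduler],
#           validation_data=(x_test, y_test))
```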

mw66 commented 7 years ago

I used the default cifar100.py and didn't change anything.

What result did you get for CIFAR 100?

titu1994 commented 7 years ago

Ah, that CIFAR 100 script is outdated. Look at the CIFAR 10 script to see the callbacks that you need to add (ReduceLROnPlateau, LearningRateScheduler, etc.).
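
For anyone following along, the callback setup borrowed from the cifar10 script looks roughly like this (a sketch only; the factor/patience values and checkpoint path are assumptions, not taken from the actual script):

```python
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

# Reduce the learning rate when validation loss stops improving
# (factor/patience values here are illustrative).
lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                               cooldown=0, patience=10, min_lr=1e-6)

# Save the best weights seen so far, judged by validation accuracy.
model_checkpoint = ModelCheckpoint('DenseNet-CIFAR100.h5', monitor='val_acc',
                                   save_best_only=True, save_weights_only=True)

# model.fit(x_train, y_train, epochs=200,
#           callbacks=[lr_reducer, model_checkpoint],
#           validation_data=(x_test, y_test))
```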

I don't believe I've trained on CIFAR 100. I wonder why; maybe I didn't have time due to another project. If you are able to successfully train the model, could you submit it here so I can add it to the releases?

titu1994 commented 7 years ago

Ah I remember why I didn't train it. My GPU cannot load the DenseNet-BC-100-12 model, so I didn't try to train it.

mw66 commented 7 years ago

Is adding those CIFAR 10 callbacks to cifar100.py the only change needed?

Or could you complete the cifar100.py code? Then I will train it.

titu1994 commented 7 years ago

Compare the cifar10 and cifar100 scripts and see where changes are needed. I'm a little busy with finals.

mw66 commented 7 years ago

Added callbacks=[lr_reducer, model_checkpoint] from the cifar10 script; got 0.7261.

=======================================
2165s - loss: 0.5228 - acc: 0.9895 - val_loss: 1.7049 - val_acc: 0.7253
Epoch 191/200 2164s - loss: 0.5208 - acc: 0.9900 - val_loss: 1.7034 - val_acc: 0.7256
Epoch 192/200 2164s - loss: 0.5216 - acc: 0.9899 - val_loss: 1.7067 - val_acc: 0.7257
Epoch 193/200 2164s - loss: 0.5223 - acc: 0.9890 - val_loss: 1.7052 - val_acc: 0.7252
Epoch 194/200 2163s - loss: 0.5214 - acc: 0.9896 - val_loss: 1.7045 - val_acc: 0.7262
Epoch 195/200 2164s - loss: 0.5239 - acc: 0.9886 - val_loss: 1.7045 - val_acc: 0.7262
Epoch 196/200 2164s - loss: 0.5214 - acc: 0.9896 - val_loss: 1.7051 - val_acc: 0.7256
Epoch 197/200 2165s - loss: 0.5232 - acc: 0.9889 - val_loss: 1.7047 - val_acc: 0.7266
Epoch 198/200 2165s - loss: 0.5215 - acc: 0.9894 - val_loss: 1.7037 - val_acc: 0.7255
Epoch 199/200 2164s - loss: 0.5220 - acc: 0.9896 - val_loss: 1.7046 - val_acc: 0.7251
Epoch 200/200 2164s - loss: 0.5216 - acc: 0.9899 - val_loss: 1.7045 - val_acc: 0.7261
Accuracy : 72.61 Error : 27.39

=======================================
With early stopping:

Epoch 91/200 2164s - loss: 0.6211 - acc: 0.9761 - val_loss: 1.7136 - val_acc: 0.7199
Epoch 92/200 2163s - loss: 0.6190 - acc: 0.9760 - val_loss: 1.7148 - val_acc: 0.7211
Epoch 93/200 2164s - loss: 0.6190 - acc: 0.9771 - val_loss: 1.7131 - val_acc: 0.7199
Epoch 94/200 2164s - loss: 0.6193 - acc: 0.9774 - val_loss: 1.7146 - val_acc: 0.7206
Epoch 95/200 2164s - loss: 0.6170 - acc: 0.9777 - val_loss: 1.7155 - val_acc: 0.7203
Epoch 96/200 2163s - loss: 0.6184 - acc: 0.9776 - val_loss: 1.7152 - val_acc: 0.7204
Epoch 97/200 2164s - loss: 0.6168 - acc: 0.9777 - val_loss: 1.7153 - val_acc: 0.7200
Epoch 98/200 2164s - loss: 0.6169 - acc: 0.9775 - val_loss: 1.7154 - val_acc: 0.7210
Epoch 99/200 2163s - loss: 0.6161 - acc: 0.9777 - val_loss: 1.7138 - val_acc: 0.7192
Epoch 100/200 2164s - loss: 0.6178 - acc: 0.9767 - val_loss: 1.7149 - val_acc: 0.7203
Epoch 101/200 2163s - loss: 0.6160 - acc: 0.9773 - val_loss: 1.7145 - val_acc: 0.7215
Accuracy : 72.15 Error : 27.85
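
The early-stopping run above could be set up with Keras's EarlyStopping callback, roughly like this (a sketch; the monitored metric and patience value are assumptions):

```python
from keras.callbacks import EarlyStopping

# Stop training once validation accuracy stops improving
# (the patience value is illustrative).
early_stopper = EarlyStopping(monitor='val_acc', patience=10, verbose=1)

# model.fit(..., callbacks=[lr_reducer, model_checkpoint, early_stopper])
```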

titu1994 commented 7 years ago

CIFAR 10 weights are already provided; they reach the high 90s. CIFAR 100 accuracy is lower, since the state of the art is around 80 or so.

mw66 commented 7 years ago

with this implementation:

https://github.com/yasunorikudo/chainer-DenseNet

I got ~80%. Since there is an ~8% difference for the same algorithm, I wonder whether there could be a bug in this implementation, or in Keras? Do you think that is possible?

titu1994 commented 7 years ago

I don't think the problem is with Keras or the model code. Look at their preprocessing steps.

They use mean-std scaling; I used min-max scaling to [0, 1]. They used random crops and horizontal flips; I disabled horizontal flips and did not use random crops. Instead I applied random rotations and width/height shifts.

Edit: Other differences:

These differences can cause drastic changes in performance for DenseNets.

Their code follows the original paper's preprocessing steps, whereas I am using simpler preprocessing that is directly available in Keras. I think random crops were only implemented recently, and I never used horizontal flips because they take far longer to learn (albeit give better results as well).
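
Roughly, the two preprocessing pipelines compared above would look like this in Keras (a sketch only; the parameter values are assumptions and are not copied from either repository):

```python
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator

(x_train, y_train), (x_test, y_test) = cifar100.load_data()
x_train = x_train.astype('float32')

# This repo's approach (roughly): min-max scaling to [0, 1],
# random rotations and width/height shifts, no flips or crops.
x_minmax = x_train / 255.0
keras_style_gen = ImageDataGenerator(rotation_range=15,
                                     width_shift_range=0.1,
                                     height_shift_range=0.1,
                                     horizontal_flip=False)

# The chainer-DenseNet / paper approach (roughly): per-channel mean-std
# normalization, random crops (approximated here by 4-pixel shifts on a
# 32x32 image) and horizontal flips.
mean = x_train.mean(axis=(0, 1, 2))
std = x_train.std(axis=(0, 1, 2))
x_meanstd = (x_train - mean) / (std + 1e-7)
paper_style_gen = ImageDataGenerator(width_shift_range=4.0 / 32,
                                     height_shift_range=4.0 / 32,
                                     horizontal_flip=True,
                                     fill_mode='reflect')
```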