pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Are the Pre-Trained Models(VGG-16 and AlexNet) for ImageNet Classification correct? #267

Open abbyjako opened 6 years ago

abbyjako commented 6 years ago

Hello. I wonder whether the pre-trained VGG-16 and AlexNet models are correct. When I run

./darknet classifier predict cfg/imagenet1k.data cfg/alexnet.cfg alexnet.weights data/dog.jpg

the results are:

 0.18%: cassette
 0.17%: pinwheel
 0.17%: Band Aid
 0.17%: beacon
 0.16%: sloth bear

I cannot figure out what the problem is. Could you help me solve this? Thanks very much!

pjreddie commented 6 years ago

that seems..... wrong?

abbyjako commented 6 years ago

@pjreddie I tested the model on other images and they all give similar results, so I think something is wrong, but I don't know where the problem is. Can you get the correct classification result with VGG-16?

TheMikeyR commented 6 years ago

I have a freshly pulled darknet with alexnet.weights downloaded from https://pjreddie.com/media/files/alexnet.weights and I get some weird results as well. I've changed the batch size in alexnet.cfg from 128 to 1.

$ ./darknet classifier predict cfg/imagenet1k.data cfg/alexnet.cfg alexnet.weights data/dog.jpg

layer     filters    size              input                output
    0 conv     96 11 x11 / 4   227 x 227 x   3   ->    55 x  55 x  96
    1 max          3 x 3 / 2    55 x  55 x  96   ->    27 x  27 x  96
    2 conv    256  5 x 5 / 1    27 x  27 x  96   ->    27 x  27 x 256
    3 max          3 x 3 / 2    27 x  27 x 256   ->    13 x  13 x 256
    4 conv    384  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 384
    5 conv    384  3 x 3 / 1    13 x  13 x 384   ->    13 x  13 x 384
    6 conv    256  3 x 3 / 1    13 x  13 x 384   ->    13 x  13 x 256
    7 max          3 x 3 / 2    13 x  13 x 256   ->     6 x   6 x 256
    8 connected                            9216  ->  4096
    9 dropout       p = 0.50               4096  ->  4096
   10 connected                            4096  ->  4096
   11 dropout       p = 0.50               4096  ->  4096
   12 connected                            4096  ->  1000
   13 softmax                                        1000
   14 cost                                           1000
Loading weights from alexnet.weights...Done!
data/dog.jpg: Predicted in 0.124548 seconds.
 0.10%: red fox
 0.10%: stole
 0.10%: Shetland sheepdog
 0.10%: maraca
 0.10%: envelope

I get results similar to the original post. With the example from https://pjreddie.com/darknet/imagenet/, the percentages have changed a bit:

$ ./darknet classifier predict cfg/imagenet1k.data cfg/extraction.cfg extraction.weights data/eagle.jpg

layer     filters    size              input                output
    0 conv     64  7 x 7 / 2   224 x 224 x   3   ->   112 x 112 x  64
    1 max          2 x 2 / 2   112 x 112 x  64   ->    56 x  56 x  64
    2 conv    192  3 x 3 / 1    56 x  56 x  64   ->    56 x  56 x 192
    3 max          2 x 2 / 2    56 x  56 x 192   ->    28 x  28 x 192
    4 conv    128  1 x 1 / 1    28 x  28 x 192   ->    28 x  28 x 128
    5 conv    256  3 x 3 / 1    28 x  28 x 128   ->    28 x  28 x 256
    6 conv    256  1 x 1 / 1    28 x  28 x 256   ->    28 x  28 x 256
    7 conv    512  3 x 3 / 1    28 x  28 x 256   ->    28 x  28 x 512
    8 max          2 x 2 / 2    28 x  28 x 512   ->    14 x  14 x 512
    9 conv    256  1 x 1 / 1    14 x  14 x 512   ->    14 x  14 x 256
   10 conv    512  3 x 3 / 1    14 x  14 x 256   ->    14 x  14 x 512
   11 conv    256  1 x 1 / 1    14 x  14 x 512   ->    14 x  14 x 256
   12 conv    512  3 x 3 / 1    14 x  14 x 256   ->    14 x  14 x 512
   13 conv    256  1 x 1 / 1    14 x  14 x 512   ->    14 x  14 x 256
   14 conv    512  3 x 3 / 1    14 x  14 x 256   ->    14 x  14 x 512
   15 conv    256  1 x 1 / 1    14 x  14 x 512   ->    14 x  14 x 256
   16 conv    512  3 x 3 / 1    14 x  14 x 256   ->    14 x  14 x 512
   17 conv    512  1 x 1 / 1    14 x  14 x 512   ->    14 x  14 x 512
   18 conv   1024  3 x 3 / 1    14 x  14 x 512   ->    14 x  14 x1024
   19 max          2 x 2 / 2    14 x  14 x1024   ->     7 x   7 x1024
   20 conv    512  1 x 1 / 1     7 x   7 x1024   ->     7 x   7 x 512
   21 conv   1024  3 x 3 / 1     7 x   7 x 512   ->     7 x   7 x1024
   22 conv    512  1 x 1 / 1     7 x   7 x1024   ->     7 x   7 x 512
   23 conv   1024  3 x 3 / 1     7 x   7 x 512   ->     7 x   7 x1024
   24 conv   1000  1 x 1 / 1     7 x   7 x1024   ->     7 x   7 x1000
   25 avg                        7 x   7 x1000   ->  1000
   26 softmax                                        1000
   27 cost                                           1000
Loading weights from extraction.weights...Done!
data/eagle.jpg: Predicted in 0.018898 seconds.
62.66%: bald eagle
36.00%: kite
 0.46%: vulture
 0.18%: ptarmigan
 0.13%: hen

abbyjako commented 6 years ago

@TheMikeyR Thanks for your reply. I also checked the extraction model and got correct classification results. Only the VGG-16 and AlexNet results are weird. Maybe there is something wrong with the connected layer or the pre-trained models?

RobinHan24 commented 6 years ago

Hello everyone. I wonder whether we can train a classifier starting from the pre-trained models. If so, could you tell me how to do it? Thanks a lot.

WePCf commented 6 years ago

@abbyjako

The wrong predictions from AlexNet and VGG occur because, although @pjreddie updated the cfg files, the trained weight files on pjreddie.com have not been changed. Please use an earlier commit of darknet, for example https://github.com/pjreddie/darknet/commit/8f1b4e0962857d402f9d017fcbf387ef0eceb7c4. In that commit's alexnet.cfg, the activation is "ramp", not the newer "relu".

I tested this and it worked. I am not sure whether @pjreddie has since updated the weights on his website, because I used an old weight file that I had downloaded earlier.
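
To illustrate why the activation named in the cfg matters for already-trained weights, here is a minimal Python sketch comparing the two functions. The `ramp` formula is an assumption recalled from darknet's activations.h (`x*(x>0) + .1*x`) and should be checked against the commit you are actually using; the function names are only for this illustration.

```python
def relu(x):
    """Standard ReLU: negative inputs are clamped to zero."""
    return x if x > 0 else 0.0

def ramp(x):
    """Assumed darknet 'ramp' activation: x*(x>0) + 0.1*x.

    This definition is recalled from activations.h, not taken from the thread.
    """
    return (x if x > 0 else 0.0) + 0.1 * x

# Print both activations side by side for a few sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:5.1f}  relu={relu(x):6.2f}  ramp={ramp(x):6.2f}")
```

Under that assumption the two functions disagree for every non-zero input, so weights trained with one will see different intermediate values when the cfg declares the other.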

tungdanganh commented 6 years ago

Hi all, I am trying to run the AlexNet model in darknet. I still get weird results for all test images (0.1%):

 ./darknet classifier predict cfg/imagenet1k.data cfg/alexnet.cfg alexnet.weights 
layer     filters    size              input                output
    0 conv     96 11 x11 / 4   227 x 227 x   3   ->    55 x  55 x  96
    1 max          3 x 3 / 2    55 x  55 x  96   ->    27 x  27 x  96
    2 conv    256  5 x 5 / 1    27 x  27 x  96   ->    27 x  27 x 256
    3 max          3 x 3 / 2    27 x  27 x 256   ->    13 x  13 x 256
    4 conv    384  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 384
    5 conv    384  3 x 3 / 1    13 x  13 x 384   ->    13 x  13 x 384
    6 conv    256  3 x 3 / 1    13 x  13 x 384   ->    13 x  13 x 256
    7 max          3 x 3 / 2    13 x  13 x 256   ->     6 x   6 x 256
    8 connected                            9216  ->  4096
    9 dropout       p = 0.50               4096  ->  4096
   10 connected                            4096  ->  4096
   11 dropout       p = 0.50               4096  ->  4096
   12 connected                            4096  ->  1000
   13 softmax                                        1000
   14 cost                                           1000
Loading weights from alexnet.weights...Done!
Enter Image Path: data/dog.jpg
data/dog.jpg: Predicted in 0.146725 seconds.
 0.10%: Norwegian elkhound
 0.10%: Great Pyrenees
 0.10%: badger
 0.10%: bobsled
 0.10%: alligator lizard
Enter Image Path: data/eagle.jpg
data/eagle.jpg: Predicted in 0.014585 seconds.
 0.10%: Norwegian elkhound
 0.10%: Great Pyrenees
 0.10%: badger
 0.10%: bobsled
 0.10%: alligator lizard
Enter Image Path: data/person.jpg
data/person.jpg: Predicted in 0.011048 seconds.
 0.10%: Norwegian elkhound
 0.10%: Great Pyrenees
 0.10%: badger
 0.10%: alligator lizard
 0.10%: bobsled
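
A score of 0.10% for every class is what a softmax over 1000 classes returns when all of its inputs are equal, since 1/1000 = 0.10%. The sketch below only illustrates that arithmetic in Python; it is not darknet's code, and the all-zero logits are an assumed stand-in for a final layer that carries no usable signal.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# 1000 identical logits (assumed for illustration): every class ties.
flat = softmax([0.0] * 1000)

# One logit well above the rest, as a working classifier would produce.
peaked = softmax([10.0] + [0.0] * 999)

print(f"flat top score:   {flat[0] * 100:.2f}%")
print(f"peaked top score: {peaked[0] * 100:.2f}%")
```

When every class is tied like this, the top-5 labels are decided by tiny numerical differences rather than by the image, which would be consistent with nearly the same five labels appearing for dog.jpg, eagle.jpg, and person.jpg.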

I tried an older version as suggested by @WePCf, but it seems to produce the same result. Hi @WePCf, could you share the commit link that you tested again?

Thanks,

padagi20 commented 6 years ago

I have a freshly pulled darknet and took alexnet.cfg from commit 8f1b4e. The result is as follows:

Loading weights from alexnet.weights...Done!
data/eagle.jpg: Predicted in 0.280000 seconds.
 6.53%: maze
 4.82%: lionfish
 2.92%: puck
 1.20%: joystick
 1.14%: balance beam

The results are no longer stuck at 0.1%, but they are still strange. I really want to use this model, so please help me too. :)
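
When mixing a cfg from one commit with a binary or weights from another, it can help to confirm which activation each layer of the cfg actually declares. The following is a rough Python sketch, not part of darknet: `list_activations` is a hypothetical helper, and the sample text is a made-up fragment rather than the real alexnet.cfg.

```python
def list_activations(cfg_text):
    """Return (section, activation) pairs from darknet-style cfg text."""
    pairs = []
    section = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif "=" in line and section is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "activation":
                pairs.append((section, value))
    return pairs

# Hypothetical fragment for illustration only.
sample = """
[net]
batch=1

[convolutional]
filters=96
activation=ramp

[connected]
output=4096
activation=relu
"""

print(list_activations(sample))
```

To check a real file, pass its contents instead, for example `list_activations(open("cfg/alexnet.cfg").read())`, and compare the output with the activation the weights are believed to have been trained with.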

saivineethkumar commented 6 years ago

Hi, where can I find pre-trained VGG and ResNet weights trained on the COCO dataset? Currently the website only offers weights trained on ImageNet. Thanks.

arjunmann73 commented 4 years ago

Hi, commenting because the issue is still open. I ran prediction with the AlexNet model and the pre-trained weights from the website, and everything seemed to work fine, with a correct output. I ran:

./darknet classifier predict cfg/imagenet1k.data cfg/alexnet.cfg alexnet.weights data/dog.jpg

Output:

data/dog.jpg: Predicted in 1.046875 seconds.
19.03%: golfcart
18.09%: Siberian husky
 7.00%: malamute
 6.29%: tricycle
 4.17%: Eskimo dog

Ben10962 commented 7 months ago

Hey