liuzhuang13 / DenseNet

Densely Connected Convolutional Networks, In CVPR 2017 (Best Paper Award).
BSD 3-Clause "New" or "Revised" License

ImageNet test #5

Closed dccho closed 8 years ago

dccho commented 8 years ago

I'm trying to build a DenseNet for the ImageNet dataset, but it doesn't converge well. Have you ever tried DenseNet on ImageNet? Please share it if you have a DenseNet that trains successfully on ImageNet.

liuzhuang13 commented 8 years ago

We are experimenting with ImageNet. So far we have successfully trained a model with only 10M params; its top-1 error is 28.7%, which is better than ResNet-18 (30.4%), which has 11M params. If you want the model, I can share it with you later.


dccho commented 8 years ago

@liuzhuang13 Thanks! Hope to see your great densenet model soon.

liuzhuang13 commented 8 years ago

Thanks. The ImageNet results will probably take a while. Do you want the model definition file for ImageNet, or an actual pretrained model? I can share it with you through email. Leave your email!

dccho commented 8 years ago

Thanks~! My email is [email omitted]. If the pretrained model is too big, you can send me the definition only. I'll train from scratch.

wlw208dzy commented 7 years ago

@liuzhuang13 I would appreciate it if you could send me the model definition file for the ImageNet dataset. My email is [email omitted]. Thanks!

liuzhuang13 commented 7 years ago

@wlw208dzy I'll share the links with you here.

densenet (10M parameters, 28.7% val error) definition:!AjwB4qLCejx-be9Qh7ZT-RtvV38 pretrained model:!AjwB4qLCejx-a17znBzqnquzaJY

densenet (40M parameters, 24.0% val error) definition:!AjwB4qLCejx-bJQcJQi9ptGgbT0 pretrained model:!AjwB4qLCejx-bp0a4WlshgcWrNs

Due to limited resources, these are only preliminary models. We're still investigating different architecture designs (e.g., bottleneck structures) for DenseNets.

argman commented 7 years ago

@liuzhuang13, how does DenseNet (40M parameters) compare to ResNet-152? According to slim, the val error of ResNet-152 is about 24.0%. And how long does it take to train on ImageNet? Why did you choose Nesterov as the optimizer? Thanks!

liuzhuang13 commented 7 years ago


@liuzhuang13, does DenseNet (40M parameters) compare to ResNet-152?

From this page (Facebook's original implementation), ResNet-152 has a val error of 22.16%, which is better than DenseNet with 40M parameters. ResNet-152 has 60M parameters, though. Note that data augmentation, optimization, etc. are kept the same. The TensorFlow implementation may have some differences.

And how long does it take to train on ImageNet?

It took us 10 days to train the 40M DenseNet for 120 epochs on 4 TITAN X GPUs, with batch size 128.
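A rough back-of-the-envelope check of those numbers, as a minimal sketch. It assumes the ILSVRC-2012 training split (1,281,167 images) and that 128 is the total batch size across all GPUs; neither is stated in the reply, and the function name is hypothetical.

```python
import math

# Assumption: ILSVRC-2012 training split size (not stated in the thread).
TRAIN_IMAGES = 1_281_167

# Figures quoted in the reply above.
DAYS = 10
EPOCHS = 120
BATCH_SIZE = 128   # assumed to be the total batch size across all GPUs
NUM_GPUS = 4


def training_estimates(days, epochs, batch_size, num_gpus, train_images=TRAIN_IMAGES):
    """Return rough per-epoch time and throughput figures (hypothetical helper)."""
    total_seconds = days * 24 * 3600
    images_per_second = train_images * epochs / total_seconds
    return {
        "hours_per_epoch": days * 24 / epochs,
        "iters_per_epoch": math.ceil(train_images / batch_size),
        "images_per_second": images_per_second,
        "images_per_second_per_gpu": images_per_second / num_gpus,
    }


est = training_estimates(DAYS, EPOCHS, BATCH_SIZE, NUM_GPUS)
for name, value in est.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, one epoch takes about 2 hours and covers roughly ten thousand iterations, which can serve as a sanity check when comparing against another setup.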

Why do you choose Nesterov as optimizer?

We followed fb.resnet.torch's implementation for every setting and hyperparameter, except for a smaller batch size (due to memory constraints) and slightly more training epochs.
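For readers unfamiliar with the optimizer being discussed, here is a minimal sketch of one SGD step with Nesterov momentum on a scalar parameter, in the formulation commonly used by Torch-style optimizers. The function name is hypothetical, and the default values (learning rate 0.1, momentum 0.9, weight decay 1e-4) are assumptions for illustration, not settings confirmed in this thread.

```python
def nesterov_sgd_step(param, grad, velocity, lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One SGD step with Nesterov momentum on a scalar (hypothetical helper).

    Returns the updated (param, velocity) pair.
    """
    grad = grad + weight_decay * param      # fold L2 weight decay into the gradient
    velocity = momentum * velocity + grad   # update the momentum buffer
    step = grad + momentum * velocity       # Nesterov look-ahead correction
    return param - lr * step, velocity


# Toy usage: minimise f(w) = 0.5 * w**2, whose gradient is simply w.
w, v = 1.0, 0.0
for _ in range(200):
    w, v = nesterov_sgd_step(w, grad=w, velocity=v)
print(w)  # very close to 0 after 200 steps
```

The only difference from plain momentum is the `step` line: the update uses the gradient plus the freshly updated buffer, rather than the buffer alone.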

yefanhust commented 6 years ago

Hi @liuzhuang13 I followed your paper's configurations and trained DenseNet-BC-121 (theta=0.5) on ImageNet without data augmentation or dropout. I could only reach a val error of 28.15% after 62 epochs (that is, 2 epochs after the second lr decrease), and since then the val error has been increasing slightly every epoch. I can't replicate the top-1 val error of 25.02% in your paper. Could you please give me any suggestions?
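The step schedule implied by this comment (second decrease at epoch 60) could be sketched as follows. The base learning rate of 0.1, the first decrease at epoch 30, and the 10x decay factor are assumptions here, and `step_lr` is a hypothetical helper.

```python
def step_lr(epoch, base_lr=0.1, decay_epochs=(30, 60), factor=0.1):
    """Learning rate for a 0-indexed epoch under a step schedule (hypothetical helper).

    Assumed values: base_lr, the first decay epoch, and the decay factor.
    The second decay at epoch 60 matches the comment above.
    """
    num_decays = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * factor ** num_decays


for epoch in (0, 29, 30, 59, 60, 62):
    print(epoch, step_lr(epoch))
```

With this schedule, epoch 62 already runs at the smallest learning rate, so a plateau shortly after epoch 60 is what one would expect from the schedule alone.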

liuzhuang13 commented 6 years ago

Hi @yefanhust, we trained DenseNet with the data augmentation implemented in the fb.resnet.torch repo.

If you don't use data augmentation, it's unlikely that you will get the same performance.

yefanhust commented 6 years ago

Thanks very much for your prompt reply @liuzhuang13! I'll try the data augmentation then.

yefanhust commented 6 years ago

Hi @liuzhuang13 I've turned on data augmentation for training densenet121. I used scale and aspect ratio augmentation (inception-style scale jittering), color jittering (image brightness 0.4, image contrast 0.4, image saturation 0.4), AlexNet-style color lighting (std=0.1, with PCA eigval and eigvec), color normalization (means [123.675, 116.28, 103.53], stds [58.395, 57.12, 57.375]) and random mirroring. However, the best top-1 error I achieved was 27.59% after 82 epochs, only 0.56% better than without data augmentation and still far from your paper's 25.02%. Am I missing anything here? Or should I train the network longer, say 120 epochs?
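One thing worth checking in a setup like this is that the normalization constants match the pixel scale of the input. The means and stds quoted above are on a 0-255 scale; divided by 255 they correspond to the widely used ImageNet statistics for inputs in [0, 1]. A minimal sketch, with hypothetical helper names:

```python
# Per-channel (R, G, B) statistics quoted in the comment above, on a 0-255 scale.
MEANS = (123.675, 116.28, 103.53)
STDS = (58.395, 57.12, 57.375)


def normalize_pixel(rgb, means=MEANS, stds=STDS):
    """Normalize one (R, G, B) pixel given on a 0-255 scale (hypothetical helper)."""
    return tuple((c - m) / s for c, m, s in zip(rgb, means, stds))


def to_unit_scale(values):
    """Convert 0-255 statistics to the equivalent values for inputs in [0, 1]."""
    return tuple(round(v / 255.0, 3) for v in values)


print(to_unit_scale(MEANS))       # statistics for inputs scaled to [0, 1]
print(to_unit_scale(STDS))
print(normalize_pixel((255, 255, 255)))
```

If the pipeline feeds pixels in [0, 1] but subtracts 0-255 means (or the other way round), or if the channel order is BGR rather than RGB, the network sees badly shifted inputs, so this is a cheap thing to rule out.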

liuzhuang13 commented 6 years ago

@yefanhust What library are you using?

yefanhust commented 6 years ago

@liuzhuang13 I'm using caffe2.

liuzhuang13 commented 6 years ago

Maybe the implementation details are different, e.g., batch normalization. BTW, what's your purpose for training it on caffe2?

yefanhust commented 6 years ago

@liuzhuang13 Are you training from the raw imagenet12 data or a resized version? To answer your question, I work for NVIDIA, and this work is part of our NGC product, to build a base of trained models.

liuzhuang13 commented 6 years ago

Our setting exactly follows the fb.resnet.torch implementation mentioned above. I think the image is first resized and then cropped to 224x224.
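A minimal sketch of the "resize, then crop to 224x224" geometry described here, in its deterministic center-crop form. The shorter-side target of 256 is an assumption (a common convention, not confirmed in this thread), and both helper names are hypothetical; training pipelines often use random crops instead.

```python
def resize_shorter_side(width, height, target=256):
    """Scale (width, height) so the shorter side equals `target`, keeping aspect ratio.

    The default target of 256 is an assumption for illustration.
    """
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centred crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


# Example: a 500x375 image is resized to 341x256, then cropped around the centre.
new_w, new_h = resize_shorter_side(500, 375)
box = center_crop_box(new_w, new_h)
print((new_w, new_h), box)
```

Training from pre-resized images versus the raw files changes what the crop sees, so it is worth confirming that both pipelines apply the resize step at the same point.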