Closed dccho closed 8 years ago
We are experimenting with ImageNet. So far we have successfully trained a model with only 10M parameters; its top-1 error is 28.7%, which is better than ResNet-18 (30.4%), which has 11M parameters. If you want the model, I can share it with you later.
thanks
@liuzhuang13 Thanks! Hope to see your great densenet model soon.
Thanks, but the ImageNet results will probably take a while. Do you want the model definition file for ImageNet, or an actual pretrained model? I can share it with you through email. Leave your email!
Thanks~! My email is dccho.cvpr.phd@gmail.com. If the pretrained model is too big, you can send me the definition only; I'll train from scratch.
@liuzhuang13 I would appreciate it if you could send me the model definition file for the ImageNet dataset. My email is dzy_wlw@163.com. Thanks!
@wlw208dzy I'll share the links with you here.
densenet (10M parameters, 28.7% val error)
definition: https://1drv.ms/u/s!AjwB4qLCejx-be9Qh7ZT-RtvV38
pretrained model: https://1drv.ms/u/s!AjwB4qLCejx-a17znBzqnquzaJY

densenet (40M parameters, 24.0% val error)
definition: https://1drv.ms/u/s!AjwB4qLCejx-bJQcJQi9ptGgbT0
pretrained model: https://1drv.ms/u/s!AjwB4qLCejx-bp0a4WlshgcWrNs
Due to limited resources, these are only preliminary models, we're still investigating different architecture design (e.g., bottleneck structures) for DenseNets.
@liuzhuang13, how does densenet (40M parameters) compare to resnet-152? From slim, the val error of resnet-152 is about 24.0%. And how long does it take to train on ImageNet? Why did you choose Nesterov as the optimizer? Thanks!
@argman
> does densenet (40M parameters) compare to resnet-152?
From this page https://github.com/facebook/fb.resnet.torch/tree/master/pretrained (Facebook's original implementation), resnet-152 has a val error of 22.16%, which is better than the DenseNet with 40M parameters; it has 60M parameters, though. Note that data augmentation, optimization, etc. are kept the same. The TensorFlow implementation may have some differences.
> And how long does it take to train on ImageNet?
It took us 10 days to train the 40M DenseNet for 120 epochs on 4 TITAN X GPUs, with batch size 128.
> Why do you choose Nesterov as optimizer?
We followed fb.resnet.torch's implementation for every setting and hyperparameter, except a smaller batch size (due to memory constraints) and slightly more training epochs.
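For reference, SGD with Nesterov momentum (the fb.resnet.torch defaults are lr 0.1 and momentum 0.9) can be sketched in a few lines of pure Python. The quadratic objective below is only an illustration of the update rule, not the actual training setup:

```python
# Minimal sketch of SGD with Nesterov momentum, using the
# fb.resnet.torch default hyperparameters (lr=0.1, momentum=0.9).
# The toy objective f(w) = (w - 3)^2 is an assumption for illustration.

def grad(w):
    # gradient of f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

lr, mu = 0.1, 0.9
w, v = 0.0, 0.0
for _ in range(200):
    # Nesterov: evaluate the gradient at the "looked-ahead" point w + mu*v
    g = grad(w + mu * v)
    v = mu * v - lr * g
    w = w + v

print(round(w, 4))  # converges to the minimum at w = 3
```

The look-ahead gradient evaluation is what distinguishes Nesterov momentum from classical (heavy-ball) momentum, and it typically gives slightly faster, more stable convergence at these learning rates.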
Hi @liuzhuang13, I followed your paper's configuration (https://arxiv.org/abs/1608.06993) and trained DenseNet-BC-121 (theta=0.5) on ImageNet without data augmentation or dropout. I can only achieve a val error of 28.15% after 62 epochs (namely, 2 epochs after the second lr decrease), and since then the val error has been slightly increasing every epoch. I can't replicate the top-1 val error of 25.02% in your paper. Could you please give me any suggestions?
Hi @yefanhust We trained DenseNet with data augmentation implemented by the fb.resnet.torch repo here https://github.com/facebook/fb.resnet.torch#notes
If you don't use data augmentation, it's unlikely that you will get the same performance.
Thanks much for your prompt reply @liuzhuang13! I'll try the data augmentation then.
Hi @liuzhuang13, I've turned on data augmentation for training densenet121. I used scale and aspect ratio augmentation (Inception-style scale jittering), color jittering (brightness 0.4, contrast 0.4, saturation 0.4), AlexNet-style color lighting (std=0.1, with PCA eigval and eigvec), color normalization (means [123.675, 116.28, 103.53], stds [58.395, 57.12, 57.375]), and random mirroring. However, the best top-1 error I achieved was 27.59% after 82 epochs, only 0.56% better than without data augmentation, and still far from your paper's 25.02%. Am I missing anything here? Or should I train the network longer, say 120 epochs?
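For anyone reproducing this, here is a small NumPy sketch of the AlexNet-style PCA color lighting mentioned above. The eigenvalue/eigenvector constants are the ones commonly used for ImageNet RGB (e.g., in fb.resnet.torch); treat them as assumptions rather than values confirmed in this thread:

```python
import numpy as np

# AlexNet-style PCA color lighting ("fancy PCA") sketch.
# eigval/eigvec are the widely used ImageNet RGB statistics
# (assumed here, as in fb.resnet.torch's transforms).
eigval = np.array([0.2175, 0.0188, 0.0045])
eigvec = np.array([
    [-0.5675,  0.7192,  0.4009],
    [-0.5808, -0.0045, -0.8140],
    [-0.5836, -0.6948,  0.4203],
])

def pca_lighting(img, std=0.1, rng=np.random):
    """img: float array of shape (H, W, 3) with values in [0, 1]."""
    alpha = rng.normal(0.0, std, size=3)   # per-component jitter strength
    rgb_shift = eigvec @ (alpha * eigval)  # shift along color principal axes
    return np.clip(img + rgb_shift, 0.0, 1.0)

img = np.full((4, 4, 3), 0.5)  # dummy gray image for illustration
out = pca_lighting(img)
```

The shift is applied uniformly over all pixels, so it perturbs the global color balance of the image rather than adding per-pixel noise.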
@yefanhust What library are you using?
@liuzhuang13 I'm using caffe2.
Maybe the implementation details are different, e.g., batch normalization. BTW, what's your purpose for training it on caffe2?
@liuzhuang13 Are you training from the raw ImageNet-12 data or another resized version? To answer your question: I work for NVIDIA, and this work is part of our NGC product, which aims to provide a base of trained models.
Our setting follows https://github.com/facebook/fb.resnet.torch exactly; I think the image is first resized and then cropped to 224x224.
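A minimal sketch of that resize-then-crop step (the center-crop variant used at evaluation time; the resize to shorter side 256 is stubbed out, and the 256/224 sizes are the standard fb.resnet.torch choices, assumed here):

```python
import numpy as np

# Center-crop sketch: in the standard pipeline the image's shorter side is
# first resized to 256 (using an image library, omitted here), then a
# 224x224 center crop is taken.

def center_crop(img, size=224):
    """img: array of shape (H, W, C) with H, W >= size."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

img = np.zeros((256, 341, 3))  # e.g. shorter side already resized to 256
crop = center_crop(img)        # shape (224, 224, 3)
```

At training time, fb.resnet.torch instead uses random scale/aspect-ratio crops (Inception-style jittering), which is what the augmentation discussion above refers to.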
I'm trying to train DenseNet on the ImageNet dataset, but it doesn't converge well. Have you tried DenseNet on ImageNet? Please share any successful DenseNet network for ImageNet if you have one.