princeton-vl / DecorrelatedBN

Code for Decorrelated Batch Normalization
BSD 2-Clause "Simplified" License

ImageNet performance #1

Open ppwwyyxx opened 6 years ago

ppwwyyxx commented 6 years ago

This repo uses fb.resnet.torch for ImageNet experiments. However, the ResNet-50 top-1 error in fb.resnet.torch is 24.01 (https://github.com/facebook/fb.resnet.torch), while the ResNet-50 baseline reported in the DBN paper is 24.87. Why is that?

huangleiBuaa commented 6 years ago

@ppwwyyxx, I guess the difference probably comes from different cuDNN versions. I ran the Res-18/Res-34 experiments on a machine with cudnn-5.0, and the Res-50/Res-101 experiments on another machine (I don't remember its cuDNN version, and I currently have no access to that machine). Res-18 and Res-34 seem to give results similar to those reported at https://github.com/facebook/fb.resnet.torch.

ppwwyyxx commented 6 years ago

Perhaps.

Another question: in this repo there are resnet_BN vs resnet_DBN_scale_L1 and preresnet_BN vs preresnet_DBN_scale_L1. Which pair was used for the experiments in the paper? I didn't see the paper mention preresnet, but the README here mentions it.

huangleiBuaa commented 6 years ago

@ppwwyyxx It's resnet_BN vs resnet_DBN_scale_L1, as described in the paper. Thanks for pointing that out; I will revise the README.

Actually, we also ran preresnet-18 and preresnet-34 (the same configuration as res-18 and res-34 described in this repo). We also trained for an extra 10 epochs with the learning rate divided by 10 (100 epochs in total), which is the common setup in recent papers. The results (error, %):

| Model | 90 epochs | 100 epochs |
| --- | --- | --- |
| preresnet-18 | 30.44 | 29.79 |
| preresnet-DBN-scale-L1-18 | 29.97 | 29.31 |
| preresnet-34 | 26.76 | 26.01 |
| preresnet-DBN-scale-L1-34 | 26.44 | 25.76 |

ppwwyyxx commented 6 years ago

Thanks. The diff shows that resnet_DBN_scale_L1 has one more convolution layer, which means the comparison between it and resnet_BN is not fair:

```
$ diff resnet_BN.lua resnet_DBN_scale_L1.lua
13a14
> require 'cudbn'
121a123,126
>       model:add(Convolution(64,64,3,3,1,1,1,1))
>       model:add(nn.Spatial_DBN_opt(64,opt.m_perGroup, opt.eps,_,true))
>       model:add(ReLU(true))
>
```
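
For reference, here is a minimal sketch (illustrative, not the repo's exact code) of where those added lines land, using the usual fb.resnet.torch layer aliases; the opt values here are assumed:

```lua
require 'nn'
require 'cudnn'
require 'cudbn'  -- provides nn.Spatial_DBN_opt (per the diff above)

-- The usual fb.resnet.torch aliases:
local Convolution = cudnn.SpatialConvolution
local SBatchNorm  = nn.SpatialBatchNormalization
local ReLU        = cudnn.ReLU
local Max         = nn.SpatialMaxPooling

-- Example option values; the repo's actual defaults may differ.
local opt = { m_perGroup = 16, eps = 1e-5 }

local model = nn.Sequential()
-- Stem shared by resnet_BN.lua and resnet_DBN_scale_L1.lua:
model:add(Convolution(3, 64, 7, 7, 2, 2, 3, 3))  -- 7x7 conv, stride 2
model:add(SBatchNorm(64))
model:add(ReLU(true))
model:add(Max(3, 3, 2, 2, 1, 1))

-- The block added only in resnet_DBN_scale_L1.lua (see the diff):
model:add(Convolution(64, 64, 3, 3, 1, 1, 1, 1))  -- the extra 3x3 conv
model:add(nn.Spatial_DBN_opt(64, opt.m_perGroup, opt.eps, nil, true))
model:add(ReLU(true))

-- ...the residual stages follow as in fb.resnet.torch.
```

Apart from the require line, this extra Conv + DBN + ReLU block is the entire diff, so the two networks are otherwise identical.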
huangleiBuaa commented 6 years ago

There is no big difference, I guess. You can check preresnet.lua against preresnet-DBN-scale.lua; they have the same number of convolutions. I also ran experiments with the original preresnet (without this extra conv, on 18 and 34 layers), and the original preresnet performs slightly better (30.38 vs 30.44 for 18 layers, 26.66 vs 26.76 for 34 layers). So I guess the extra convolution is not a big deal for the original residual network. If you are interested, you can validate it. I could also run the experiments, but I only have one machine with 8 GPUs available (shared with other lab members), so it may take a long time to get results.
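
One way to validate it would be a matched baseline that gives resnet_BN the same extra 3x3 convolution but keeps plain BN, so that BN vs. DBN is the only remaining difference. A hypothetical sketch (no such file exists in this repo), reusing the aliases from the sketch above:

```lua
-- Hypothetical matched baseline (not in the repo): the same extra
-- 3x3 conv as in resnet_DBN_scale_L1.lua, normalized with plain BN.
model:add(Convolution(64, 64, 3, 3, 1, 1, 1, 1))
model:add(SBatchNorm(64))  -- plain BN in place of nn.Spatial_DBN_opt
model:add(ReLU(true))
```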

JaeDukSeo commented 5 years ago

Thanks for this.