ppwwyyxx opened 6 years ago

This repo uses fb.resnet.torch for ImageNet experiments. However, the performance of ResNet-50 in fb.resnet.torch is 24.01: https://github.com/facebook/fb.resnet.torch. The performance of the ResNet-50 baseline reported in the DBN paper is 24.87. Why is that?
@ppwwyyxx, I guess the difference probably comes from different versions of cuDNN. I ran the Res-18/Res-34 experiments on a machine with cuDNN 5.0, while I ran the Res-50/Res-101 experiments on another machine (I don't remember its cuDNN version, and I currently have no access to that machine). It seems that Res-18 and Res-34 give results similar to the experiments in https://github.com/facebook/fb.resnet.torch.
Perhaps.
Another question: in this repo there are resnet_BN vs resnet_DBN_scale_L1, and preresnet_BN vs preresnet_DBN_scale_L1. Which pair was used for the experiments in the paper? I didn't see the paper mention the use of preresnet, but the README here mentions it.
@ppwwyyxx It's resnet_BN vs resnet_DBN_scale_L1, as described in the paper. Thanks for pointing this out; I will revise the README.
Actually, we also ran preresnet-18 and preresnet-34 (with the same configuration as res-18 and res-34 described in this repo). The respective results are: 30.44 vs 29.97 (preresnet-18 vs preresnet-DBN-scale-L1-18) and 26.76 vs 26.44 (preresnet-34 vs preresnet-DBN-scale-L1-34). We also ran an extra 10 epochs with the learning rate divided by 10 (100 epochs in total, which is the common setup in recent papers), and then we get: 29.79 vs 29.31 (preresnet-18 vs preresnet-DBN-scale-L1-18) and 26.01 vs 25.76 (preresnet-34 vs preresnet-DBN-scale-L1-34).
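For clarity, the 100-epoch setup above is just the usual fb.resnet.torch step schedule (base LR 0.1, divided by 10 every 30 epochs) carried one step further. A rough sketch of that schedule, assuming the standard learning-rate hook (this is not the repo's actual training code):

local function learningRate(baseLR, epoch)
   -- standard ImageNet recipe: divide the rate by 10 every 30 epochs
   local decay = math.floor((epoch - 1) / 30)
   return baseLR * math.pow(0.1, decay)
end

-- with baseLR = 0.1:
--   epochs  1-30  -> 0.1
--   epochs 31-60  -> 0.01
--   epochs 61-90  -> 0.001   (the original 90-epoch runs end here)
--   epochs 91-100 -> 0.0001  (the extra 10 epochs mentioned above)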
Thanks. The diff shows that resnet_DBN_scale_L1 has one more convolution layer, which means the comparison between it and resnet_BN is not fair:
$ diff resnet_BN.lua resnet_DBN_scale_L1.lua
13a14
> require 'cudbn'
121a123,126
> model:add(Convolution(64,64,3,3,1,1,1,1))
> model:add(nn.Spatial_DBN_opt(64,opt.m_perGroup, opt.eps,_,true))
> model:add(ReLU(true))
>
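In model terms, the added lines presumably give resnet_DBN_scale_L1 an extra 3x3 convolution + DBN + ReLU block in the stem (the 64 -> 64 channel counts suggest it sits right after the initial 7x7 convolution). A rough sketch of that part of the network, assuming the stem otherwise follows fb.resnet.torch's ImageNet definition; the m_perGroup/eps values and the exact position relative to the max pooling are my assumptions, not copied from the repo:

require 'nn'
require 'cudnn'
require 'cudbn'                               -- provides nn.Spatial_DBN_opt

local Convolution = cudnn.SpatialConvolution
local ReLU        = cudnn.ReLU
local SBatchNorm  = nn.SpatialBatchNormalization
local Max         = nn.SpatialMaxPooling

local opt = { m_perGroup = 16, eps = 1e-5 }   -- illustrative values only

local model = nn.Sequential()
-- stem shared by resnet_BN and resnet_DBN_scale_L1: 7x7/2 conv -> norm -> ReLU
model:add(Convolution(3, 64, 7, 7, 2, 2, 3, 3))
model:add(SBatchNorm(64))
model:add(ReLU(true))
-- extra block present only in resnet_DBN_scale_L1 (the lines added in the diff)
model:add(Convolution(64, 64, 3, 3, 1, 1, 1, 1))
model:add(nn.Spatial_DBN_opt(64, opt.m_perGroup, opt.eps, _, true))
model:add(ReLU(true))
model:add(Max(3, 3, 2, 2, 1, 1))
-- ... residual stages follow as usual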
There is no big difference, I guess. You can check the models in preresnet.lua and preresnet-DBN-scale.lua (they have the same number of convolutions). I also did experiments with the original preresnet (without this extra conv, on 18 and 34 layers); the original preresnet has slightly better performance (30.38 vs 30.44 for 18 layers, 26.66 vs 26.76 for 34 layers). So I guess this extra convolution does not matter much for the original residual network. If you are interested, you can validate it. I can also run the experiments; however, I only have one machine with 8 GPUs available (shared with other lab members), so it may take a long time to get the results.
thanks for this