liuzhuang13 / DenseNet

Densely Connected Convolutional Networks, In CVPR 2017 (Best Paper Award).
BSD 3-Clause "New" or "Revised" License

About a tensorflow implementation #15

Closed: jh-jeong closed this issue 7 years ago

jh-jeong commented 7 years ago

I followed one of the TensorFlow implementations of DenseNet (https://github.com/ikhlestov/vision_networks) to reproduce DenseNet-BC-100-12. The TensorFlow implementation seems nearly equivalent to the one in this repo, but I couldn't reach ~4.5% error (my best result was about 4.8%). Do you have any idea why? I have already compared the two codebases very carefully but couldn't find a difference.

Tongcheng commented 7 years ago

@jh-jeong In my "Much more efficient caffe implementation", I also reach about 4.8% for DenseNet-BC-100-12. I am curious about the cause, which seems to be common to Caffe and TensorFlow.

jh-jeong commented 7 years ago

@Tongcheng I was finally able to get 4.5% in TensorFlow. I changed the following:

  1. The momentum of every BN layer. In TensorFlow, batch normalization uses 0.999 as the default value, but Torch uses 0.9.
  2. Weight decay is now applied to all trainable variables, as fb.resnet.torch does, including the beta/gamma variables in BN and all biases.
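
For readers who want to see why these two settings matter, here is a minimal framework-free sketch in plain Python. It is not the code used in this thread. The function names, the toy variable names, and the weight-decay value of 1e-4 are all assumptions made for illustration. The first part shows how much more slowly a BN running statistic follows the batch statistic at decay 0.999 than at 0.9 (Torch expresses the same quantity as momentum = 1 - decay). The second part compares an L2 penalty over every trainable variable with one restricted to kernels.

```python
# Illustrative sketch only: hypothetical helpers, not TensorFlow or Torch APIs.

def ema_update(running, batch_stat, decay):
    """TensorFlow-style moving average: a higher decay tracks the batch more slowly."""
    return decay * running + (1.0 - decay) * batch_stat


def steps_to_track(decay, target=1.0, tol=0.05):
    """Count updates until a running stat starting at 0 is within tol of a constant target."""
    running, steps = 0.0, 0
    while abs(target - running) > tol:
        running = ema_update(running, target, decay)
        steps += 1
    return steps


def l2_penalty(variables, weight_decay, include=lambda name: True):
    """Return weight_decay * sum(0.5 * ||v||^2) over the variables selected by include.

    variables maps a variable name to a flat list of floats.
    """
    total = 0.0
    for name, values in variables.items():
        if include(name):
            total += 0.5 * sum(v * v for v in values)
    return weight_decay * total


# Toy parameters (names are made up for this example).
toy_vars = {
    "conv/kernel": [1.0, 2.0],
    "bn/gamma": [1.0],
    "bn/beta": [0.0],
    "fc/bias": [2.0],
}

# Assumed weight-decay coefficient; the thread does not state the value used.
WEIGHT_DECAY = 1e-4

# Penalty over every trainable variable (change 2 above).
penalty_all = l2_penalty(toy_vars, WEIGHT_DECAY)

# Penalty over kernels only, which skips BN beta/gamma and biases.
penalty_kernels = l2_penalty(
    toy_vars, WEIGHT_DECAY, include=lambda name: name.endswith("kernel")
)

# Number of updates needed for the running statistic to catch up.
steps_decay_09 = steps_to_track(0.9)
steps_decay_0999 = steps_to_track(0.999)
```

With decay 0.999 the running statistics need roughly a hundred times as many updates to converge as with 0.9, so the statistics used at test time can lag behind the current weights. The kernel-only penalty leaves BN scales, BN offsets and biases unregularized.
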

anthony123 commented 6 years ago

@jh-jeong Can you share your TensorFlow version of the code?