zhufengx / SRN_multilabel

baseline of Resnet101 #14

Open LifeBeyondExpectations opened 6 years ago

LifeBeyondExpectations commented 6 years ago

I am trying to re-implement the ResNet-101 baseline you mentioned, in TensorFlow. However, the performance does not reach the numbers you reported, so I would like to ask you some questions.

First, you set the initial learning rate to 1e-3, whereas I set it to 1e-1 as in the original paper (K. He). Does the initial learning rate affect the final performance? Second, was the ResNet-101 that you report as reaching 74.4 F1-O modified in some layers?

My results currently do not exceed 70.0 (F1-O). Is there anything you would recommend modifying?
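(For reference, F1-O here is the overall, i.e. micro-averaged, F1 over all image-label pairs, as commonly defined in the multi-label literature; a minimal NumPy sketch, assuming 0/1 prediction and ground-truth matrices:)

```python
import numpy as np

def overall_f1(pred, gt):
    """Overall (micro-averaged) F1, i.e. F1-O.

    pred, gt: 0/1 arrays of shape (num_images, num_labels);
    pred is e.g. (scores > threshold) or a top-k mask, cast to int.
    """
    tp = np.sum(pred * gt)                    # correctly predicted positive labels
    op = tp / max(np.sum(pred), 1)            # overall precision (O-P)
    oc = tp / max(np.sum(gt), 1)              # overall recall (O-R)
    return 2 * op * oc / max(op + oc, 1e-12)  # harmonic mean of O-P and O-R
```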

Thx

zhufengx commented 6 years ago

Hi, thank you for the questions.

(1) During our training, lr=1e-1 and lr=1e-2 caused gradient explosions, so lr=1e-3 is used in our experiments. The learning rate should affect performance, but perhaps not by that much. (2) The model structure used in our work is the "pre-activation" version of ResNet-101. You can visualize our model structure with the Netscope web tool and check the differences between our implementation and yours.
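For anyone checking the structural difference: in the pre-activation variant (He et al., "Identity Mappings in Deep Residual Networks"), BN and ReLU come before each convolution instead of after. A minimal TensorFlow/Keras sketch of one pre-activation bottleneck block, as an illustration of the ordering rather than the exact Caffe definition from this repo:

```python
import tensorflow as tf
from tensorflow.keras import layers

def preact_bottleneck(x, filters, stride=1):
    """One pre-activation bottleneck block: BN -> ReLU -> Conv (x3),
    plus an identity (or projection) shortcut."""
    shortcut = x
    # Pre-activation: normalization and nonlinearity come first.
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    if stride != 1 or x.shape[-1] != 4 * filters:
        # Projection shortcut, taken from the pre-activated input.
        shortcut = layers.Conv2D(4 * filters, 1, strides=stride)(y)
    y = layers.Conv2D(filters, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * filters, 1)(y)
    return layers.Add()([y, shortcut])
```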

I have some other advice that may help close the performance gap: (1) The pretrained model is very important for the final performance. Our model is pretrained on ImageNet with stochastic depth. Reference pretraining code can be found here. If you would rather not pretrain a model yourself, you can also convert our released pretrained model to TF format. (2) Data augmentation is also important. We used random position/aspect-ratio crop strategies; see the sketch below for the general idea. Please refer to our *.prototxt files for the data layer configurations and to the source file for the data layer implementation.
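As a rough TensorFlow illustration of such a random position/aspect-ratio crop (the crop ranges below are my assumptions, not the values from the released prototxt):

```python
import tensorflow as tf

def random_crop_resize(image, out_size=224):
    """Sample a crop with random area, aspect ratio and position, then resize."""
    begin, size, _ = tf.image.sample_distorted_bounding_box(
        tf.shape(image),
        # A single box covering the whole image: crops may land anywhere.
        bounding_boxes=tf.constant([[[0.0, 0.0, 1.0, 1.0]]]),
        min_object_covered=0.1,
        aspect_ratio_range=(0.75, 1.33),  # assumed range
        area_range=(0.3, 1.0))            # assumed range
    crop = tf.slice(image, begin, size)
    return tf.image.resize(crop, [out_size, out_size])
```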

Hope all above may be helpful for you. Thanks.

chaoyan1037 commented 6 years ago

Hi Feng, thanks for providing your code. I tried to use Netscope to visualize your model, but I do not know the Gist ID of your model. Could you please tell me what it is? Or could you share your prototxt file with me? Thanks a lot.

zhufengx commented 6 years ago

Hi, @chaoyan1037, you can find download links to the prototxt files in "README.md".

chaoyan1037 commented 6 years ago

@zhufengx Thanks. I found it.

cyilu commented 4 years ago

Hi, I wonder how many epochs you used for training? Any help would be appreciated. Thank you.