Also, I only got 86.4 using res50 + cgnl, even after 4 training runs. How can I achieve 87 on the val dataset?
Hi Doctor, thanks for your interest.
Here is the story behind what you're asking about. After we got the final decision from the reviewers and AC, we decided to release the code. But at that time the code was dirty, so I re-implemented the method again. There were two important changes. First, in my new implementation, I changed the scale factor from [0.08, 1.0] to [0.08, 1.25] in this line. Second, I replaced the FC layer for the new task. Before that, I didn't replace the FC layer on the CUB dataset, because the number of classes is 200, which is smaller than ImageNet's 1000, so the ImageNet-pretrained 1000-dim FC layer was usable for CUB as-is. But I found that if I replaced the FC layer and re-initialized it with the MSRA init method in this line, it brought better performance. In the end, the reason we didn't update the results in the paper to match the ones on GitHub was that I had no time to redo all the experiments with the new settings; by then I was already working on another project at the company.
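For reference, here is a minimal PyTorch sketch of those two changes; the exact lines in the repo may differ, and the 200-way head is for CUB:

```python
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms

# 1) Enlarge the upper bound of the random-crop scale from 1.0 to 1.25.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.25)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# 2) Replace the ImageNet FC head with a fresh 200-way layer for CUB,
#    re-initialized with the MSRA (Kaiming) method.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 200)
nn.init.kaiming_normal_(model.fc.weight, mode='fan_out', nonlinearity='relu')
nn.init.zeros_(model.fc.bias)
```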
The positions for inserting 5 CGNL blocks into ResNet are the same as for 5 NL blocks, as described in X. Wang et al., Non-local Neural Networks, CVPR 2018.
We add 1 block (to res4), 5 blocks (3 to res4 and 2 to res3, to every other residual block), and 10 blocks (to every residual block in res3 and res4) in ResNet-50; in ResNet-101 we add them to the corresponding residual blocks.
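As a rough illustration only (not the repo's actual code: `insert_blocks` and `make_cgnl_block` are stand-ins, and torchvision's `layer2`/`layer3` correspond to res3/res4), the 5-block placement for ResNet-50 could be wired up like this:

```python
import torch.nn as nn
import torchvision.models as models

def make_cgnl_block(channels):
    # Placeholder: in the real repo this would build an NL/CGNL module.
    return nn.Identity()

def insert_blocks(stage, make_block, every_other=True):
    """Interleave attention blocks into a ResNet stage (an nn.Sequential).

    With every_other=True, a block is inserted after residual blocks
    0, 2, 4, ...; with every_other=False, after every residual block.
    """
    layers = []
    for i, residual_block in enumerate(stage):
        layers.append(residual_block)
        if not every_other or i % 2 == 0:
            # Bottleneck blocks expose their output width via conv3.
            layers.append(make_block(residual_block.conv3.out_channels))
    return nn.Sequential(*layers)

model = models.resnet50(pretrained=True)
# res3 = layer2 (4 residual blocks) -> 2 inserted blocks;
# res4 = layer3 (6 residual blocks) -> 3 inserted blocks.
model.layer2 = insert_blocks(model.layer2, make_cgnl_block)
model.layer3 = insert_blocks(model.layer3, make_cgnl_block)
```

Setting `every_other=False` on both stages gives the 10-block configuration (4 + 6 blocks).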
No. I have no results on ImageNet using 5 CGNL blocks.
The results on CUB are a little tricky to achieve. Some researchers have asked me about this before; the related issue is here. The other experiments' results in this repo were all achieved with a single training run each.
Hi! Many thanks. I also have a question about training on ImageNet: did you use pretrained weights on ImageNet as well, like when training on CUB?
Yes, assuming I understand correctly what you mean: I use the ImageNet-pretrained weights to train the model with 1 CGNL block on ImageNet. Moreover, I also use a warmup strategy.
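By warmup I mean the usual linear learning-rate ramp over the first iterations; a minimal sketch, with illustrative values for `warmup_iters` and `base_lr` (the repo's actual schedule may differ):

```python
def warmup_lr(optimizer, cur_iter, warmup_iters=500, base_lr=0.01):
    """Linearly ramp the learning rate from ~0 up to base_lr over the
    first warmup_iters iterations, then leave it to the main schedule."""
    if cur_iter < warmup_iters:
        lr = base_lr * (cur_iter + 1) / warmup_iters
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
```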
Hi! Why do you use a pretrained model rather than training from scratch? @KaiyuYue
Hi~! Fine-tuning a pre-trained model on ImageNet with cgnl/nl modules gives better accuracy, so we chose this training scheme from the very first experiments. We did not try training from scratch. You're welcome to share experimental results trained from scratch if you have any.
Hi! Thanks for your code and paper. I have several questions about this work. (a) In your paper, the results are a little lower (by about 1%) than in this repo. Why? (b) In your paper, you also insert 5 NL blocks into ResNet; what are the specific positions of these blocks? (c) Did you insert 5 NL/CGNL blocks when training on ImageNet? Many thanks!