tornadomeet / ResNet

Reproduce ResNet-v2 (Identity Mappings in Deep Residual Networks) with MXNet
Apache License 2.0
556 stars 199 forks

How can I use this ResNet to fine-tune on other applications, such as face data? #31

Open bruinxiong opened 7 years ago

bruinxiong commented 7 years ago

@tornadomeet Thank you for your ResNet implementation with MXNet. I have now finished training ResNet-50 and ResNet-101 on the ImageNet'12 dataset, and I get similar performance, even though I use only 4 GPUs with a batch size of 225 and a different learning-rate schedule for ResNet-101. The left curves are yours, the right curves are ours. image

Now I would like to take this model pretrained on ImageNet'12 and fine-tune it for another application, such as face data. From others' tips I know the general approach; there are two choices. The first is to freeze all layers except the last fc layer and set the learning rate of the new fc layer to 0.1. Following this idea, I can use MXNet's built-in API, i.e. the fixed_param_names argument when initializing a training module (mod = mx.mod.Module(net, ..., fixed_param_names=fixed_param_names)). Here I have a question: do I need to fix all parameter layers (including conv and bn), or just the conv layers?
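Not speaking for the maintainer, but here is a minimal sketch of the first option. The parameter names are illustrative ResNet-style names, not taken from this repo; the core name-filtering is plain Python, and the MXNet Module call is shown only as a comment.

```python
# Hypothetical list of pretrained parameter names (illustrative only).
pretrained_args = [
    'conv0_weight', 'bn0_gamma', 'bn0_beta',
    'stage1_unit1_conv1_weight', 'stage1_unit1_bn1_gamma',
    'stage1_unit1_bn1_beta', 'fc1_weight', 'fc1_bias',
]

# Freeze everything except the (re-initialized) fc layer. On the question
# in the thread: the usual practice is to freeze BOTH conv and bn
# parameters, since trainable BatchNorm scales/shifts would otherwise
# keep adapting and partially undo the freeze.
fixed_param_names = [name for name in pretrained_args
                     if not name.startswith('fc1')]

# With MXNet this list would then be passed to the Module constructor:
# mod = mx.mod.Module(symbol=net, context=mx.gpu(),
#                     fixed_param_names=fixed_param_names)
print(fixed_param_names)
```

The new fc layer itself is trained from scratch (e.g. at lr 0.1, as proposed above).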

The other choice is to set a smaller learning rate, such as 0.001, for all layers except the last fc layer, and a larger one, such as 0.1, for the new fc layer. The same question as above applies: should only the conv layers, or all layers, get the smaller value? Furthermore, checking MXNet's Model API, I found the optimizer method set_lr_mult, which sets an individual learning-rate multiplier per parameter. However, reading its source code, the program looks up attributes in the whole network recursively when set_lr_mult() is called, yet when I print symbol.list_attr() there is nothing. As I understand it, if the variables do not have such attributes, nothing is done and the weights are updated with the original learning rate; in other words, set_lr_mult() takes no effect. The problem is that even when I set attributes with mx.AttrScope in the symbol_resnet.py file, or set an attribute on each operator such as mx.sym.Convolution(data=data, ..., attr={'lr_mult': '1'}), symbol.list_attr() still prints nothing. So please tell me the correct way to add attributes. And if I obtain the symbol from a pretrained model, how can I easily add an attribute to each parameter layer?
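For what it's worth, a sketch of how per-parameter multipliers act on the update, with names and values chosen purely for illustration. Two hedged pointers worth checking against your MXNet version: lr_mult attributes generally need to be attached to the weight Variables (e.g. mx.sym.Variable('conv0_weight', attr={'lr_mult': '0.1'})) rather than to the operator call, and symbol.list_attr() only shows the attributes of the top symbol node, so symbol.attr_dict() may be the call that actually reveals them; alternatively, set_lr_mult can be given an explicit dict so no symbol attributes are needed at all.

```python
# Minimal sketch of the effect of per-parameter lr multipliers
# (pure Python; the equivalent MXNet call is commented below).

base_lr = 0.1
lr_mult = {'fc1_weight': 1.0, 'fc1_bias': 1.0}  # new fc layer: full lr
default_mult = 0.01  # all pretrained layers: 100x smaller lr

def effective_lr(param_name):
    """Learning rate actually applied to this parameter's update."""
    return base_lr * lr_mult.get(param_name, default_mult)

# The MXNet equivalent, bypassing symbol attributes entirely, would be
# roughly (parameter names hypothetical):
# opt = mx.optimizer.SGD(learning_rate=0.1)
# opt.set_lr_mult({'conv0_weight': 0.01, 'fc1_weight': 1.0})
print(effective_lr('fc1_weight'), effective_lr('conv0_weight'))
```

This reproduces the second fine-tuning recipe: pretrained layers move slowly (0.1 × 0.01 = 0.001) while the new fc layer trains at the full 0.1.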

The questions above are the problems I ran into while writing the fine-tuning code for ResNet.

Looking forward to your reply! Thanks!

tornadomeet commented 7 years ago

hello, @bruinxiong pls ref to https://github.com/dmlc/mxnet-notebooks/tree/master/python/how_to

ZhengHe-MD commented 7 years ago

@tornadomeet , this doesn't answer @bruinxiong 's question at all.