@tornadomeet Thank you for your ResNet implementation in MXNet. I have finished training ResNet-50 and ResNet-101 on the ImageNet'12 dataset and get similar performance, even though I only use 4 GPUs with a batch size of 225 and a different learning-rate schedule for ResNet-101. The left curves are yours; the right curves are ours.
Now I would like to use this ImageNet'12 pre-trained model to fine-tune for other applications, such as face data. From tips I have read elsewhere, I understand the general principle.
There are two choices. The first is to freeze all layers except the last fc layer and set the learning rate of the new fc layer to 0.1. Following this idea, I can use MXNet's built-in API, namely the "fixed_param_names" argument when initializing a training module: mod = mx.mod.Module(net, ..., fixed_param_names=fixed_param_names). Here I have a question: do I need to fix all parameter layers (including conv and bn), or just the conv layers?
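To make the first choice concrete, here is a pure-Python sketch of how the fixed_param_names list could be built by filtering parameter names. The parameter names below are illustrative assumptions, not the actual names from the pretrained symbol (in practice they would come from net.list_arguments()):

```python
# Hypothetical argument names, mimicking what net.list_arguments()
# might return for a ResNet symbol (real names vary by model).
arg_names = [
    'data',
    'conv0_weight', 'bn0_gamma', 'bn0_beta',
    'stage1_unit1_conv1_weight', 'stage1_unit1_bn1_gamma',
    'fc1_weight', 'fc1_bias',
    'softmax_label',
]

# Freeze everything except the new fc layer; also skip the
# non-parameter inputs (data and label), which are not trainable.
fixed_param_names = [
    name for name in arg_names
    if not name.startswith('fc1') and name not in ('data', 'softmax_label')
]

print(fixed_param_names)
```

The resulting list would then be passed as fixed_param_names when creating the Module, e.g. mod = mx.mod.Module(net, ..., fixed_param_names=fixed_param_names).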
The second choice is to give all layers except the last fc layer a smaller learning rate, such as 0.001, and set the learning rate of the new fc layer to a larger value, such as 0.1. The same question applies here: should only the conv layers, or all layers, get the smaller value? Furthermore, I checked the Model API of MXNet: the optimizer class has a "set_lr_mult" method, which sets an individual learning-rate multiplier per parameter. However, when I read the source code of "set_lr_mult", I saw that it recursively looks up attributes across the whole network when called. But when I print symbol.list_attr(), there is nothing. As I understand it, if the variables do not have such attributes, nothing happens and the weights are updated with the original learning rate; in other words, set_lr_mult() takes no effect. Here is the issue: even when I set attributes with mx.AttrScope in the symbol_resnet.py file, or set an attribute on each operator, e.g. mx.sym.Convolution(data=data, ..., attr={'lr_mult': '1'}), symbol.list_attr() still prints nothing. So please tell me the correct way to add attributes. And if I load a symbol from a pretrained model, is there an easy way to add an attribute to each parameter layer?
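To illustrate what a per-parameter learning-rate multiplier accomplishes in this second choice, here is a pure-Python sketch in the spirit of set_lr_mult (no MXNet required; the parameter names are illustrative assumptions, not the real symbol's names):

```python
# Base learning rate for the pretrained body of the network.
base_lr = 0.001

# Per-parameter multipliers: only the new fc layer gets an entry,
# boosting its effective learning rate to 0.001 * 100 = 0.1.
lr_mult = {'fc1_weight': 100.0, 'fc1_bias': 100.0}

def effective_lr(param_name):
    # Parameters without an entry fall back to a multiplier of 1.0,
    # i.e. they are updated with the base learning rate unchanged.
    # This mirrors the behavior described above: if a parameter has
    # no lr_mult attribute, set_lr_mult changes nothing for it.
    return base_lr * lr_mult.get(param_name, 1.0)

print(effective_lr('fc1_weight'))    # large lr for the new fc layer
print(effective_lr('conv0_weight'))  # small base lr for pretrained layers
```

This is only a model of the lookup logic; in MXNet the multiplier would actually come from the 'lr_mult' attribute attached to each weight variable in the symbol.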
These are the problems I ran into while coding the fine-tuning of ResNet.
Looking forward to your reply! Thanks!