Closed: sinAshish closed this issue 5 years ago
Hi @sinAshish, all of the backbones are first pre-trained on ImageNet, and then we fine-tune the entire network with the VOC2012 image-level labels or our affinity labels. You can try training from scratch, but the resulting accuracy will be really bad.
If I may ask, how did you fine-tune the network?
It's simple. For the network computing CAMs, the backbone and the newly added layer (the final fully connected layer for classification) are trained simultaneously, but the latter receives a learning rate 10x that of the backbone. This is a common training scheme when using ImageNet-pretrained weights. You can check the details in the train_cls.py or train_aff.py file in this repository.
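For anyone reading this later, the two-group learning-rate scheme above can be sketched in PyTorch roughly like this (a minimal illustration with a stand-in backbone, not the repo's actual code; the layer shapes and the base learning rate are assumptions):

```python
import torch
import torch.nn as nn

# Stand-in for the ImageNet-pretrained backbone (illustrative only).
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
# Newly added classification head: 20 foreground classes in VOC2012.
classifier = nn.Linear(64, 20)

base_lr = 0.01
optimizer = torch.optim.SGD(
    [
        # pretrained backbone: base learning rate
        {"params": backbone.parameters(), "lr": base_lr},
        # newly initialized head: 10x the backbone's learning rate
        {"params": classifier.parameters(), "lr": 10 * base_lr},
    ],
    momentum=0.9,
)
```

The key point is the per-parameter-group learning rates passed to the optimizer; the pretrained layers only need small adjustments, while the randomly initialized head has to learn from scratch.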
Got it. Thanks !
This may sound very foolish to you, but if you don't mind answering: I ran train_cls.py as mentioned in the README, but it requires pretrained model weights as an argument. Now, say I want to generate CAMs on some other dataset; I just need to remove the weights as a mandatory argument, train the model, and save the weights, right? I don't have a GPU, so it takes forever to test any of my hypotheses, so I thought it better to ask you!
In my opinion, it depends on your setting. You can give it a try, but I don't recommend going without pre-trained weights unless the dataset has more than 100k images with labels.
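If it helps, making the weights argument optional could look something like this (a hypothetical argparse sketch, not the repo's actual code; the `--weights` flag name and the tiny stand-in model are assumptions):

```python
import argparse
import torch
import torch.nn as nn

# Hypothetical sketch: make the pretrained-weights argument optional,
# so the script can also be run from scratch on another dataset.
parser = argparse.ArgumentParser()
parser.add_argument("--weights", default=None,
                    help="path to pretrained weights; omit to train from scratch")
args = parser.parse_args([])  # empty list here just for demonstration

# Stand-in model in place of the real CAM network.
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Flatten())

if args.weights is not None:
    # strict=False loads only the layers whose names and shapes match,
    # which is useful when the head was replaced for a new dataset.
    state = torch.load(args.weights, map_location="cpu")
    model.load_state_dict(state, strict=False)
```

With `--weights` omitted, the model keeps its random initialization, which is exactly the from-scratch case warned about above.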
Was there any specific reason for using Caffe pre-trained weights? You could have used PyTorch weights!
Using Caffe weights can be hard. But those weights are not mine, and I think it would be inappropriate for me to modify them for my own purposes. Sorry for the inconvenience.
Hi @jiwoon-ahn @hardBird123, I wanted to know whether you trained the VGG16 model from scratch or used an ImageNet-pretrained model and took only specific layers of it for training on VOC2012?