paulcx opened this issue 7 years ago
I'm pretty sure the features you are looking for can be extracted from the ROI-pooling layer.
@paulcx,
(Assuming you're using VGG16). You can get the features in the following way:
Modify line 176 in test.py to grab the output from the 'conv5_3' layer:
```python
input_list = [net.get_output('cls_score'), net.get_output('cls_prob'),
              net.get_output('bbox_pred'), net.get_output('rois'),
              net.get_output('conv5_3')]
cls_score, cls_prob, bbox_pred, rois, features = sess.run(
    input_list,
    feed_dict=feed_dict,
    options=run_options,
    run_metadata=run_metadata)
```
1. Return the "features" variable to your main application.
2. Get the ROIs that are above your CONF_THRESH (similar to the demo).
3. Scale those ROIs back down to the feature-map size. For VGG16, the feature map is downsampled by a factor of 16 in height and width.
4. Extract the scaled ROIs across all 512 channels:
```python
for roi in rois:
    # roi holds [xmin, ymin, xmax, ymax]; the coordinates must already be
    # scaled down to feature-map resolution and cast to int before slicing
    xmin, ymin, xmax, ymax = roi[0], roi[1], roi[2], roi[3]
    extracted_feature = features[0, ymin:ymax, xmin:xmax, :]
    # Do some stuff here
```
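The filtering, scaling, and cropping steps above can be sketched end-to-end in NumPy. This is a sketch under assumptions: the names `scores`, `boxes`, and the stride of 16 come from the VGG16 setup described, not from the repository's actual code.

```python
import numpy as np

def crop_roi_features(features, boxes, scores, conf_thresh=0.8, stride=16):
    """Crop conv feature patches for detections above a confidence threshold.

    features: (1, H, W, C) conv5_3 output from sess.run
    boxes:    (N, 4) ROIs in image coordinates [xmin, ymin, xmax, ymax]
    scores:   (N,) detection confidences
    """
    keep = np.where(scores >= conf_thresh)[0]
    patches = []
    for xmin, ymin, xmax, ymax in boxes[keep]:
        # Scale image coordinates down to the feature map (VGG16 stride = 16)
        x0, y0 = int(np.floor(xmin / stride)), int(np.floor(ymin / stride))
        x1, y1 = int(np.ceil(xmax / stride)), int(np.ceil(ymax / stride))
        # Guarantee at least a 1x1 patch after rounding
        x1, y1 = max(x1, x0 + 1), max(y1, y0 + 1)
        patches.append(features[0, y0:y1, x0:x1, :])
    return keep, patches
```

Using floor for the top-left corner and ceil for the bottom-right keeps the crop from collapsing to an empty slice for small boxes.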
Check line 160 in test.py. That code path is used when RPN is disabled and you specify the ROIs to detect yourself. It mentions avoiding anti-aliasing when downsampling to the feature space, so it may be worth following that code when you downsample your ROIs.
Alternatively, you can just grab the vectors from the last fully connected layer (fc7) in the same way as the conv5_3 features. You will get a matrix of size num_proposals x 4096 (check this). Extract the rows you want using the inds created when filtering proposals by CONF_THRESH, then train a softmax layer on those vectors for your final classes. This is good because you avoid the extra bottleneck of stacking more fully connected layers for your age classifiers.
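Training a softmax layer on the extracted fc7 vectors can be sketched with plain NumPy (a minimal sketch; the shapes and names are assumptions, and in practice you would likely add the layer in TensorFlow on top of the graph instead):

```python
import numpy as np

def train_softmax(X, y, num_classes, lr=0.1, epochs=200):
    """Fit a single softmax layer on fixed feature vectors via gradient descent.

    X: (N, D) feature vectors, e.g. fc7 features for the kept proposals
    y: (N,) integer class labels, e.g. age buckets
    """
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # cross-entropy gradient
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)
```

Since the backbone features are frozen, this reduces to simple multinomial logistic regression on fixed vectors, which is cheap to train even on CPU.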
Thank you @louisquinn. I'm using resnet101 from here. I'm wondering if your second approach could suit my case. Generally speaking, the solution would be to extract the tensors from the last fc layer (or pooling layer) and train a softmax layer on top for the classifier.
By the way, the ResNet backbone on Faster R-CNN has very good detection performance.
I wonder if I can use Faster R-CNN to extract the features after bbox regression, then add one more layer such as xgboost to output probabilities. For example, I trained some classes like ears, noses and eyes, and I want to use the features from those selected bboxes to predict age. I'd be grateful if anyone can help.
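One practical wrinkle with that idea: the cropped patches like `extracted_feature` above have different spatial sizes per box, but a model such as xgboost needs fixed-length inputs. A common trick is to pool each crop over its spatial dimensions first. A minimal sketch (all names here are assumptions, not code from the repository):

```python
import numpy as np

def pool_patch(patch):
    """Global max-pool a (h, w, C) feature crop to a fixed (C,) vector."""
    return patch.max(axis=(0, 1))

def build_feature_matrix(patches):
    """Stack pooled crops into an (N, C) matrix, ready for a downstream
    classifier (softmax, xgboost, etc.) regardless of each box's size."""
    return np.stack([pool_patch(p) for p in patches])
```

Average pooling works the same way (`patch.mean(axis=(0, 1))`); either gives every detection a vector of the same length, so features from ear/nose/eye boxes can be concatenated into one input for the age predictor.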