zhaoweicai / mscnn

Caffe implementation of our multi-scale object detection framework

The output of feedforward #12

Closed GBJim closed 8 years ago

GBJim commented 8 years ago

Hi @zhaoweicai, I am working on a Python implementation of the testing script for the Caltech experiment. I have some questions about the network's feedforward output.

The output is a dictionary containing three items: bbox_pred, proposals_score and cls_pred. The array dimensions are N*8, N*6*1*1 and N*2, respectively.

I don't quite understand the meaning of these outputs.

bbox_pred: I think these are the bbox coordinates, but why is the dimension N*8? A bbox only needs X1, Y1, width and height, 4 columns in total. What do the other values mean?

proposals_score: I don't understand what this one means. To my knowledge, an object detection task only needs the bbox and the classification confidence.

cls_pred: The confidence of the prediction. I think one column is for background and the other for pedestrian.

Please give me some hints. Thank you! :)

zhaoweicai commented 8 years ago

Hi @GBJim, thanks for working on a Python implementation.

For bbox_pred, one dimension is N and the other is 4*num_cls, where num_cls=2 for Caltech pedestrian detection. The four columns for background are not used anywhere. This follows Fast R-CNN.
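As a rough illustration of the N*(4*num_cls) layout, here is a minimal NumPy sketch that separates the per-class columns. The function name `split_bbox_pred` and the dummy data are hypothetical, and the class ordering (background first, pedestrian second) is an assumption based on the Fast R-CNN convention, not something confirmed in this thread.

```python
import numpy as np

NUM_CLS = 2  # background + pedestrian, as stated for the Caltech setup


def split_bbox_pred(bbox_pred, num_cls=NUM_CLS):
    """Reshape an (N, 4*num_cls) array into (N, num_cls, 4)."""
    bbox_pred = np.asarray(bbox_pred)
    n = bbox_pred.shape[0]
    return bbox_pred.reshape(n, num_cls, 4)


# Dummy values standing in for the real network output (N = 3).
bbox_pred = np.arange(24, dtype=np.float32).reshape(3, 8)
per_class = split_bbox_pred(bbox_pred)

# Assumption: class index 0 is background (unused), index 1 is pedestrian.
ped_deltas = per_class[:, 1, :]
```

With this layout, `ped_deltas` holds the last four columns of each row, and the first four (background) can simply be ignored.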

To get the final object bounding boxes, you also need the proposal bounding boxes (lines 111~119). The output proposals can also be used to evaluate proposal performance.
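The combination of proposals and bbox_pred could be sketched roughly as below. This assumes the standard Fast R-CNN parameterization (dx, dy, dw, dh relative to the proposal, with the "+1" pixel convention); the actual script may differ, for example by scaling the offsets with normalization constants first. The function names `squeeze_proposals` and `decode_boxes` are hypothetical. The column layout of the N*6 proposal array is not described in this thread, so the sketch takes proposals as a plain (N, 4) array of (x1, y1, x2, y2) and leaves the column selection to the reader.

```python
import numpy as np


def squeeze_proposals(blob):
    """Drop the two trailing singleton axes: (N, 6, 1, 1) -> (N, 6)."""
    return np.squeeze(np.asarray(blob), axis=(2, 3))


def decode_boxes(proposals, deltas):
    """Apply (dx, dy, dw, dh) offsets to (x1, y1, x2, y2) proposals.

    Assumes the Fast R-CNN transform; returns (x1, y1, width, height).
    """
    proposals = np.asarray(proposals, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    # Proposal size and center.
    w = proposals[:, 2] - proposals[:, 0] + 1.0
    h = proposals[:, 3] - proposals[:, 1] + 1.0
    cx = proposals[:, 0] + 0.5 * w
    cy = proposals[:, 1] + 0.5 * h

    # Shift the center and rescale the size.
    new_cx = deltas[:, 0] * w + cx
    new_cy = deltas[:, 1] * h + cy
    new_w = np.exp(deltas[:, 2]) * w
    new_h = np.exp(deltas[:, 3]) * h

    x1 = new_cx - 0.5 * new_w
    y1 = new_cy - 0.5 * new_h
    return np.stack([x1, y1, new_w, new_h], axis=1)


# Dummy example: zero offsets leave the proposal unchanged.
boxes = decode_boxes([[10, 20, 29, 59]], [[0, 0, 0, 0]])
```

Which columns of the squeezed proposal array hold the coordinates and which hold the score should be checked against the original test script before wiring this in.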

You are right about cls_pred.

GBJim commented 8 years ago

@zhaoweicai Thank you for the explanations!