Closed GBJim closed 8 years ago
Hi @GBJim, thanks for working on a Python implementation.
For `bbox_pred`, one dimension is `N`, and the other is `4*num_cls`. `cls_num=2` for Caltech pedestrian detection. The four columns for background are not used anywhere. This follows Fast-RCNN.
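A minimal NumPy sketch of that column layout, in case it helps. It assumes the `4*num_cls` columns are grouped per class with background first (class 0) and pedestrian second (class 1), as in Fast-RCNN. The helper name `select_class_boxes` and the toy array are made up for illustration and are not part of the repository.

```python
import numpy as np

def select_class_boxes(bbox_pred, class_index, num_cls=2):
    """Return the 4 regression columns that belong to one class.

    Assumes the columns are laid out as [class 0 (4 cols), class 1 (4 cols), ...].
    """
    bbox_pred = np.asarray(bbox_pred)
    n = bbox_pred.shape[0]
    # (N, 4*num_cls) -> (N, num_cls, 4); trailing singleton dims are absorbed.
    per_class = bbox_pred.reshape(n, num_cls, 4)
    return per_class[:, class_index, :]

# Toy stand-in for the network output with N = 2.
bbox_pred = np.arange(16, dtype=np.float32).reshape(2, 8)

# Keep only the pedestrian columns; the background columns are ignored.
pedestrian = select_class_boxes(bbox_pred, class_index=1)
print(pedestrian.shape)  # (2, 4)
```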
To get the final object bounding boxes, you need proposal bounding boxes (line 111~119). Meanwhile, the output proposal can be used for proposal performance evaluation.
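As a rough illustration of combining proposals with the regression output, here is a sketch that uses the standard Fast-RCNN box parameterization (`dx, dy, dw, dh` relative to the proposal's center and size). This is an assumption: the thread does not state the exact encoding, any normalization of the deltas, or the box convention this network uses, so the referenced lines of the test script remain the authority. The function name `decode_boxes` is hypothetical.

```python
import numpy as np

def decode_boxes(proposals, deltas):
    """Apply Fast-RCNN style deltas to proposal boxes.

    proposals: (N, 4) array of x1, y1, x2, y2 (assumed layout).
    deltas:    (N, 4) array of dx, dy, dw, dh for the class of interest.
    Returns an (N, 4) array of refined x1, y1, x2, y2.
    """
    proposals = np.asarray(proposals, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    widths = proposals[:, 2] - proposals[:, 0]
    heights = proposals[:, 3] - proposals[:, 1]
    ctr_x = proposals[:, 0] + 0.5 * widths
    ctr_y = proposals[:, 1] + 0.5 * heights

    # Shift the center by a fraction of the proposal size, scale the size by exp().
    pred_ctr_x = deltas[:, 0] * widths + ctr_x
    pred_ctr_y = deltas[:, 1] * heights + ctr_y
    pred_w = np.exp(deltas[:, 2]) * widths
    pred_h = np.exp(deltas[:, 3]) * heights

    return np.stack(
        [
            pred_ctr_x - 0.5 * pred_w,
            pred_ctr_y - 0.5 * pred_h,
            pred_ctr_x + 0.5 * pred_w,
            pred_ctr_y + 0.5 * pred_h,
        ],
        axis=1,
    )

# One toy proposal, shifted right by half its width.
proposals = np.array([[0.0, 0.0, 10.0, 20.0]])
deltas = np.array([[0.5, 0.0, 0.0, 0.0]])
boxes = decode_boxes(proposals, deltas)
```

With all-zero deltas the function returns the proposals unchanged, which is a quick sanity check when porting.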
You are right about `cls_pred`.
@zhaoweicai Thank you for the explanations!
Hi @zhaoweicai I am working on a Python implementation of the testing script on the CalTech experiment. I have some questions about the network output of feedforward.
The output is a hash (dictionary) that contains three items: `bbox_pred`, `proposals_score` and `cls_pred`. The array dimensions are `N*8`, `N*6*1*1` and `N*2` respectively. I don't quite understand the meaning of these outputs.
`bbox_pred`: I think this is the bbox coordinates, but why is the dimension `N*8`? A bbox only needs X1, Y1, width and height, 4 columns in total, for representation. What is the meaning of the other values?
`proposals_score`: I don't understand the meaning of this one. To my knowledge, an object detection task only needs the bbox and the classification confidence.
`cls_pred`: The confidence of the prediction. I think one column is for background and the other for a pedestrian.
Please give me some hint. Thank you! :)