wei-tim / YOWO

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

question about def test in train.py #73

Closed tomatowithpotato closed 3 years ago

tomatowithpotato commented 3 years ago

I found that in the test function:

1. the proposals are counted by checking whether the detection confidence is greater than 0.25

2. the correct detections are counted based only on IoU and classification

I can't understand why this is done, and I think it is unreasonable: the correct detections should come from the proposals, otherwise precision may be greater than 1 in this case.

Can someone explain?

MKowal2 commented 3 years ago

I am also getting a similar problem. Any updates? This seems like a hard-coded threshold and doesn't make much sense to me.

MKowal2 commented 3 years ago

@okankop Any thoughts on this?

okankop commented 3 years ago

@MKowal2 @tomatowithpotato let me clarify how the code is implemented. When you look at the test_ava and test_ucf24_jhmdb21 functions in optimization.py, you will see the following steps:

  1. First, inference is performed to get predictions.
  2. Then NMS is applied in order to get rid of overlapping detections. Note that the output of NMS is a set of positive detections with different confidence values.
  3. Please recall the definition of precision = true positives / (true positives + false positives). In other words, the denominator is the output of NMS. (A minimal sketch of this bookkeeping follows this list.)
  4. It is true that I have hard-coded a threshold value (0.25) in order to omit detections which our network made less confidently. However, if you use any threshold value between 0.1 and 0.4, you will get very similar results.
  5. At test time, precision sometimes becomes larger than 1.0, but this does not arise from this hard-coded threshold value. The reason is that for some classes, such as ice skating and fencing, there are two overlapping people in the scene who perform the action. The network produces one detection box which is considered correct for both people, so correct is two but the proposal count is one.
  6. Finally, this precision metric is only used in the code and is not reported in the article. It just provides some additional justification of the performance of the YOWO architecture at test time.
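
For reference, here is a minimal sketch (not the repository's actual code) of the per-frame bookkeeping described in steps 3-5. The box layout `[x, y, w, h, det_conf, cls_conf, cls_id]`, the ground-truth layout `[cls_id, x, y, w, h]` and the confidence/IoU thresholds are taken from the code snippets quoted later in this issue; the helper names `frame_counts` and `bbox_iou_xywh` and the `restrict_to_confident` flag are purely illustrative:

```python
# Minimal sketch of the per-frame precision/recall bookkeeping (illustrative only).

def bbox_iou_xywh(a, b):
    """Stand-in for the repo's bbox_iou(..., x1y1x2y2=False); boxes are (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def frame_counts(boxes, truths, conf_thresh=0.25, iou_thresh=0.5, restrict_to_confident=False):
    """Tally proposals and correct detections for one frame.

    boxes:  NMS outputs, each [x, y, w, h, det_conf, cls_conf, cls_id]
    truths: ground-truth rows, each [cls_id, x, y, w, h]
    restrict_to_confident: whether the matching loop also uses only boxes above
        conf_thresh (this choice is what the rest of this issue is about).
    """
    confident = [b for b in boxes if b[4] > conf_thresh]
    proposals = len(confident)                      # denominator of precision
    candidates = confident if restrict_to_confident else boxes

    correct = 0
    for t in truths:
        box_gt = [t[1], t[2], t[3], t[4], 1.0, 1.0, t[0]]
        # match this ground-truth box to its best-overlapping candidate detection
        best_iou, best_box = 0.0, None
        for b in candidates:
            iou = bbox_iou_xywh(box_gt, b)
            if iou > best_iou:
                best_iou, best_box = iou, b
        # the match counts as correct if it overlaps enough and has the right class
        if best_iou > iou_thresh and int(best_box[6]) == int(box_gt[6]):
            correct += 1

    eps = 1e-5
    precision = correct / (proposals + eps)
    recall = correct / (len(truths) + eps)
    return precision, recall
```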

I hope this explanation clarifies things.

MKowal2 commented 3 years ago

Thank you for the reply @okankop. Everything you said makes sense, however I still believe there is a mistake in the code.

Specifically, in lines 179-181 of test_ucf24_jhmdb21:

```python
for i in range(len(boxes)):
    if boxes[i][4] > 0.25:
        proposals = proposals + 1
```

This uses only the boxes with a confidence greater than 0.25 as the value of [proposals]. However, in the next few lines (187-196), the number of correct predictions (variable: [correct]) is determined by [best_iou], which is obtained by iterating over ALL of the boxes, regardless of their confidence:

```python
for i in range(num_gts):
    box_gt = [truths[i][1], truths[i][2], truths[i][3], truths[i][4], 1.0, 1.0, truths[i][0]]
    best_iou = 0
    best_j = -1
    for j in range(len(boxes)):  # THIS SHOULD ONLY CONSIDER BOXES WITH CONF > 0.25 !!!
        iou = bbox_iou(box_gt, boxes[j], x1y1x2y2=False)
        if iou > best_iou:
            best_j = j
            best_iou = iou

    if best_iou > iou_thresh:
        total_detected += 1
        if int(boxes[best_j][6]) == box_gt[6]:
            correct_classification += 1

    if best_iou > iou_thresh and int(boxes[best_j][6]) == box_gt[6]:
        correct = correct + 1
```

Therefore the precision, which is calculated as [precision = 1.0*correct/(proposals+eps)], can be greater than 1, since [correct] is obtained by considering all boxes (no matter what their confidence is) while [proposals] is calculated by only considering the boxes with confidence over 0.25.
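
To make the arithmetic concrete, here is a toy run of the `frame_counts` sketch from earlier in this thread (the numbers are made up and purely illustrative): one confident detection and one low-confidence detection, each perfectly overlapping its own ground-truth box.

```python
truths = [[0, 0.3, 0.3, 0.2, 0.2],              # two ground-truth boxes, class 0
          [0, 0.7, 0.7, 0.2, 0.2]]
boxes = [[0.3, 0.3, 0.2, 0.2, 0.90, 1.0, 0],    # confident, matches truths[0]
         [0.7, 0.7, 0.2, 0.2, 0.10, 1.0, 0]]    # below 0.25, still matches truths[1]

precision, recall = frame_counts(boxes, truths)
print(precision)  # ~2.0: correct = 2 but proposals = 1, so precision exceeds 1
```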

I have implemented the same function but only considering the boxes with confidence > 0.25 during the calculation of [correct]:

```python
# MY IMPLEMENTATION
# ADD BOXES WITH CONF OVER 0.25 TO pred_list, TO BE USED WHEN COUNTING correct
pred_list = []  # LIST OF CONFIDENT BOX INDICES

for i in range(len(boxes)):
    if boxes[i][4] > 0.25:
        proposals = proposals + 1
        pred_list.append(i)

for i in range(num_gts):
    box_gt = [truths[i][1], truths[i][2], truths[i][3], truths[i][4], 1.0, 1.0, truths[i][0]]
    best_iou = 0
    best_j = -1
    # for j in range(len(boxes)):  # CHANGED HERE
    for j in pred_list:  # ITERATE THROUGH ONLY THE CONFIDENT BOXES
        iou = bbox_iou(box_gt, boxes[j], x1y1x2y2=False)
        if iou > best_iou:
            best_j = j
            best_iou = iou
```

The key is iterating over only the boxes which have passed the threshold of 0.25. Please let me know if this makes sense to you or if I am missing something. After running the script, I do not get values of precision above 1 anymore.
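
For what it's worth, the toy frame from earlier in this thread also behaves as expected once the matching loop only sees the confident boxes (using the illustrative `restrict_to_confident` flag of the `frame_counts` sketch, which mimics the `pred_list` change above):

```python
# same toy truths/boxes as before, but the matching loop only sees confident boxes
precision, recall = frame_counts(boxes, truths, restrict_to_confident=True)
print(precision, recall)  # ~1.0 and ~0.5: correct can no longer exceed proposals
```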

Thank you!

okankop commented 3 years ago

@MKowal2 I agree with you. I have now fixed the bug in the implementation. I think the issue is resolved now!