We do not use the L2 metric code from ST-P3; it comes from one of our other projects, but the results should be the same. We calculate the average in the `compute_L2` function, while ST-P3 calculates the average when returning the metric results.
I found there is a difference. ST-P3 computes L2 at 1s, 2s, and 3s, while VAD reports the average of (0.5s, 1s) as the result for 1s, the average of (0.5s, 1s, 1.5s, 2s) as the result for 2s, and the average of (0.5s, 1s, 1.5s, 2s, 2.5s, 3s) as the result for 3s. The averaging code in ST-P3 averages over the batch, not over time. Can you please double-check?
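To make the difference I mean concrete, here is a toy example with made-up per-timestep errors (the numbers and variable names are only for illustration, not taken from either codebase):

```python
import numpy as np

# Hypothetical per-timestep L2 errors at 0.5s, 1.0s, ..., 3.0s for one sample.
err = np.array([0.1, 0.2, 0.35, 0.5, 0.7, 0.9])

# What I understand ST-P3 reports: the L2 at the horizon timestep itself.
stp3 = {"1s": err[1], "2s": err[3], "3s": err[5]}          # 0.2, 0.5, 0.9

# What I understand VAD reports: the average over all timesteps up to the horizon.
vad = {f"{h}s": err[: 2 * h].mean() for h in (1, 2, 3)}    # 0.15, 0.2875, ~0.4583
```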
I also noticed this problem. Could you double-check this?
In ST-P3's metric.py, the `compute` function averages over the batch, and ST-P3 also averages over the time dimension when producing the results in evaluate.py:
```python
if cfg.PLANNING.ENABLED:
    for i in range(future_second):
        scores = metric_planning_val[i].compute()
        for key, value in scores.items():
            results['plan_' + key + '_{}s'.format(i + 1)] = value.mean()
```
VAD follows this setting for a fair comparison; the only difference is that VAD performs the time-dimension average inside the `compute_L2` function, whereas ST-P3 performs the time-dimension average when computing the final metric results.
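As a minimal sketch of why the two give the same number when the per-timestep errors are the same (toy values and illustrative names only, not the actual VAD or ST-P3 code):

```python
import numpy as np

# Toy per-timestep L2 errors for one sample, sampled every 0.5s up to 3.0s.
l2_per_step = np.array([0.1, 0.2, 0.35, 0.5, 0.7, 0.9])

# VAD-style: average over the time dimension inside the metric function,
# e.g. the 2s result averages the errors at 0.5s, 1s, 1.5s and 2s.
vad_2s = l2_per_step[:4].mean()

# ST-P3-style: compute() keeps the per-timestep values, and evaluate.py
# averages them with value.mean() when building the results dict.
per_step_scores = l2_per_step[:4]      # what compute() would hand back for 2s
stp3_2s = per_step_scores.mean()       # the value.mean() in evaluate.py

assert np.isclose(vad_2s, stp3_2s)     # same result, averaged at a different place
```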
Got it. Thanks for the reply.
I noticed the L2 metric is changed from ST-P3. Can you please explain why? Thank you.