We do not use the L2 metric code from ST-P3; it comes from one of our other projects, but the results should be the same. We calculate the average in the `compute_L2` function, while ST-P3 calculates the average when returning the metric results.
I found there is a difference. ST-P3 computes L2 at 1s, 2s, and 3s, while VAD reports the average of (0.5s, 1s) as the result for 1s, the average of (0.5s, 1s, 1.5s, 2s) as the result for 2s, and the average of (0.5s, 1s, 1.5s, 2s, 2.5s, 3s) as the result for 3s. The averaging code in ST-P3 averages over the batch, not over time. Can you please double-check?
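To make the difference I mean concrete, here is a toy example with made-up per-timestep errors (the numbers and variable names are only for illustration, not taken from either codebase):

```python
import numpy as np

# Hypothetical per-timestep L2 errors at 0.5s, 1.0s, ..., 3.0s for one sample.
err = np.array([0.1, 0.2, 0.35, 0.5, 0.7, 0.9])

# What I understand ST-P3 reports: the L2 at the horizon timestep itself.
stp3 = {"1s": err[1], "2s": err[3], "3s": err[5]}          # 0.2, 0.5, 0.9

# What I understand VAD reports: the average over all timesteps up to the horizon.
vad = {f"{h}s": err[: 2 * h].mean() for h in (1, 2, 3)}    # 0.15, 0.2875, ~0.4583
```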
I also noticed this problem. Could you double-check this?
In ST-P3's metric.py, the `compute` function averages over the batch, and ST-P3 also averages over the time dimension when producing the results in evaluate.py:
```python
if cfg.PLANNING.ENABLED:
    for i in range(future_second):
        scores = metric_planning_val[i].compute()
        for key, value in scores.items():
            results['plan_' + key + '_{}s'.format(i + 1)] = value.mean()
```
VAD follows this setting for a fair comparison; the only difference is that VAD performs the time-dimension average inside the `compute_L2` function, whereas ST-P3 performs the time-dimension average when computing the final metric results.
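As a minimal sketch of why the two give the same number when the per-timestep errors are the same (toy values and illustrative names only, not the actual VAD or ST-P3 code):

```python
import numpy as np

# Toy per-timestep L2 errors for one sample, sampled every 0.5s up to 3.0s.
l2_per_step = np.array([0.1, 0.2, 0.35, 0.5, 0.7, 0.9])

# VAD-style: average over the time dimension inside the metric function,
# e.g. the 2s result averages the errors at 0.5s, 1s, 1.5s and 2s.
vad_2s = l2_per_step[:4].mean()

# ST-P3-style: compute() keeps the per-timestep values, and evaluate.py
# averages them with value.mean() when building the results dict.
per_step_scores = l2_per_step[:4]      # what compute() would hand back for 2s
stp3_2s = per_step_scores.mean()       # the value.mean() in evaluate.py

assert np.isclose(vad_2s, stp3_2s)     # same result, averaged at a different place
```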
Got it. Thanks for the reply.
I noticed the L2 metric is changed from ST-P3. Can you please explain why? Thank you.