kylevedder / BucketedSceneFlowEval


Questions on new metric #5

Closed Kin-Zhang closed 6 months ago

Kin-Zhang commented 6 months ago

Q1: Does this test function check the case where the flow prediction is set to 0? Why do OTHER_VEHICLES, PEDESTRIAN, and WHEELED_VRU have NaN as the expected result? Code: https://github.com/kylevedder/BucketedSceneFlowEval/blob/3bdfb8e7b534eaebb70d573f9ff768f58f27b821/tests/eval/bucketed_epe.py#L138-L168 Copy:

def test_bucketed_eval_av2_no_ground(
    argo_dataset_gt_no_ground: Argoverse2CausalSceneFlow,
    argo_dataset_pseudo_no_ground: Argoverse2CausalSceneFlow,
):
    EXPECTED_RESULTS_DICT = {
        "BACKGROUND": (0.01975785995262935, float("nan")),
        "CAR": (0.008681314962881582, 0.9460171305709397),
        "OTHER_VEHICLES": (float("nan"), float("nan")),
        "PEDESTRIAN": (float("nan"), 0.8834896978129233),
        "WHEELED_VRU": (float("nan"), 0.9758072524985107),
    }
    _run_eval_on_target_and_gt_datasets(
        argo_dataset_gt_no_ground, argo_dataset_pseudo_no_ground, EXPECTED_RESULTS_DICT
    )
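
As a reading aid for the fixture above: each tuple appears to hold the (static EPE, dynamic normalized EPE) for that class, with NaN wherever that split has no points in the single test frame. One way to run just this test locally (assuming the test data has been downloaded per the test setup script):

pytest tests/eval/bucketed_epe.py -k test_bucketed_eval_av2_no_ground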

Q2: In the figure I attached, I see that Dynamic Normalized EPE is (average EPE / average speed), but in the code (link to the specific line) the "average speed" is actually the average flow distance. So the Dynamic Normalized EPE unit would be m^2 / m = m, not m^2 / (m/s) = m*s, right? Should the paper get a minor modification, or should the code follow what the paper says (bucket_max_speed: float = 20.0)?

[attached image: metric definition figure]


Q3: As in the previous figure, "each non-empty bucket" means that if CAR only has errors in two speed buckets, for example (0.04, 0.08): 0.1 and (0.1, 0.14): 0.2, then the Dynamic Normalized EPE for CAR is (0.1 + 0.2) / 2 = 0.15 and not (0.1 + 0.2) / 51, right? The other question is about the mean Dynamic value calculated at the end: if a class such as WHEELED_VRU has no value at an intermediate step, we don't set it to 0 but ignore it, correct?


Let me know if any of the questions are unclear... Looking forward to your reply. Thanks in advance.

kylevedder commented 6 months ago

Hello,

Thanks for doing a deep dive on our metric!

Q1: Those are NaN because the computation is over a single frame that doesn't have any moving BACKGROUND points, doesn't have any static PEDESTRIAN or WHEELED_VRU points (all points of those classes are moving), and doesn't have any OTHER_VEHICLES points at all (no points from this class appear in the frame).

Q2: I'm not sure if I understand your question, but a first principles unit analysis is:
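
A minimal sketch of such an analysis, assuming the per-point EPE and the per-point flow magnitude used for normalization are both measured in meters over the same frame pair:

\mathrm{Normalized\;EPE} \;=\; \frac{\overline{\mathrm{EPE}}\ [\mathrm{m}]}{\overline{\lVert \mathrm{flow} \rVert}\ [\mathrm{m}]} \;\Rightarrow\; \text{unitless}

Dividing instead by a speed in m/s together with the 0.1 s frame interval cancels the seconds, so the result comes out unitless either way.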

Q3: I'm not sure I understand your question, because I can't tell whether you're asking about how performance is merged between frames (we keep a running weighted average of EPE and speed as we add up buckets computed across frames) or about how it's merged across speed buckets (we just do a simple average across the Normalized EPE values of the non-empty buckets).
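
As a minimal sketch of that second reduction (hypothetical function name, assuming each class has one Normalized EPE value per speed bucket and NaN for empty buckets):

import numpy as np

def mean_dynamic_normalized_epe(per_bucket_norm_epe):
    # Keep only the non-empty buckets (empty buckets are NaN) and average them,
    # e.g. [nan, 0.1, nan, 0.2, nan, ...] -> (0.1 + 0.2) / 2 = 0.15,
    # rather than dividing by the total number of buckets.
    values = np.asarray(per_bucket_norm_epe, dtype=float)
    non_empty = values[~np.isnan(values)]
    return float(non_empty.mean()) if non_empty.size > 0 else float("nan")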

Kin-Zhang commented 6 months ago

Q1 [check again]: I see. At first I thought it was an evaluation of the whole validation set with no flow estimation. Maybe you selected only one frame in which some categories have no static (or no moving) points. Is that right?

Q2 [solved]: The estimated flow is normally the motion displacement from P_t to P_{t+1}, so in my mind its unit is meters in this case, without considering the LiDAR frequency. But if in the code you divide all flows by 0.1 s, then we can call it a speed. Aha, and yes, the average EPE is a norm; I made a mistake in the first comment. Whether it's divided by a speed or a distance, the normalized EPE is unitless either way. Thanks.

Q3 [check again]: I see. The paper says it's a simple average across the EPE values, but in the code I attached and you linked, it's actually an average weighted by point count, isn't it? What confuses me is whether the point-count weighting applies across the speed buckets, or also when merging between frames (Q3.1).

Q3.1 [New]: I think each frame produces a per-category speed-bucket table and a static/dynamic table. When merging between frames (in other words, over the whole validation dataset = 23547 frames, 150 scenes), the static/dynamic table should be averaged across frames without any point-count weighting, right? Since a different frame/scene is a new one.

Thank you so much for all the detailed explanations.

kylevedder commented 6 months ago

Q1: The test uses a dataloader fixture that loads Argoverse 2 tiny, a zip containing a single frame pair, so everything is run on just that frame. You can take a look at the dataset by downloading the files from the URL inside the test setup script.

Q3(.1): A Frame is defined to be a pair of lidar scans with the flow between them. We want to end up with a class / speed matrix across all Frames together, but for computational reasons we need to do this online -- we need to keep a single running accounting of this matrix and have the final result after seeing each Frame once.

Importantly, we want every point from across frames to matter equally -- if there were 10 points in a particular speed bucket for pedestrians in Frame A and 10,000 in Frame B, we don't want those 10 points to influence the overall metric as much as those 10,000 points; they should be 10 in a pool of 10,010 points. This prevents outlier frames with a single poorly described point from a particular class from having enormous influence over the evaluation results.

So in addition to tracking the average EPE for that bucket in Frame A and Frame B, we keep a count of how many points were in that bucket and use it to do a weighted average, so that when we build the matrix over Frames A and B every point contributes equally to that bucket's EPE. The same goes for the average speed of the bucket, which is used for normalization.
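
A minimal sketch of that online accumulation (hypothetical class and method names, not the repo's actual API), keeping a running EPE sum, speed sum, and point count per (class, speed bucket):

from collections import defaultdict

class RunningBucketStats:
    """Accumulates per-(class, speed bucket) totals so that every point seen
    across all Frames contributes equally to the final averages."""

    def __init__(self):
        # (class_name, bucket_index) -> [epe_sum, speed_sum, point_count]
        self.stats = defaultdict(lambda: [0.0, 0.0, 0])

    def accumulate(self, class_name, bucket_idx, epe_sum, speed_sum, count):
        # Called once per Frame per non-empty bucket with that Frame's totals.
        entry = self.stats[(class_name, bucket_idx)]
        entry[0] += epe_sum
        entry[1] += speed_sum
        entry[2] += count

    def normalized_epe(self, class_name, bucket_idx):
        epe_sum, speed_sum, count = self.stats[(class_name, bucket_idx)]
        if count == 0 or speed_sum == 0.0:
            return float("nan")  # an empty bucket stays NaN instead of 0
        # Point-weighted average EPE divided by point-weighted average speed;
        # the 1/count factors cancel, so this reduces to epe_sum / speed_sum.
        return (epe_sum / count) / (speed_sum / count)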

Kin-Zhang commented 6 months ago

Thanks a lot!

Two more questions on the EvalAI page:

- Is there a way to download the detailed per-class results for submissions (mine and the baseline methods reported in the paper)?
- Can a single team have more than one entry (one per method) on the leaderboard?

kylevedder commented 6 months ago

Unfortunately, EvalAI kind of sucks as a platform -- we use it mostly for historical reasons.

Bullet 1: You should be able to save your submission results from My Submission > Submission File. I have been thinking about keeping a copy of these result files for our supported methods in SceneFlowZoo in the repo itself and using that to autogenerate a table for the README.

As a first cut, here's a ZIP of the performance data for all methods we describe in our ECCV paper; additionally, here's the Bucket Normalized results for FastNSF and its Threeway EPE results -- all are on the AV2 test set.

Bullet 2: I have not found a way to get a single team to have multiple entries on the leaderboard -- the approved method is to make multiple teams and submit one method per team. These teams can be created at https://eval.ai/web/challenge-host-teams and the host ID set as part of the submission CLI.

Kin-Zhang commented 6 months ago

Thank you so much! ❤️