Understanding evaluation results for L1 difficulty

waymo-research / waymo-open-dataset

I'm trying to understand how evaluation is done for L1 difficulty. Essentially, L1 difficulty selects a subset of the GT boxes. Given that, I'm not sure how precision is evaluated. Specifically, I wonder whether the predicted boxes are filtered or selected accordingly during evaluation. Intuitively, some predicted boxes are predictions for L2-level GT boxes, so does it make sense to ignore them when computing L1 difficulty metrics, rather than counting them as false positives?
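To make the question concrete, here is a toy sketch of the two conventions I have in mind. This is not the Waymo evaluation code: the function name `l1_precision`, the `ignore_l2_matches` flag, and the pre-matched input format are all made up for illustration, and I'm assuming matching to GT boxes has already been done.

```python
from typing import Optional, Sequence


def l1_precision(
    matched_gt_level: Sequence[Optional[int]],
    ignore_l2_matches: bool,
) -> float:
    """Toy L1 precision over already-matched predictions (hypothetical helper).

    matched_gt_level[i] is the difficulty level (1 or 2) of the GT box that
    prediction i was matched to, or None if the prediction matched nothing.
    """
    tp = 0
    fp = 0
    for level in matched_gt_level:
        if level == 1:
            tp += 1  # matched an L1 GT box: true positive
        elif level == 2 and ignore_l2_matches:
            continue  # matched an L2 GT box: counted as neither TP nor FP
        else:
            fp += 1  # unmatched, or an L2 match that is not being ignored
    total = tp + fp
    return tp / total if total else 0.0


# Five predictions: two match L1 boxes, two match L2 boxes, one matches nothing.
matches = [1, 1, 2, 2, None]
print(l1_precision(matches, ignore_l2_matches=False))  # 2 TP, 3 FP
print(l1_precision(matches, ignore_l2_matches=True))   # 2 TP, 1 FP
```

With the same predictions, the first convention gives a precision of 2/5 and the second gives 2/3, so which one the evaluation uses matters quite a bit for the reported L1 numbers.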

I tried searching on Google, but I couldn't find anyone discussing this detail. Any links or answers related to this question are appreciated. Thanks!

Understanding evaluation results for L1 difficulty #446