The ground truth bbox dataset is available here: https://drive.google.com/drive/folders/1vY9Pv6aBekL3cKT0hHQgl8_pS-kadoCY?usp=sharing
You should be able to replicate the evaluation with the following files (a minimal loading sketch is below):
- keyframes.json
- i_rgb.json (where i indexes the labeled frame, e.g. the ramen/2_rgb frame mentioned below)
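A minimal loading sketch: only the file names come from the list above; what each file contains is an assumption, since the thread doesn't document the JSON schema.

```python
import json

# Assumed layout: the file names come from the message above, but the
# schema of their contents is not documented in this thread.
with open("keyframes.json") as f:
    keyframes = json.load(f)   # presumably the novel-view camera poses

# Per-frame labels follow the i_rgb.json pattern (e.g. 2_rgb.json for
# the ramen/2_rgb frame discussed below).
with open("2_rgb.json") as f:
    labels = json.load(f)

print(type(keyframes), type(labels))
```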
Hi Chung Min, thanks for the nice work. In the evaluation, how do you handle multiple bbox annotations with the same phrase? E.g., "wavy noodles" in ramen/2_rgb is annotated twice: once for the noodles in the front and once for those in the background. Since your metric relies on the highest relevancy pixel, there will always be only one detection per phrase. Is it sufficient to detect only one, or will one always be missing?
> Since your metric relies on the highest relevancy pixel, there will always be only one detection per phrase.
This is correct. As you mentioned, for "wavy noodles" the relevancy should highlight both the noodles in the front and those in the back, but the highest relevancy point may fall in either highlighted region.
In our experiments we check whether the highest-relevancy pixel lands at the correct semantic location, and count a query as a success if that pixel lies inside either box.
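A minimal sketch of this success check, assuming the relevancy map is an (H, W) array and each ground-truth box is an axis-aligned (x_min, y_min, x_max, y_max) tuple; the box format is an assumption, and this is not the repo's actual evaluation code.

```python
import numpy as np

def localization_success(relevancy, boxes):
    """Return True if the argmax pixel of the relevancy map lies inside
    any ground-truth box. A sketch of the metric described above, not
    the repo's actual evaluation code."""
    y, x = np.unravel_index(np.argmax(relevancy), relevancy.shape)
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)

# Two boxes annotated for the same phrase ("wavy noodles"): the check
# succeeds if the single highest-relevancy pixel lands in either one.
relevancy = np.zeros((480, 640))
relevancy[220, 150] = 1.0          # pretend the peak falls in the front box
boxes = [(100, 200, 180, 260),     # front noodles (x_min, y_min, x_max, y_max)
         (400, 50, 470, 120)]      # background noodles
print(localization_success(relevancy, boxes))  # True
```

Note that the check is per phrase: a phrase with two boxes still counts as a single trial, satisfied by a peak in either box.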
Hi! Thank you for your interesting open-source work! Could you please provide the evaluation code and the ground-truth boxes used in Section 4.3, Localization ("To evaluate how well LERF can localize text prompts in a scene we render novel views and label bounding boxes for 72 objects across 5 scenes.")? Thank you again for your work!