kerrj / lerf

Code for LERF: Language Embedded Radiance Fields
https://www.lerf.io/
MIT License

Ground Truth Box #27

Closed mabaorui closed 1 year ago

mabaorui commented 1 year ago

Hi! Thank you for your interesting open-source work! Could you please provide the evaluation code and the ground-truth boxes used for the evaluation in Section 4.3, Localization ("To evaluate how well LERF can localize text prompts in a scene we render novel views and label bounding boxes for 72 objects across 5 scenes.")? Thank you again for your work!

chungmin99 commented 1 year ago

The ground truth bbox dataset is available here: https://drive.google.com/drive/folders/1vY9Pv6aBekL3cKT0hHQgl8_pS-kadoCY?usp=sharing

You should be able to replicate the evaluation code with the following:

  1. Get test poses and camera intrinsics from keyframes.json
  2. Get the phrases and bboxes from each scene's i_rgb.json
  3. Query relevancy for each model, each test view, and each phrase, with default settings.
  4. Get the location of the highest relevancy pixel, then check it against the ground-truth bbox
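
Step 4 can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code. It assumes the relevancy map is a 2D grid of floats (one row per image row) and that a box is stored as (x_min, y_min, x_max, y_max) in pixel coordinates with inclusive bounds; the actual layout in the JSON files may differ. All function names are hypothetical. With real relevancy maps held as arrays or tensors, an argmax call would replace the explicit loop.

```python
from typing import Sequence, Tuple

# Assumed box layout (not confirmed by the dataset): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def highest_relevancy_pixel(relevancy: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return (x, y) of the pixel with the highest relevancy score."""
    best_x, best_y, best_val = 0, 0, float("-inf")
    for y, row in enumerate(relevancy):
        for x, val in enumerate(row):
            if val > best_val:
                best_x, best_y, best_val = x, y, val
    return best_x, best_y


def point_in_box(point: Tuple[int, int], box: Box) -> bool:
    """Check whether a pixel lies inside a box (bounds treated as inclusive)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def is_localized(relevancy: Sequence[Sequence[float]], box: Box) -> bool:
    """True if the highest relevancy pixel falls inside the ground-truth box."""
    return point_in_box(highest_relevancy_pixel(relevancy), box)


# Toy 3x3 relevancy map whose peak sits at x=1, y=1.
toy_relevancy = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.2],
    [0.1, 0.2, 0.1],
]
print(is_localized(toy_relevancy, (1, 1, 2, 2)))  # True
```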

francisengelmann commented 1 year ago

Hi Chung Min, thanks for the nice work. In the evaluation, how do you handle multiple bbox annotations with the same phrase? E.g., "wavy noodles" in ramen/2_rgb is annotated twice, once for the noodles in the front and once for those in the background. Since your metric relies on the highest relevancy pixel, there will always be only one detection per phrase. Is it sufficient to detect only one, or will there always be one missing?

chungmin99 commented 1 year ago

> Since your metric relies on the highest relevancy pixel there will always only be one detection per phrase.

This is correct. As you mentioned, for "wavy noodles" the relevancy should highlight both the noodles in the front and those in the back, but the highest relevancy point may be in either highlighted region.

In our experiments we check whether the highest relevancy pixel is at the correct semantic location, and count it as a success if that pixel lies inside either box.
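
The "either box" rule can be sketched like this. It is an illustrative sketch under assumptions rather than the code used for the paper: boxes are assumed to be (x_min, y_min, x_max, y_max) with inclusive bounds, the highest relevancy pixel is assumed to be already computed per (view, phrase) pair, and averaging the successes into a single accuracy is an assumed aggregation. The function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

# Assumed box layout (not confirmed by the dataset): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]
Point = Tuple[int, int]  # (x, y) of the highest relevancy pixel


def in_any_box(point: Point, boxes: Sequence[Box]) -> bool:
    """True if the point lies inside at least one box annotated for the phrase."""
    x, y = point
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)


def localization_accuracy(samples: Iterable[Tuple[Point, Sequence[Box]]]) -> float:
    """Fraction of (peak pixel, boxes) pairs whose peak falls inside any box."""
    results = [in_any_box(point, boxes) for point, boxes in samples]
    if not results:
        raise ValueError("no samples to evaluate")
    return sum(results) / len(results)


# A phrase annotated twice (e.g. front and background instances).
two_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
samples = [
    ((5, 5), two_boxes),    # peak in the first box: success
    ((25, 25), two_boxes),  # peak in the second box: also success
    ((15, 15), two_boxes),  # peak between the boxes: failure
]
accuracy = localization_accuracy(samples)
```

Under this rule a phrase with several annotated instances is never penalized for the instances the peak does not land on; only a peak outside every box counts as a miss.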