kerrj / lerf

Code for LERF: Language Embedded Radiance Fields
https://www.lerf.io/
MIT License

Minimize relevancy score instead of maximize #44

Open CorneliusHsiao opened 1 year ago

CorneliusHsiao commented 1 year ago

Hi, thanks for the excellent work! I have a question regarding your implementation: https://github.com/kerrj/lerf/blob/3b2cb902ea348cb6abf0cc02511ec0f4a0e38c09/lerf/encoders/openclip_encoder.py#L96

In Sec. 3.5 (Relevancy Score) of your paper, you state:

> Intuitively, this score represents how much closer the rendered embedding is towards the query embedding compared to the canonical embeddings.

My understanding of your inline equation and code is this: you pick the $\phi^i_{canon}$ that is closest to $\phi_{lang}$, relative to how close $\phi_{lang}$ is to $\phi_{quer}$, because minimizing the score over $i$ amounts to maximizing the similarity between $\phi^i_{canon}$ and $\phi_{lang}$.

My question is: why is this a minimization instead of a maximization? I think we are looking for the $\phi_{lang}$ that best matches $\phi_{quer}$ rather than $\phi^i_{canon}$, right? Is it because we want the embedding to fit both $\phi_{quer}$ and $\phi^i_{canon}$ at the same time? In my experiments, the results do get worse if I change min to max, but could you explain this a little more, please?
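
To make the min-versus-max question concrete, here is a minimal sketch. It assumes the relevancy score is a pairwise softmax between the query and each canonical phrase, which is how I read the inline equation. The similarity values, the function names, and the `temperature` parameter are hypothetical and are not taken from the repository.

```python
import math

def pairwise_score(sim_query, sim_canon, temperature=1.0):
    """Two-way softmax weight of the query against ONE canonical phrase.

    sim_query: similarity between phi_lang and phi_quer (hypothetical value)
    sim_canon: similarity between phi_lang and one phi_canon^i (hypothetical value)
    temperature: assumed scaling factor, not part of the equation as quoted
    """
    q = math.exp(temperature * sim_query)
    c = math.exp(temperature * sim_canon)
    return q / (q + c)

def relevancy(sim_query, sims_canon, reduce=min):
    """Reduce the pairwise scores over all canonical phrases with min or max."""
    return reduce(pairwise_score(sim_query, s) for s in sims_canon)

# Made-up similarities: the query beats two canonical phrases but loses to one.
sim_query = 0.30
sims_canon = [0.10, 0.25, 0.35]

score_min = relevancy(sim_query, sims_canon, reduce=min)
score_max = relevancy(sim_query, sims_canon, reduce=max)

print(f"min over canon: {score_min:.4f}")
print(f"max over canon: {score_max:.4f}")
```

Under these assumptions, the pairwise score decreases as the canonical similarity grows, so `min` is decided by the canonical phrase most similar to $\phi_{lang}$. The score then exceeds 0.5 only if the query beats every canonical phrase. With `max`, beating the single weakest canonical phrase is enough, which may be why results degrade after the swap.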

Many thanks!