pals-ttic / adapting-CLIP

MIT License

This work is extremely similar to the published ACL 2022 work ReCLIP. Why is it not mentioned in the paper, and why is it not compared against in the experiments? #5

Closed: linhuixiao closed this issue 1 year ago

linhuixiao commented 1 year ago

This work is very similar to the published ACL 2022 work ReCLIP. Why is it not mentioned in the paper, and why is it not compared against in the experiments? ReCLIP: https://aclanthology.org/2022.acl-long.357/

raymondyeh07 commented 1 year ago

Dear Linhui,

Thank you for your interest in our work and for sharing the related work ReCLIP.

Regarding reference & comparison: Our work was released on arXiv on Apr. 7, 2022, and ReCLIP was released on arXiv on Apr. 12, 2022. We have not updated the paper since our release. Thank you for bringing ReCLIP to our attention.

Regarding similarity: We are unsure what you mean by our work being “extremely similar” to ReCLIP. In fact, we find the two approaches to be very different. Even from a brief skim of ReCLIP, we have already found several differences:

  1. Our approach does not rely on object proposals.
  2. Our scoring of each “region” is not based on cropping or blurring. We propose a method based on region-tokens that creates a score map and then performs an efficient sub-window search (see Sec. 4.2 and 4.3 for details; a minimal sketch of the sub-window search idea is given after this list).
  3. ReCLIP presents heuristic rules to handle spatial relations between objects. Our method does not handle spatial relationships explicitly and instead relies directly on CLIP’s text encoder.
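To make point 2 concrete, here is a minimal sketch of the score-map + sub-window search idea. This is illustrative only, not our actual implementation (see Sec. 4.2 and 4.3 in the paper); the function and variable names below (e.g. `best_subwindow`) are made up for this example, and it assumes the score map is a 2D NumPy array.

```python
import numpy as np

def best_subwindow(score_map, win_h, win_w):
    """Return the (x, y, w, h) sub-window of a 2D score map with the
    highest total score. An integral image (summed-area table) lets
    each candidate window be scored in O(1) after O(H*W) setup."""
    H, W = score_map.shape
    # Integral image, padded with a leading zero row/column so that
    # ii[y2, x2] - ii[y1, x2] - ii[y2, x1] + ii[y1, x1] equals the
    # sum of score_map over the window [y1:y2, x1:x2].
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(score_map, axis=0), axis=1)

    best_score, best_box = -np.inf, None
    for y in range(H - win_h + 1):
        for x in range(W - win_w + 1):
            s = (ii[y + win_h, x + win_w] - ii[y, x + win_w]
                 - ii[y + win_h, x] + ii[y, x])
            if s > best_score:
                best_score, best_box = s, (x, y, win_w, win_h)
    return best_box, best_score
```

In practice one would sweep over multiple window sizes and aspect ratios and keep the best box overall; the point is simply that the box is found directly on a dense score map, with no cropping, blurring, or external object proposals.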

Best, Raymond

linhuixiao commented 1 year ago

Considering the similarities in setting, should adapting-CLIP be compared with ReCLIP rather than ZSGNet?

raymondyeh07 commented 1 year ago

We have not been actively working on this project. Sure, ReCLIP could be an additional baseline.