shubhamprshr27 / NeglectedTailsVLM

This repository houses the code for the paper "The Neglected Tails of VLMs"

Questions about synonyms and caption generating codes #1

Closed: e0jun closed this issue 5 months ago

e0jun commented 5 months ago

Hello authors, thank you for your excellent work.

I have questions about synonyms and caption-generating codes.

  1. Following the paper, synonyms are generated using ChatGPT and filtered by the text encoder of OpenCLIP. Are the synonyms in the metric file these processed ones?

  2. To improve the quality of the captions retrieved by string matching, you filter out irrelevant captions using LLaMA with the tuned descriptions. Can you open-source your code for this LLM-based analysis?
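
For concreteness, this is roughly how I picture the string-matching step that comes before the LLaMA filter. It is only a minimal sketch of my own understanding, not the authors' code: the function names, the whole-word matching rule, and the toy captions are all assumptions.

```python
import re
from typing import Dict, List


def build_pattern(names: List[str]) -> re.Pattern:
    # Hypothetical helper: one case-insensitive, whole-word regex per concept.
    # Longer names go first so multi-word synonyms are tried before shorter ones.
    ordered = sorted(set(names), key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def match_captions(
    captions: List[str], synonyms: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    # Map each concept to the captions that mention it or one of its synonyms.
    patterns = {c: build_pattern([c] + syns) for c, syns in synonyms.items()}
    hits: Dict[str, List[str]] = {c: [] for c in synonyms}
    for caption in captions:
        for concept, pattern in patterns.items():
            if pattern.search(caption):
                hits[concept].append(caption)
    return hits


if __name__ == "__main__":
    toy_synonyms = {"tiger": ["panthera tigris"]}
    toy_captions = [
        "A tiger in the grass",
        "Tiger Woods wins again",  # matches the string, but is not the animal
        "A photo of Panthera tigris at the zoo",
        "Stigers are not a word",  # no whole-word match
    ]
    for concept, matched in match_captions(toy_captions, toy_synonyms).items():
        print(concept, matched)
```

The "Tiger Woods" caption is the kind of false positive that plain string matching lets through, which is why I am interested in the LLM-based filtering code.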

Please give me an answer to this. I appreciate any help you can provide.

shubhamprshr27 commented 5 months ago

Hi @e0jun !

We are still in the process of open-sourcing our code. As an M.S. student who is graduating soon, I have had a lot going on. We will be open-sourcing our LLaMA code as well; we expect to release it by this weekend.

As for your first query: yes, the synonyms in the metric file are the ones after processing.

Thanks for understanding.

e0jun commented 5 months ago

Thank you for your quick reply! I sincerely hope you complete your graduation well. I am going to close this issue.

shubhamprshr27 commented 5 months ago

Hi @e0jun, I have released the LLaMA analysis code; it can be found in analysis. It is still bare-bones, but I will be releasing the mined captions from LAION-400M and LAION-2B on Hugging Face soon. Thank you for your patience. I hope you find this useful.