Hello, I am trying to compute the CLIP directional similarity on my original vs. edited samples, but I do not have captions for my samples. I thought of doing something similar to the paper and using an LLM to generate edited captions from an input caption and an editing instruction. Any advice on that? I was also wondering whether the GPT-3 instance you fine-tuned to generate your dataset captions is openly available. Thanks!
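
For context, this is the computation I have in mind once the edited captions are available: cosine similarity between the image-space change and the text-space change. Below is a minimal sketch using the Hugging Face transformers CLIP API; the checkpoint choice, file names, and captions are just placeholders, not something from your pipeline.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"  # placeholder; any CLIP checkpoint should work
model = CLIPModel.from_pretrained(model_id).eval()
processor = CLIPProcessor.from_pretrained(model_id)

@torch.no_grad()
def directional_similarity(img_orig, img_edit, cap_orig, cap_edit):
    # Embed both images and both captions with the same CLIP model
    img_inputs = processor(images=[img_orig, img_edit], return_tensors="pt")
    txt_inputs = processor(text=[cap_orig, cap_edit], return_tensors="pt",
                           padding=True, truncation=True)
    img_feats = F.normalize(model.get_image_features(**img_inputs), dim=-1)
    txt_feats = F.normalize(model.get_text_features(**txt_inputs), dim=-1)
    # Direction of the edit in image space vs. in text space
    img_dir = img_feats[1] - img_feats[0]
    txt_dir = txt_feats[1] - txt_feats[0]
    return F.cosine_similarity(img_dir.unsqueeze(0), txt_dir.unsqueeze(0)).item()

score = directional_similarity(
    Image.open("original.png").convert("RGB"),   # placeholder paths
    Image.open("edited.png").convert("RGB"),
    "a photo of a horse",                        # hypothetical input caption
    "a photo of a zebra",                        # hypothetical LLM-generated edited caption
)
print(score)
```

So the only missing piece on my end is generating `cap_edit` from `cap_orig` plus the editing instruction, which is where the fine-tuned GPT-3 model (or any advice on prompting an off-the-shelf LLM for it) would come in.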