Hi, I was wondering if CONCH is able to directly convert an image to text? From the code, it seems like CONCH is only available for "image-to-text retrieval," meaning that given an image and several texts, it will check which text is most similar to the given image. However, in the paper, there is also an example of CONCH doing captioning and a comparison between predicted and corrected captions. If so, could you please provide the code for doing captioning? Thanks!
Hi, I was wondering if CONCH is able to directly convert an image to text? From the code, it seems like CONCH is only available for "image-to-text retrieval," meaning that given an image and several texts, it will check which text is most similar to the given image. However, in the paper, there is also an example of CONCH doing captioning and a comparison between predicted and corrected captions. If so, could you please provide the code for doing captioning? Thanks!