eisneim closed this issue 1 year ago
Glad to hear it's useful! Unfortunately CLIP doesn't have this capability on its own. It can do image-to-embedding, and text-to-embedding, but not the reverse. IIRC there are (slow) ways to "iteratively search" for some English text that matches a particular CLIP embedding, but that's beyond the scope of this repo.
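If you just need some English text out of CLIP, the usual workaround is to score a fixed list of candidate captions against the image and pick the best match, rather than generating text. Below is a minimal sketch using the Hugging Face `transformers` CLIP API as an illustration; the checkpoint name, image path, and candidate captions are placeholders, and this isn't necessarily the model format or runtime this repo uses:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder checkpoint; any CLIP checkpoint with matching image/text towers works.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder path
candidates = [
    "a photo of a cat",
    "a photo of a dog",
    "a photo of a car",
]

# Embed the image and all candidate captions, then compare them.
inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; highest score wins.
probs = outputs.logits_per_image.softmax(dim=-1)
best_caption = candidates[probs.argmax().item()]
print(best_caption)
```

Note this only ranks captions you supply up front. Generating free-form text from an image embedding needs a separate captioning model, which is what the image-captioning route mentioned below covers.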
Thanks for your reply! Guess I'll have to investigate how image captioning works. Thanks anyway.
First of all, thank you for sharing such a cool project! Your other projects are amazing as well. I have tested the model and it works great, but how do I convert the inference results into English text?
Any examples or explanation would be appreciated!