lucidrains / x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers
MIT License

Extract Text and Image Latents #13

Closed (mmsamiei closed this issue 1 year ago)

mmsamiei commented 1 year ago

Hi, in the current implementation we can only extract the text and image embeddings (by setting return_encodings=True), which are obtained before the latent linear layers are applied. Would it be better to add an option to extract the latent embeddings as well? This also matters because, with the current code, it is impossible to compute the similarity matrix between a batch of images and a batch of texts.
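To make the request concrete, here is a minimal sketch of the computation I have in mind. It is not x-clip's actual code. It is a plain-Python toy with no dependencies, assuming the latents come from a bias-free linear projection of the encodings followed by L2 normalization, and that similarity is the dot product between every text latent and every image latent. The helper names (`project`, `l2_normalize`, `similarity_matrix`) and the weight values are hypothetical, made up for illustration.

```python
import math

def project(encodings, weight):
    # Hypothetical bias-free linear layer: weight has one row per latent dim.
    return [[sum(x * w for x, w in zip(row, w_row)) for w_row in weight]
            for row in encodings]

def l2_normalize(rows):
    # Scale each row to unit length (all-zero rows are left unchanged).
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out

def similarity_matrix(text_latents, image_latents):
    # Entry [t][i] is the dot product of text latent t and image latent i.
    return [[sum(a * b for a, b in zip(t_row, i_row)) for i_row in image_latents]
            for t_row in text_latents]

# Toy stand-ins for what return_encodings=True gives back:
# 2 texts with 3-dim encodings, 3 images with 4-dim encodings.
text_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_enc = [[2.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]

# Made-up projection weights mapping both modalities to a shared 2-dim latent space.
text_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

text_latents = l2_normalize(project(text_enc, text_weight))
image_latents = l2_normalize(project(image_enc, image_weight))

# Rows index texts, columns index images.
sim = similarity_matrix(text_latents, image_latents)
```

In practice these would be tensor operations on batches, and a learned temperature might scale the result. The point is only that the similarity matrix needs the projected latents, not the raw encodings.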

lucidrains commented 1 year ago

@mmsamiei Hi Mohammad! Try https://github.com/lucidrains/x-clip/commit/5eb5fcc10d574f7641e51444aa03f4e6ec8c42db#diff-3858fdc3d4b7a5fce034a5fe9f25bf300fe1431999316145c018015644e86f91R505

mmsamiei commented 1 year ago

Thanks a lot!