This is more of a Q&A discussion than a bug/issue. CLIP models encode an image or text into an aligned embedding space. Going from embedding -> image is the job of a different class of models: you'd want to look at diffusion or other generative image models, and at how CLIP embeddings can be used to guide the generation.
I want to use the image features for a downstream task (anomaly detection). Could I decode the reconstructed feature back into an image, like an autoencoder? I'm new to this and would really appreciate some simple guidance!
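Since CLIP has no decoder, one option for anomaly detection is to skip the image reconstruction entirely and score samples directly in the embedding space. The sketch below is only an illustration of that idea, not something CLIP or this repo provides: `knn_anomaly_scores` is a hypothetical helper, the "features" are random vectors standing in for real CLIP image embeddings, and the dimension of 512 and `k=5` are arbitrary assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def knn_anomaly_scores(normal_feats, query_feats, k=5):
    """Hypothetical helper: score = 1 - mean cosine similarity to the k nearest
    'normal' embeddings. A higher score means the query looks less like the normal set."""
    normal = l2_normalize(np.asarray(normal_feats, dtype=np.float64))
    query = l2_normalize(np.asarray(query_feats, dtype=np.float64))
    sims = query @ normal.T                    # shape: (num_queries, num_normal)
    k = min(k, normal.shape[0])
    topk = np.sort(sims, axis=1)[:, -k:]       # k largest similarities per query
    return 1.0 - topk.mean(axis=1)


# Stand-in data (assumption): in practice these would be image embeddings
# produced by a CLIP image encoder for known-good and test images.
rng = np.random.default_rng(0)
dim = 512
center = rng.normal(size=dim)
normal_feats = center + 0.1 * rng.normal(size=(200, dim))  # tight "normal" cluster
typical_query = center + 0.1 * rng.normal(size=(1, dim))   # resembles the normal set
odd_query = rng.normal(size=(1, dim))                      # unrelated direction

scores = knn_anomaly_scores(normal_feats, np.vstack([typical_query, odd_query]))
print("typical:", scores[0], "odd:", scores[1])
```

With real embeddings you would choose a threshold on the score from held-out normal images. If you do need an actual image as output, that still requires a separate generative model conditioned on the embedding.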