Open kanonade opened 1 year ago
Sorry for the late response. The learned embeddings do not translate to a specific word. You could try to find the nearest neighbours in the embedding space, but the learned embeddings typically reside far from real words and the neighbours may make little sense.
Have a look at https://twitter.com/dribnet/status/1554804574132719619 for a possible alternative approach.
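For what it's worth, the nearest-neighbour lookup mentioned above can be sketched in a few lines of PyTorch. This is a hedged sketch, not the repo's method: `vocab_embeds` here is a random stand-in for the real CLIP token-embedding matrix (which you would get from something like `text_encoder.get_input_embeddings().weight`), and `nearest_tokens` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

# Stand-in for the CLIP token-embedding matrix. In practice you would use
# the real weights, e.g. text_encoder.get_input_embeddings().weight,
# which for CLIP ViT-L/14 has shape (49408, 768).
torch.manual_seed(0)
vocab_embeds = torch.randn(1000, 768)  # (vocab_size, embed_dim) stand-in
learned = torch.randn(768)             # one learned pseudo-word embedding

def nearest_tokens(embedding, vocab, k=5):
    # Cosine similarity between the learned vector and every vocab row,
    # then take the k most similar token ids.
    sims = F.cosine_similarity(embedding.unsqueeze(0), vocab, dim=-1)
    topk = sims.topk(k)
    return topk.indices.tolist(), topk.values.tolist()

ids, scores = nearest_tokens(learned, vocab_embeds)
# With a real tokenizer, map ids back to strings via
# tokenizer.convert_ids_to_tokens(ids).
```

As noted above, though, learned embeddings usually sit far from any real token embedding, so even the top matches tend to have low similarity and may not be meaningful.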
If I understand embeddings correctly, they are locations in CLIP's latent word space that give results closest to a given set of images.
Is it possible, then, to take an embedding of one or more vectors and pass it back through the clip_tokenizer to figure out what words the embedding translates to? Or, if it isn't a real word, use a nearest-neighbour approach to find the closest word?
This is not my area, and I got about as far as loading an embedding with PyTorch and seeing that it is a collection of tensors rather than the list of token ids that the transformers `CLIPTokenizer` takes in its `decode` method. Looking for any insight you might have @rinongal. Thank you!