jxmorris12 / vec2text

utilities for decoding deep representations (like sentence embeddings) back to text

Return intermediate hypotheses and hypothesis embeddings during generation #44

Closed lbertge closed 4 months ago

lbertge commented 4 months ago


This PR exposes a new API method that returns both the intermediate hypotheses and their corresponding embeddings during multi-step generation. If there are multiple beams, I only return the best one.

Open to suggestions/reviews. There is a fair bit of duplicate code, but I didn't think it was worth doing a refactor.

Demo

import torch
import vec2text
from vec2text.utils import get_embeddings_openai_vanilla

def compute_cosine_similarity(embeddings1, embeddings2):
    return torch.nn.functional.cosine_similarity(embeddings1, embeddings2, dim=1)

# Load the pretrained corrector for OpenAI's text-embedding-ada-002 embeddings.
corrector = vec2text.load_pretrained_corrector("text-embedding-ada-002")

text = "Hello my name is Albert"

# Embed the target text with the same model the corrector was trained against.
embed_text = get_embeddings_openai_vanilla(text, model="text-embedding-ada-002")
embed_text = torch.Tensor(embed_text).cuda()

# Invert the embedding, keeping the best-beam hypothesis string and its embedding
# from every correction step.
output_strings, hypothesis_embeddings = vec2text.invert_embeddings_and_return_hypotheses(
    embed_text, corrector, num_steps=10, sequence_beam_width=4
)

print("Original text: " + text)

# Print the best-beam hypothesis at each step and its similarity to the target embedding.
for i, hypothesis_embedding in enumerate(hypothesis_embeddings):
    print(f"Hypothesis string at step {i}: " + output_strings[i][0])
    similarity = compute_cosine_similarity(embed_text, hypothesis_embedding)
    print(f"Cosine similarity to original: {similarity.item()}")
Output:

Original text: Hello my name is Albert
Hypothesis string at step 0: Hello my name is Albert Alberto I am a Belgian born scientist and my name is Albert
Cosine similarity to original: 0.9451707005500793
Hypothesis string at step 1: Hello my name is Albert. I am Albert Hi Albert
Cosine similarity to original: 0.9766112565994263
Hypothesis string at step 2: Hello my name is Albert
Cosine similarity to original: 0.9999992251396179
Hypothesis string at step 3: Hello my name is Albert
Cosine similarity to original: 1.000000238418579
Hypothesis string at step 4: Hello my name is Albert
Cosine similarity to original: 0.9999982118606567
Hypothesis string at step 5: Hello my name is Albert
Cosine similarity to original: 1.000000238418579
Hypothesis string at step 6: Hello my name is Albert
Cosine similarity to original: 1.000000238418579
Hypothesis string at step 7: Hello my name is Albert
Cosine similarity to original: 1.000000238418579
Hypothesis string at step 8: Hello my name is Albert
Cosine similarity to original: 0.9999986290931702
Hypothesis string at step 9: Hello my name is Albert
Cosine similarity to original: 1.000000238418579
Hypothesis string at step 10: Hello my name is Albert
Cosine similarity to original: 1.000000238418579
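
Since the per-step embeddings come back alongside the strings, a natural follow-up is to scan the similarity trajectory and stop at the first step that gets close enough to the target. Here is a minimal sketch building on the demo variables above (first_step_above_threshold and the 0.999 cutoff are purely illustrative, not part of this PR's API):

import torch

def first_step_above_threshold(target_embedding, hypothesis_embeddings, threshold=0.999):
    # Stack the per-step (batch of one) embeddings into a (num_steps + 1, dim) tensor.
    stacked = torch.cat([e.float() for e in hypothesis_embeddings], dim=0)
    similarities = torch.nn.functional.cosine_similarity(
        target_embedding.float().expand_as(stacked), stacked, dim=1
    )
    # Index of the first step whose similarity reaches the threshold, or None.
    above = (similarities >= threshold).nonzero(as_tuple=True)[0]
    return int(above[0]) if above.numel() > 0 else None

# For the run above this should return 2, the first step whose hypothesis
# already reads "Hello my name is Albert".
print(first_step_above_threshold(embed_text, hypothesis_embeddings))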