Closed: lbertge closed this issue 2 months ago
Hi! This seems like a useful feature, and I'm curious how things change as well. (From my experience, by the way, the hypotheses don't get monotonically closer to the true embedding in terms of cosine similarity.)
So if you want a quick-and-dirty mechanism for finding the hypothesis embeddings, you can replicate a function like `generate()`, which runs multiple rounds of correction in a loop: https://github.com/jxmorris12/vec2text/blob/master/vec2text/trainers/corrector.py#L233-L306
I would definitely accept a pull request to optionally collect and return the hypothesis embeddings at each step though. I don't think it would take too much code to configure.
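A minimal sketch of what such a loop might look like. Here `correct_once` is a hypothetical stand-in for one round of the corrector (in the real trainer this is a model forward pass that decodes text and re-embeds it); it exists only to show where the per-step embeddings would be collected and returned:

```python
import torch

def correct_once(hypothesis: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for one correction round: nudge the current
    # hypothesis embedding toward the target. The real corrector instead
    # decodes a text hypothesis and re-embeds it with the frozen embedder.
    return hypothesis + 0.5 * (target - hypothesis)

def generate_with_trace(initial: torch.Tensor, target: torch.Tensor, num_steps: int = 5):
    """Run num_steps rounds of correction, returning the final embedding
    plus the hypothesis embedding produced at every step."""
    trace = [initial]
    hypothesis = initial
    for _ in range(num_steps):
        hypothesis = correct_once(hypothesis, target)
        trace.append(hypothesis)  # collect the per-step hypothesis embedding
    return hypothesis, trace

target = torch.randn(1, 1536)
initial = torch.randn(1, 1536)
final, trace = generate_with_trace(initial, target)
```

The only change relative to a plain correction loop is the `trace` list, which is why returning the intermediate embeddings shouldn't take much code.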
Hmm, I attempted to edit the `generate()` function to collect embeddings, but I'm not sure what to do during beam search. Say, for example, that I set `beam_sequence_width=4`. At each step I accumulate 4 new hypothesis embeddings, so I end up with a list like `[torch.Size([4, 1536]), torch.Size([4, 1536]), ..., torch.Size([1, 1536])]`. I don't know if it's a good idea here to rely on a heuristic to 'pick' one of the embeddings during the generation steps; do you have any thoughts? Maybe I should enable this option only when `beam_sequence_width=1`?
I think it depends on what you're using the hypotheses for. You could just flatten the output, or only add the hypothesis embedding from the beam that's closest to ground-truth.
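A sketch of the second option, assuming you have the list of per-step beam tensors with the shapes described above and access to the ground-truth embedding (only meaningful in evaluation settings where the true embedding is known):

```python
import torch
import torch.nn.functional as F

def closest_beam_per_step(step_embeddings, true_embedding):
    """For each step's [beam_width, d] tensor, keep only the beam whose
    embedding has the highest cosine similarity to the ground truth."""
    selected = []
    for beams in step_embeddings:                                   # beams: [beam_width, d]
        sims = F.cosine_similarity(beams, true_embedding, dim=-1)   # [beam_width]
        selected.append(beams[sims.argmax()])                       # best beam: [d]
    return torch.stack(selected)                                    # [num_steps, d]

true_emb = torch.randn(1, 1536)
# Dummy stand-ins for the collected beam embeddings: width 4 at the
# intermediate steps, width 1 at the final step, as in the thread.
steps = [torch.randn(4, 1536), torch.randn(4, 1536), torch.randn(1, 1536)]
trace = closest_beam_per_step(steps, true_emb)
```

This collapses the ragged per-step list into one `[num_steps, d]` tensor, which is easier to return from `generate()` than a list of differently-sized tensors.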
Hello,
I would like to look at the hypothesis embeddings, for example to see how the cosine similarity changes per iteration. It looks to me like `invert_embeddings()` only returns the final string. Is there an easy way to do this, and would you accept a PR that returns the intermediate embeddings? Thank you!
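Once the per-step embeddings are available (however they end up being returned), the per-iteration similarity curve itself is short to compute. A sketch with dummy tensors standing in for the collected hypothesis embeddings and the true embedding:

```python
import torch
import torch.nn.functional as F

def similarity_curve(hypothesis_embeddings, true_embedding):
    """Cosine similarity of each per-step hypothesis to the true embedding."""
    stacked = torch.stack(hypothesis_embeddings)        # [num_steps, d]
    return F.cosine_similarity(stacked, true_embedding.unsqueeze(0), dim=-1)

true_emb = torch.randn(1536)
hyps = [torch.randn(1536) for _ in range(5)]  # dummy per-step hypotheses
curve = similarity_curve(hyps, true_emb)      # [num_steps] values in [-1, 1]
```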