jxmorris12 / vec2text

utilities for decoding deep representations (like sentence embeddings) back to text

Question on frozen embeddings #20

Closed: braceal closed this issue 11 months ago

braceal commented 11 months ago

Hi @jxmorris12, hopefully a quick question: when training an InversionModel, have you found it helpful to run the embedder network directly, or should one always prefer frozen embeddings when possible? Frozen embeddings seem much faster in the overall training process, but is anything extra gained from calling the model directly?

The relevant section of code: https://github.com/jxmorris12/vec2text/blob/master/vec2text/models/inversion.py#L226

if frozen_embeddings is not None:
    # Precomputed embeddings passed in directly; no embedder call needed.
    embeddings = frozen_embeddings
    assert len(embeddings.shape) == 2  # batch by d
elif self.embedder_no_grad:
    # Compute embeddings on the fly, but keep the embedder frozen.
    with torch.no_grad():
        embeddings = self.call_embedding_model(
            input_ids=embedder_input_ids,
            attention_mask=embedder_attention_mask,
        )
else:
    # Compute embeddings on the fly with gradients, so the embedder is trainable.
    embeddings = self.call_embedding_model(
        input_ids=embedder_input_ids,
        attention_mask=embedder_attention_mask,
    )

Any thoughts you may have on this would be really helpful. Thanks!

jxmorris12 commented 11 months ago

There's no difference in the result. If you're training for multiple epochs, you can save time by precomputing the embeddings and passing them in via the frozen-embeddings parameter. You can also compute them on the fly and save some disk space, but then you'll recompute them every epoch and waste FLOPs. For large datasets that may be the only option, though, since precomputed embeddings take up a non-negligible amount of disk space.
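For concreteness, here's a minimal sketch of the precompute-once approach. The dataloader and file names are hypothetical; only call_embedding_model and the embedder_* tensor names come from the snippet above.

import torch

# Precompute embeddings once, before training, so every epoch can reuse them.
# `model` is an InversionModel; `loader` is a hypothetical DataLoader yielding
# the same embedder_* tensors used in the snippet above.
model.embedder.eval()
chunks = []
with torch.no_grad():
    for batch in loader:
        emb = model.call_embedding_model(
            input_ids=batch["embedder_input_ids"],
            attention_mask=batch["embedder_attention_mask"],
        )
        chunks.append(emb.cpu())

frozen = torch.cat(chunks, dim=0)           # shape: (num_examples, d)
torch.save(frozen, "frozen_embeddings.pt")  # pay the disk cost once

At train time you'd pass each batch's rows of this tensor back in as frozen_embeddings, which takes the first branch of the snippet above and skips the embedder entirely.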

I was also previously experimenting with learning the full system end to end, including the embedder. That's only possible when you don't precompute the embeddings and you set the embedder_no_grad parameter to False; see the sketch below. Most people probably don't want to do this, though.
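A rough sketch of that end-to-end configuration, under the assumption that the forward arguments mirror the variable names in the snippet above (this is not guaranteed to be the exact signature):

# End-to-end: the embedder is trained jointly with the inversion model.
model.embedder_no_grad = False           # fall through to the final `else` branch
for p in model.embedder.parameters():    # make sure gradients reach the embedder
    p.requires_grad = True

outputs = model(
    embedder_input_ids=embedder_input_ids,
    embedder_attention_mask=embedder_attention_mask,
    # no frozen_embeddings here: they must be computed on the fly for grads to flow
)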

braceal commented 11 months ago

That makes sense. Thank you so much!

ArvinZhuang commented 11 months ago

Hi @jxmorris12, I have a follow-up question. If I set embedder_no_grad=False and use_frozen_embeddings_as_input=False, does that mean the embedder will be trained as well? Will the trained embedder be saved somewhere after training?

jxmorris12 commented 11 months ago

Yep, that's right. I think the trained embedder will be saved as part of the InversionModel (InversionModel.embedder), but I'd have to check to be sure. It should be obvious when it's working, because the loss curves are much steeper and reach much lower values when the embedder is trainable.
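If you'd rather not rely on the full checkpoint, one way to keep the fine-tuned embedder explicitly (the file path here is illustrative; InversionModel.embedder is the attribute named above):

import torch

# Save the fine-tuned embedder's weights on their own, separate from the
# full InversionModel checkpoint.
torch.save(model.embedder.state_dict(), "trained_embedder.pt")

# Later, restore them into an embedder of the same architecture:
model.embedder.load_state_dict(torch.load("trained_embedder.pt"))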