minyoungg / vqtorch

Why can inplace_optimizer only be used with beta=1.0? #5

Open sohananisetty opened 1 year ago

sohananisetty commented 1 year ago

Since we are using a separate optimizer for the codebook, we would want to prevent codebook updates caused by optimizing the original loss. However, if beta = 1.0, then the term left in your formulation of the commitment loss is equivalent to the codebook update term in the original VQ-VAE paper. Shouldn't beta be 0, which would then have the z_q term detached?
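For reference, here is a minimal sketch of the two-term loss being discussed, written under the convention implied by this thread (beta weights the codebook-update term, 1 - beta weights the commitment term); the function and variable names are placeholders, not vqtorch internals:

```python
import torch.nn.functional as F

def vq_loss(z_e, z_q, beta):
    """Hypothetical two-term VQ loss following the convention discussed above.

    beta weights the codebook-update term (encoder output detached), so with
    beta = 1.0 only the codebook receives gradients, matching the codebook
    update of the original VQ-VAE paper; with beta = 0.0 only the commitment
    term (z_q detached) remains and only the encoder receives gradients.
    """
    codebook_term = F.mse_loss(z_q, z_e.detach())  # gradients flow to the codebook only
    commit_term = F.mse_loss(z_e, z_q.detach())    # gradients flow to the encoder only
    return beta * codebook_term + (1.0 - beta) * commit_term
```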

minyoungg commented 1 year ago

When beta is 1.0, the codebook is the only thing that gets updated. This is the intended design. You are correct that this is similar to the original VQ-VAE paper.

The in-place optimizer is a feature for speeding up alternated optimization. Instead of doing two forward passes (first optimize the codebook, then run another forward pass with the updated codebook to train the remaining parameters), the in-place optimizer updates the codebook within the same iteration and feeds the newly updated codes to the next layer. When beta is not 1.0, other layers would also need to be updated, which is mathematically undesirable.
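To illustrate the difference, here is a conceptual sketch of an in-place update inside the quantizer's forward pass; `codebook`, `codebook_opt`, and the nearest-neighbor lookup are placeholder names, not the actual vqtorch implementation:

```python
import torch

def quantize_inplace(z_e, codebook, codebook_opt):
    """Conceptual sketch of an in-place codebook update (placeholder names).

    Instead of two forward passes (update the codebook, then re-run the model
    with the new codebook), the codebook is updated within the same iteration
    and the updated codes are used by the layers that follow.
    """
    # Step 1: take one optimizer step that pulls the selected codes toward
    # the (detached) encoder outputs -- the beta = 1.0 term.
    idx = torch.cdist(z_e.detach(), codebook).argmin(dim=-1)
    codebook_loss = (codebook[idx] - z_e.detach()).pow(2).mean()
    codebook_opt.zero_grad()
    codebook_loss.backward()
    codebook_opt.step()

    # Step 2: quantize with the *updated* codebook. The straight-through
    # estimator routes task gradients to the encoder, not the codebook.
    with torch.no_grad():
        idx = torch.cdist(z_e, codebook).argmin(dim=-1)
    z_q = z_e + (codebook[idx].detach() - z_e.detach())
    return z_q
```

Here `codebook_opt` would be an optimizer over only the codebook parameters (e.g. SGD over a single embedding table), while the encoder and the rest of the network are trained by the task loss through the straight-through gradient.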

sohananisetty commented 1 year ago

Got it! Thanks. Amazing paper btw.

bingykang commented 12 months ago

When beta is 1.0, the codebook is the only thing that gets updated. This is the intended design. You are correct that this is similar to the original VQ-VAE paper.

The in-place optimizer is a feature for speeding up alternated optimization. Instead of doing two forward passes (first optimize the codebook, then run another forward pass with the updated codebook to train the remaining parameters), the in-place optimizer updates the codebook within the same iteration and feeds the newly updated codes to the next layer. When beta is not 1.0, other layers would also need to be updated, which is mathematically undesirable.

Hi @minyoungg, thank you very much for the detailed explanation. I am a bit confused about the alternated optimization scheme.

I understand that when you need to optimize the codebook only, you want to keep everything else frozen. However, I thought that the loss at https://github.com/minyoungg/vqtorch/blob/main/vqtorch/nn/vq.py#L122-L124 is used for updating the codebook with the in-place optimizer, while the loss at https://github.com/minyoungg/vqtorch/blob/main/vqtorch/nn/vq.py#L173, which relies on the beta parameter, is actually used to update the whole network after the codebook is updated. If you set beta=1 there, when will the encoder be updated?

SeanNobel commented 8 months ago

@bingykang If you look at the example, the loss you mentioned is actually ignored, which corresponds to the case where beta=1.0. The encoder is updated with the task-related loss.
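For concreteness, a hypothetical end-to-end training step matching this description; the constructor arguments and the lambda passed as `inplace_optimizer` are assumptions based on this thread and the repo README, so check the actual API before copying:

```python
import torch
import torch.nn as nn
from vqtorch.nn import VectorQuant

# Placeholder encoder/decoder; sizes and learning rates are arbitrary.
encoder = nn.Linear(64, 32)
decoder = nn.Linear(32, 64)

# Assumed constructor usage: beta=1.0 together with an in-place codebook optimizer.
vq_layer = VectorQuant(
    feature_size=32,
    num_codes=512,
    beta=1.0,
    inplace_optimizer=lambda *args, **kwargs: torch.optim.SGD(*args, **kwargs, lr=10.0),
)
model_opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(8, 64)
z_e = encoder(x)
z_q, vq_dict = vq_layer(z_e)  # codebook is updated in place during this call

# The commitment/quantization loss returned by the layer is intentionally ignored:
# with beta=1.0 the codebook was already updated in place, so the encoder and
# decoder are trained purely from the task loss via the straight-through gradient.
task_loss = (decoder(z_q) - x).pow(2).mean()
model_opt.zero_grad()
task_loss.backward()
model_opt.step()
```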