pmelchior / spender

Spectrum encoder and decoder
MIT License

scale invariance for the extra losses #45

Open pmelchior opened 1 year ago

pmelchior commented 1 year ago

The similarity and consistency losses (as written in Liang+2023) assume that the latents typically have amplitudes of order unity. The fidelity training does not guarantee this, and if it does not hold, the extended training procedure breaks down because the sigmoids get pushed into their flat regime.
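To see why the flat regime is fatal, here is a small numpy check (not part of spender) of the logistic sigmoid and its gradient: once the argument is far from zero, the gradient essentially vanishes, so any loss term that lands there stops training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# argument of order 1: sigmoid is in its active region
print(sigmoid_grad(1.0))    # ~0.197
# latents 10x too large -> argument of order 100: flat regime, ~zero gradient
print(sigmoid_grad(100.0))
```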

This can be fixed by adding rescaling terms computed from the typical latent-space amplitude:

[Screenshot 2023-08-31: similarity and consistency loss definitions]

The first RHS terms should have a prefactor $1/(\sigma_s^2 S)$ instead of $1/S$, in the same way as

[Screenshot 2023-08-31: loss term with the $1/(\sigma_s^2 S)$ prefactor]

This ensures that these terms are all of order 1 and thus remain in the active parts of the sigmoids.
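A quick numerical illustration of the prefactor choice (values here are made up for the demo, not spender defaults): if the latents have typical per-component amplitude $\sigma_s$, the mean squared pairwise distance with the naive $1/S$ prefactor is of order $\sigma_s^2$, while dividing by $\sigma_s^2 S$ brings it back to order 1.

```python
import numpy as np

rng = np.random.default_rng(0)
S = 6            # latent dimensionality (illustrative value)
sigma_s = 0.1    # typical per-component latent amplitude

# two batches of latents with amplitude ~sigma_s, as fidelity training might produce
s_i = sigma_s * rng.standard_normal((1000, S))
s_j = sigma_s * rng.standard_normal((1000, S))

d2 = np.sum((s_i - s_j) ** 2, axis=-1)

# naive 1/S prefactor: order sigma_s^2 = 0.01, deep in the sigmoid's flat region
print(np.mean(d2 / S))
# rescaled 1/(sigma_s^2 S) prefactor: order 1, in the sigmoid's active region
print(np.mean(d2 / (sigma_s**2 * S)))  # ~2
```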

In Liang+23, we fixed $\sigma_s=0.1$ to set a target scale for the consistency loss. It would be better to make both rescaling terms dynamic, i.e., measure the typical value of $\lVert s\rVert$ across the data set and update it during training to account for any shrinking or expansion of the latent distribution.
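One way to keep the rescaling dynamic is a running estimate of the typical latent amplitude, updated each batch via an exponential moving average. This is only a sketch of the idea, not spender's implementation; the function name and momentum value are made up here.

```python
import numpy as np

def update_sigma_s(sigma_s, latents, momentum=0.99):
    """EMA update of the typical per-component latent amplitude.

    latents: (batch, S) array of latent vectors s from the current batch.
    A running estimate tracks shrinking or expansion of the latent
    distribution as training progresses.
    """
    batch_sigma = float(np.sqrt(np.mean(latents**2)))
    return momentum * sigma_s + (1.0 - momentum) * batch_sigma
```

The updated $\sigma_s$ would then enter the $1/(\sigma_s^2 S)$ prefactors of both losses on the next batch.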

This also has the advantage of