[Closed] dqj5182 closed this issue 4 months ago
It's done by (1) a skip connection with a small weight from the input to the network output, and (2) adding a negative offset to the log-sigma in the latent space, so that the variance is small at initialization.
When resuming from a pre-trained checkpoint, it won't be initialized as the GT. In fact, even when initialized from scratch, once training starts the latent points drift away from the GT points.
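To make the two points above concrete, here is a minimal PyTorch sketch of this style of identity initialization (the class and parameter names are illustrative, not the actual repo code): the encoder's predicted mean is the input plus a residual scaled by a skip weight that starts near zero, and a negative offset on log-sigma keeps the initial variance tiny, so the sampled latent starts out almost equal to the input.

```python
import torch
import torch.nn as nn

class IdentityInitEncoder(nn.Module):
    """Hypothetical VAE encoder that starts as an identity mapping."""

    def __init__(self, dim, logsigma_offset=-6.0):
        super().__init__()
        self.net = nn.Linear(dim, 2 * dim)                   # predicts (mu residual, logsigma)
        self.skip_weight = nn.Parameter(torch.tensor(1e-3))  # small at init, learned later
        self.logsigma_offset = logsigma_offset               # keeps initial variance tiny

    def forward(self, x):
        residual, logsigma = self.net(x).chunk(2, dim=-1)
        # at initialization, mu is dominated by the input x (skip connection)
        mu = x + self.skip_weight * residual
        logsigma = logsigma + self.logsigma_offset
        z = mu + torch.exp(logsigma) * torch.randn_like(mu)
        return z, mu, logsigma

enc = IdentityInitEncoder(dim=3)
x = torch.randn(8, 3)
z, mu, logsigma = enc(x)
# the sampled latent z stays very close to the input at initialization
print(torch.allclose(mu, x, atol=1e-1))  # prints: True
```

Once training starts, the skip weight and log-sigma offset are free to move, which is consistent with the latent points drifting away from the GT points described above.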
In the previous issue, it was stated that:
"I don't think you need to train it for longer: which epoch is best_val.pth from? (it should be saved in the pth file); it might be a very early epoch. Since our latent points are initialized as the GT points, and the VAE is initialized as an identity mapping, you will see such a figure at the beginning. In general, the longer you train, the worse the reconstruction you will get (as shown in the val EMD/CD curve), but the smoother the latent space (i.e., the latent points get closer to N(0,1), which makes training the diffusion model easier). We need to find a good trade-off between them. In the figure you show, the latent points are super smooth, so I feel the model could have been stopped earlier."
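Since the quote suggests the epoch should be stored inside the checkpoint file, a quick way to check is to load it and inspect the keys. This is a hedged sketch: the key name "epoch" is an assumption, so print the keys to see what your best_val.pth actually contains (the dummy save here only makes the snippet self-contained).

```python
import torch

# stand-in checkpoint so this snippet runs end-to-end; in practice,
# point torch.load at the real best_val.pth from your experiment dir
torch.save({"epoch": 123, "model_state": {}}, "best_val.pth")

ckpt = torch.load("best_val.pth", map_location="cpu")
print(sorted(ckpt.keys()))                                   # inspect what is stored
print(ckpt.get("epoch", "epoch not stored under this key"))  # assumed key name
```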
May I ask which part of the code initializes the latent points as the GT points? (I would also like to know whether the latent points are still initialized as the GT points when resuming training from a pre-trained checkpoint.)
Looking forward to your reply, and thanks as always for your kind feedback! @ZENGXH