adam2392 opened this issue 4 weeks ago
Hi, based on the example in the repository, it seems that the latent dimension is fixed in advance? May I ask how you choose the latent dimension for the image dataset you are working with?
Hi, thanks for the reply!
Yes, the latent dimension is set arbitrarily, e.g. to 12, 64, or 128, and the image dataset I am using is the standard MNIST 1x28x28. Is this what your question meant? I tried 64 and 128 because those are the values mentioned in the paper. The latent prior is just a standard Gaussian.
In terms of the hidden dimensions within each ResidualBlock, I just set them according to Table 11.
FYI: it could be that my encoder/decoder blocks are not expressive enough(?), which I am actively exploring, but I did try to replicate the blocks exactly as described in Table 11 of the paper.
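For concreteness, here is a minimal sketch of the kind of fully-connected residual block I mean (the width and activation here are placeholders, not necessarily the exact Table 11 settings):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Fully-connected residual block: x + MLP(x). Placeholder widths."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection requires input and output dims to match.
        return x + self.net(x)
```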
Hi, did you try the MNIST setting that we provide in https://github.com/vislearn/FFF/blob/main/configs/fif/mnist.yaml? IIRC, this yields useful generations on MNIST.
Overall, the reconstruction loss looks relatively high at 0.9, but this might be due to the bottleneck. Maybe you can confirm with a non-FIF autoencoder what reconstruction error to expect, and otherwise increase the weight for the reconstruction loss (don't hesitate to change the order of magnitude).
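As a sketch of that sanity check (hypothetical code, not from the repo): train the same encoder/decoder as a plain autoencoder with only an MSE loss and see what reconstruction error it settles at.

```python
import torch
import torch.nn as nn

def ae_baseline(encoder: nn.Module, decoder: nn.Module, loader,
                epochs: int = 10, lr: float = 1e-3) -> float:
    """Train encoder/decoder as a plain autoencoder; return final MSE."""
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=lr
    )
    mse = nn.MSELoss()
    loss = None
    for _ in range(epochs):
        for x, _ in loader:                 # e.g. a torchvision MNIST loader
            x = x.flatten(start_dim=1)      # 1x28x28 -> 784
            loss = mse(decoder(encoder(x)), x)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return loss.item()  # compare this floor to the FIF reconstruction loss
```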
Thank you very much for your reply. Yes, I am in the same boat, but I am using a speech dataset and need to map the speech to a standard Gaussian distribution with a fixed dimensionality, which is proving difficult at the moment. Of course, it could be that my encoder and decoder settings are not good enough. If possible, could I have a look at your code for training on MNIST? It would be appreciated.
Training MNIST with `python -m lightning_trainable.launcher.fit configs/fif/mnist.yaml --name '{data_set[name]}'` reports the error: shapes do not match for intermediate reconstruction 0: torch.Size([512, 16]) vs torch.Size([512, 784]). My troubleshooting so far has not found the problem.
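The mismatch above looks like a 16-dimensional tensor (presumably the latent) being compared against the 784-dimensional flattened MNIST batch. A hypothetical standalone check (not repo code) to localize where the shapes diverge:

```python
import torch

def check_shapes(encoder, decoder, x: torch.Tensor) -> None:
    """Verify the decoder maps latents back to the flattened data shape."""
    x = x.flatten(start_dim=1)   # (batch, 784) for MNIST
    z = encoder(x)               # (batch, latent_dim), e.g. (512, 16)
    x_hat = decoder(z)           # should again be (batch, 784)
    assert x_hat.shape == x.shape, (
        f"decoder output {tuple(x_hat.shape)} != data {tuple(x.shape)}"
    )
```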
Hi,
I was interested in trying to train an FIF on MNIST down to a latent dimensionality of 128, or even 12, to compare the compression. I am using an encoder/decoder similar to Table 11 of https://arxiv.org/pdf/2306.01843. However, my network is slightly different: the hidden dimensionalities of the encoder and decoder are the same. If interested, I have a gist.
During training of the model with `beta=100`, I am observing that the NLL consistently decreases while the reconstruction loss stagnates. For example, in my training log (snippet not shown) the overall training loss decreases, but the reconstruction loss plateaus around 0.920. This problem seems to be alluded to in the paper, which suggests that higher betas will work. I am just wondering if there is any intuition on how to fix it?
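For reference, this is how I understand the weighting (a simplified sketch in my own notation; the repo's actual FIF objective estimates the log-det term with a surrogate):

```python
import torch

def total_loss(nll: torch.Tensor, recon: torch.Tensor,
               beta: float = 100.0) -> torch.Tensor:
    # With large beta the reconstruction term should dominate the gradient;
    # if reconstruction still plateaus, raising beta by an order of
    # magnitude shifts the trade-off further toward reconstruction.
    return nll + beta * recon
```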
The training seems to focus mainly on minimizing the negative log-likelihood, so the sampled images look fine, but not great. (Sampled image not shown.)
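For completeness, this is roughly how I draw the samples, assuming the standard Gaussian prior (names are my own, not repo code):

```python
import torch

@torch.no_grad()
def sample_images(decoder, n: int, latent_dim: int) -> torch.Tensor:
    z = torch.randn(n, latent_dim)  # z ~ N(0, I), the standard Gaussian prior
    x = decoder(z)                  # (n, 784) for flattened MNIST
    return x.view(n, 1, 28, 28)     # reshape back to image format
```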