Why do 3d VAE suddenly appear Mosaic in training

guanzhenghua commented 1 year ago

Thank you so much for your paper and code. I trained the VAE model on 2D images and the results were very good. However, I have run into difficulties training on 3D MRI, and I don't know why: mosaics suddenly appear during training. Here are the details. I feed the VAE MRI volumes of size 128×128×128 and save a picture every 200 training samples. The first sample was very blurry but barely recognizable as a brain image; the loss was about 2.0e+7. At the 27th sample I get a clearer picture of the brain, and the loss is about 2.0e+6. But the next sample, the 28th, is a bunch of mosaics, as are all subsequent samples, and the losses are higher than they were at the start. This problem has been bothering me for a long time. Thank you very much for your help!

mueller-franzes commented 1 year ago

Unfortunately, I do not know for sure what is causing this error. It seems that there is an instability in the optimization of the loss function. Are you training with 32-bit or 16-bit precision? What learning rate are you using (optimizer_kwargs={'lr':1e-4})? Another cause could be the weighting between the embedding loss (KL) and the reconstruction loss (adjustable via embedding_loss_weight).

But first I would try commenting out the SSIM loss (see self.ssim_loss(pred, target)). It could be responsible for the instability.

guanzhenghua commented 1 year ago

Thank you very much for your thoughtful reply. I've solved the mosaic problem. Maybe it was because of the 3D MRI data, so I used a smaller learning rate (lr: 5e-6) and the mosaics no longer appeared. But there's a new problem: the 3D VAE doesn't work well. This is the result after 50 iterations of 3000 MRI volumes each. The first image is the original, with very clear edges. The second is the reconstruction, with blurred edges.

So my question is how to adjust the parameters, for example lr, embedding_loss_weight, and the number of VAE layers (maybe reducing the dimensions by a factor of eight is too much for 3D, and a factor of four would work better). Or perhaps the VAE is not able to reconstruct 3D MRI and I need to choose another model. I sincerely hope to get your advice. Thank you very much for your reply.

mueller-franzes commented 1 year ago

Glad to hear you were able to solve the first problem :)

The easiest way to improve the quality is to reduce the compression. Two things are crucial here: 1) emb_channels=8 (increase for better quality) and 2) strides = [ 1, 2, 2, 2] (reduce for better quality). If your image is not isotropic, e.g. 32x512x512, I would try something like [ 1, (1, 2, 2), (1, 2, 2), 2].

Might also help: embedding_loss_weight=0

Finally, if you have enough memory, it helps to increase the channels (the effect is rather small): hid_chs = [ 64, 128, 256, 512]
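
To make the effect of emb_channels and strides on compression concrete, here is a minimal sketch. It is not taken from the medfusion code: latent_shape and compression_ratio are hypothetical helper names, and the sketch assumes that each stride entry is a pure downsampling factor that divides the axis size exactly and that the input has a single channel.

```python
from math import prod


def latent_shape(image_shape, strides, emb_channels):
    """Return (channels, *spatial) of the latent for a given stride list.

    Assumption: every stride divides the current axis size exactly, i.e.
    each stage is a pure downsampling step without rounding.
    """
    spatial = list(image_shape)
    for stride in strides:
        # An int stride applies to every axis; a tuple gives one factor per axis.
        per_axis = (stride,) * len(spatial) if isinstance(stride, int) else tuple(stride)
        if len(per_axis) != len(spatial):
            raise ValueError(f"stride {stride} does not match {len(spatial)} axes")
        for axis, factor in enumerate(per_axis):
            if spatial[axis] % factor:
                raise ValueError(f"axis size {spatial[axis]} is not divisible by {factor}")
            spatial[axis] //= factor
    return (emb_channels, *spatial)


def compression_ratio(image_shape, latent, in_channels=1):
    """Number of input values per latent value (single-channel input assumed)."""
    return in_channels * prod(image_shape) / prod(latent)


volume = (128, 128, 128)
for strides in ([1, 2, 2, 2], [1, 2, 2, 1]):
    z = latent_shape(volume, strides, emb_channels=8)
    print(strides, z, compression_ratio(volume, z))

# Anisotropic example: per-axis strides keep the short axis from collapsing.
print(latent_shape((32, 512, 512), [1, (1, 2, 2), (1, 2, 2), 2], emb_channels=8))
```

Under these assumptions, dropping one stride of 2 on a 128×128×128 volume changes the latent from 8×16×16×16 to 8×32×32×32, which lowers the compression from 64 to 8 input values per latent value.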

guanzhenghua commented 1 year ago

I'm sorry that I didn't see your reply until two days later. Thank you very much for answering each question in such detail. First of all, I'm very sorry: I forgot to state that the size of the image is 128×128×128. In my second question, two days ago, I asked how to reconstruct high-quality images from 3D MRI. Here are the parameters I selected at that time. Coincidentally, many of them are the same as the ones you suggested.

1. emb_channels=8
2. strides = [ 1, 2, 2, 1]
3. embedding_loss_weight = 1e-5
4. hid_chs = [ 64, 128, 256, 512] (my GPU is an NVIDIA RTX A6000, 48 GB)

I would like to ask three more questions here.

1. What is the purpose of embedding_loss_weight? Is it necessary to reduce the embedding loss as the loss decreases?

2. Do I need to train for longer? In the paper Medical Diffusion: Denoising Diffusion Probabilistic Models for 3D Medical Image Generation, the model was trained for a week. Maybe the quality I'm worried about is poor because I have only trained for three days.

3. Finally, and most importantly, there are four models in your code: VQVAE, VQGAN, VAE and VAEGAN. Which one is better for 3D images? The paper Medical Diffusion: Denoising Diffusion Probabilistic Models for 3D Medical Image Generation chose VQGAN. I found out you were one of the authors, and I want to express my sincere admiration. I wonder whether VQGAN is the best model for 3D medical images, and whether I cannot reconstruct high-quality images because I used the VAE model. Thank you very much for your reply.

mueller-franzes commented 1 year ago

1.) For the VAE, embedding_loss_weight corresponds to the weighting between the KL loss and the reconstruction loss (e.g. L2).

I don't think it is beneficial to reduce the embedding loss as the loss decreases, but there is some discussion about how best to determine the weight (see the linked Discussion).

I based my choice on Stable Diffusion (see the Stable-Diffusion implementation and the Stable-Diffusion config).

It even seemed advantageous to me to set embedding_loss_weight=0 for the VAE.

For the VQVAE, embedding_loss_weight corresponds to the weighting between the codebook/embedding loss (link) and the reconstruction loss (e.g. L2).

2.) Yes, the training can take a long time. Have you looked at the TensorBoard metrics (e.g. SSIM)? Are they still decreasing or increasing after three days?

3.) In my experience VAE seemed to work just as well as VQVAE. If you try both, let me know which worked better for you, I'd be interested.
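
The weighting described in point 1 can be illustrated with a small sketch. This is an assumption-laden illustration, not the repository's implementation: the function names are hypothetical, the losses are summed rather than averaged, and the KL term is the standard closed form for a diagonal Gaussian against a unit Gaussian.

```python
import math


def l2_reconstruction(pred, target):
    """Sum of squared errors between reconstruction and target (summing is an assumption)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def kl_to_standard_normal(mean, logvar):
    """KL( N(mean, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, logvar))


def vae_loss(pred, target, mean, logvar, embedding_loss_weight):
    """Reconstruction loss plus the KL term scaled by embedding_loss_weight."""
    rec = l2_reconstruction(pred, target)
    kl = kl_to_standard_normal(mean, logvar)
    return rec + embedding_loss_weight * kl


pred, target = [1.0, 2.0], [1.0, 4.0]
mean, logvar = [1.0, 0.0], [0.0, 0.0]
for weight in (1e-5, 0.0):
    print(weight, vae_loss(pred, target, mean, logvar, weight))
```

With embedding_loss_weight=0 the KL term drops out and only the reconstruction error is optimized; a small value such as 1e-5 leaves the reconstruction term dominant while still mildly regularizing the latent.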

Oh, that's an important point I forgot to mention, sorry! I could see a jump in the quality of the histology images after using VAE-GAN. The GAN is simply another term in the loss function. You can load your pre-trained VAE and continue training with VAEGAN. As in Stable Diffusion, the GAN loss is "switched on" after x steps (see here).
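
The idea of switching the GAN loss on after a number of steps could look roughly like the sketch below. The names gan_weight, generator_loss and start_step are hypothetical and chosen for illustration only; the actual mechanism and threshold used in the repository may differ.

```python
def gan_weight(step, start_step, weight=1.0):
    """Return 0 before start_step, so the GAN term is inactive during warm-up."""
    return weight if step >= start_step else 0.0


def generator_loss(rec_loss, gan_loss, step, start_step, weight=1.0):
    """Reconstruction loss plus the GAN term, which only counts from start_step on."""
    return rec_loss + gan_weight(step, start_step, weight) * gan_loss


# Hypothetical numbers: the GAN term is ignored at step 100 and added at step 5000.
for step in (100, 5000):
    print(step, generator_loss(rec_loss=2.0, gan_loss=0.5, step=step, start_step=1000))
```

Continuing from a pre-trained VAE plays a similar role: the autoencoder first learns a reasonable reconstruction, and the adversarial term is only added once the reconstructions are good enough for the discriminator signal to be useful.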

mueller-franzes / medfusion

Why do 3d VAE suddenly appear Mosaic in training #10