Train my own VAE problem

mueller-franzes / medfusion

Implementation of Medfusion - A latent diffusion model for medical image synthesis.

MIT License


Train my own VAE problem #7

Open Chopper-233 opened 1 year ago

Chopper-233 commented 1 year ago

Medfusion is amazing work. Thanks for sharing it!
However, when I tried to apply your code to my own dataset to train my own VAE, I ran into a problem during training: the loss is very large and does not decrease after several epochs. I changed very little of the code in your project apart from the dataset, so I'm wondering what could be causing this.


mueller-franzes commented 1 year ago

Hi, thank you :) I'm sorry for the late reply. The very high loss is normal (the sum is calculated, not the average over the pixels). Can you start TensorBoard (`tensorboard --logdir runs`) and see if L1 or SSIM improve (you may need to increase smoothing in TensorBoard)? Alternatively, you can also look at the images (`runs/.../lightning_logs/version_0/images`): do they get better (sharper) over the epochs?
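To illustrate why a summed loss looks alarmingly large, here is a minimal framework-free sketch. It is not Medfusion's loss code; the image size, the noise level, and the helper name `l1_losses` are assumptions chosen only for illustration. It compares the summed and the mean absolute error for the same reconstruction.

```python
import random


def l1_losses(target, recon):
    """Return (summed, mean) absolute error over all pixels."""
    abs_err = [abs(t - r) for t, r in zip(target, recon)]
    total = sum(abs_err)
    return total, total / len(abs_err)


random.seed(0)
num_pixels = 256 * 256  # assumption: one single-channel 256x256 image
target = [random.random() for _ in range(num_pixels)]
# Assumption: a fairly good reconstruction, off by at most 0.05 per pixel
recon = [t + random.uniform(-0.05, 0.05) for t in target]

summed, mean = l1_losses(target, recon)
print(f"summed L1: {summed:.1f}   mean L1: {mean:.4f}")
```

The summed value is the mean multiplied by the number of pixels, so even a small per-pixel error yields a loss in the thousands. What matters is whether the curve trends downward, not its absolute magnitude.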

VasisthaPrakhar commented 1 year ago

I also tried to apply your code to my own dataset to train my own VAE, but the `runs/.../lightning_logs/version_0/` folder does not contain any images folder or .ckpt file. It contains only the following 2 files, even after 100+ epochs. I changed very little of the code in your project apart from the dataset. Can you help me with this?

mueller-franzes commented 1 year ago

That's strange. Did you change anything in `save_and_sample_every`? Try setting it to 1.

Saving the images is controlled by `sample_every_n_steps=1000` in the VAE. Try `VAE(..., sample_every_n_steps=1)`; then you should see a picture after each step.
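One possible reason no images appear with the default interval is that a small dataset may never reach 1000 training steps. The sketch below assumes a check of the form `step % sample_every_n_steps == 0` on a 1-based step counter. That is an assumption about how such an interval is typically implemented, not a copy of the Medfusion code, and the epoch and batch counts are hypothetical.

```python
def sampling_steps(total_steps, sample_every_n_steps):
    """List the steps at which a sample would be written.

    Assumes a modulo check on a 1-based global step counter.
    """
    return [
        step
        for step in range(1, total_steps + 1)
        if step % sample_every_n_steps == 0
    ]


# Hypothetical small dataset: 100 epochs x 8 batches per epoch = 800 steps
total_steps = 100 * 8

default_hits = sampling_steps(total_steps, 1000)
every_step_hits = sampling_steps(total_steps, 1)

print(len(default_hits))     # number of samples with the default interval
print(len(every_step_hits))  # number of samples with sample_every_n_steps=1
```

Under these assumptions the default interval never fires within 800 steps, while an interval of 1 writes a sample at every step. Once images show up, a larger interval keeps the log folder manageable.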

xia-jingyu commented 1 year ago

Hi, thank you for your study and code. When I was training VAE with my own data, images appeared in the runs->...->version_0->images folder. Could you tell me what each row and column represents? Thank you very much and look forward to your reply.

AntiLibrary5 commented 1 year ago

> Hi, thank you for your study and code. When I was training VAE with my own data, images appeared in the runs->...->version_0->images folder. Could you tell me what each row and column represents? Thank you very much and look forward to your reply.

The first row shows the input samples and the second row shows the reconstructions. I was able to train successfully on my own dataset. The model had almost converged after 60 epochs, as seen from the TensorBoard logs.