kimdn opened this issue 4 years ago
Wow, that is a lot of epochs! Yes, I have run into NaNs during training before, but only with zdim=1, and usually triggered by impurities/non-standard images in the dataset.
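If it helps, a quick way to screen a stack for those is something like the sketch below. It assumes the `mrcfile` package and a single `.mrcs` stack; the filename and the 5-sigma outlier threshold are placeholders of mine, not anything cryodrgn does internally.

```python
# Screen a particle stack for non-finite pixels or extreme outlier images.
# Assumes `mrcfile` is installed; "particles.mrcs" is a placeholder path.
import numpy as np
import mrcfile

with mrcfile.open("particles.mrcs", permissive=True) as mrc:
    stack = np.asarray(mrc.data)  # shape: (n_particles, D, D)

means = stack.reshape(len(stack), -1).mean(axis=1)
bad = [i for i, img in enumerate(stack)
       if not np.isfinite(img).all()                       # NaN/Inf pixels
       or abs(means[i] - means.mean()) > 5 * means.std()]  # mean far outside the stack distribution
print(f"{len(bad)} suspicious images: {bad[:20]}")
```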
By the way, I typically train for far fewer epochs. Training progress is usually limited by the number of model updates, so I stick with the default batch size of 8 for more frequent updates and train for 25 epochs or so. Of course, this is completely dependent on your dataset characteristics, and I encourage experimentation.
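For reference, an invocation with those settings might look roughly like this. Treat it as a sketch: the particle/pose/CTF filenames are placeholders, and it's worth double-checking the flag names against `cryodrgn train_vae -h` for your installed version.

```
cryodrgn train_vae particles.mrcs \
    --poses poses.pkl --ctf ctf.pkl \
    --zdim 8 -n 25 -b 8 \
    -o ./train_output
```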
Also, depending on the image size/model size/latent variable dimension, overfitting is possible, especially with more epochs of training. I can recommend some training settings if you want to share some of the details of your dataset or reach out directly.
I see.
It appears that overfitting means the points of the latent space are no longer representative of the input data, so decoding them likely yields meaningless content, much like a classic autoencoder. This overfitting may happen when too much emphasis is placed on minimizing the reconstruction loss (as I understand from https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73).
I'm curious how cryodrgn measures/estimates whether the model is overfitted. Is it considered overfitted if the KLD exceeds some threshold? (Since without the Kullback-Leibler divergence term, a "VAE" would overfit by focusing only on minimizing the reconstruction loss.)
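To make sure I'm describing the right thing, here is a minimal PyTorch sketch of the standard VAE objective as I understand it. This is illustrative only, not cryodrgn's actual training code, and the MSE reconstruction term and beta weight are my own assumptions; tracking the two terms separately seems like one practical way to see whether the reconstruction term dominates.

```python
# Standard VAE objective (negative ELBO) for a Gaussian encoder.
# Illustrative sketch only, not cryodrgn's implementation.
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, beta=1.0):
    # Reconstruction term: how well the decoder reproduces the input.
    recon = F.mse_loss(recon_x, x, reduction="sum")
    # KL term: pulls q(z|x) = N(mu, exp(logvar)) toward the N(0, I) prior.
    # Without it, the model is free to overfit the reconstruction term alone.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kld, recon.detach(), kld.detach()
```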
My dataset is large (284,133 particles of experimental cryo-EM images, not synthetically simulated ones). After refinement in cryoSPARC, I extracted these particles with RELION.
Maybe this is not a cryodrgn-specific issue, but a more general VAE problem?
Anyway, have you ever seen "nan" values like the ones below?
Thank you