lucidrains / imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
MIT License

Trying to use video training on scientific data #247

Open pgarz opened 1 year ago

pgarz commented 1 year ago

Hello, I'm trying to model some complex scientific data: each sample is 20 timesteps on a 64x64 grid with a single channel. From observation the values range from about -5.5 to 5.5, but I can't guarantee that exact range. From reading the code base, it seems that if I set auto_normalize_img=False when initializing the ElucidatedImagen object, the model expects data within -1 to 1, so as a hack I divided my entire dataset by 5.5. Even so, my model doesn't train well, even when I try to overfit just 10 examples. I've been trying to overfit a simple model using only the initial Unet and the base hyperparameters. The parameter changes that seem to give me better results are:

variations = {"sigma_min": [0.000002], "sigma_max": [1], "sigma_data": [0.13], "lr": [1e-4], "num_sample_steps": [200]}
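For anyone reproducing the scaling step described above, here is a minimal NumPy sketch of one way to do it, not the library's own preprocessing. It derives the divisor from the data instead of hard-coding 5.5, clips stray values so nothing leaves [-1, 1], and measures the standard deviation of the scaled data, which is the quantity that sigma_data is meant to approximate in the EDM formulation as I understand it. The helper names (fit_scale, normalize, denormalize), the synthetic data, and the (batch, channels, frames, height, width) layout are all assumptions made for illustration.

```python
import numpy as np

def fit_scale(videos, percentile=100.0):
    """Return a divisor that maps the chosen percentile of |x| to 1.

    percentile=100 uses the largest absolute value; a lower value
    (e.g. 99.9) is more robust to rare outliers, at the cost of clipping.
    """
    return float(np.percentile(np.abs(videos), percentile))

def normalize(videos, scale):
    # Clip so values beyond the fitted range cannot leave [-1, 1].
    return np.clip(videos / scale, -1.0, 1.0).astype(np.float32)

def denormalize(videos, scale):
    # Map model-space values back to the original physical units.
    return videos * scale

# Synthetic stand-in for the real dataset (hypothetical values):
# 10 videos, 1 channel, 20 frames, 64x64 grid.
rng = np.random.default_rng(0)
videos = rng.uniform(-5.5, 5.5, size=(10, 1, 20, 64, 64))

scale = fit_scale(videos)
normed = normalize(videos, scale)

# Empirical standard deviation of the scaled data, a candidate
# starting point for sigma_data rather than a guaranteed best value.
data_std = float(normed.std())
print(f"scale={scale:.3f}  data_std={data_std:.3f}")
```

If the measured standard deviation is far from the sigma_data being passed in, that mismatch may be worth ruling out before searching over the other hyperparameters.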

I've also tried messing with the other hyperparameters in a random search but no luck.

Any other guesses as to why my model doesn't train well? The loss flattens out around 0.2, but ideally it should be several orders of magnitude smaller. The average difference between a prediction conditioned on the start video and a set of reference ground-truth frames is also around 0.2. Again, this would ideally be as close to zero as possible.
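One way to put the 0.2 figure in context is to compare it against a trivial baseline and convert it back to physical units. The sketch below is only an illustration under stated assumptions: the arrays are synthetic stand-ins for real predictions and references, mean_abs_error is a hypothetical helper, and scale = 5.5 is the divisor mentioned in the post. Note also that the training loss in EDM-style models is weighted per noise level, as far as I understand it, so the loss value and this pixel-space error are not necessarily comparable.

```python
import numpy as np

def mean_abs_error(pred, target):
    """Average absolute difference over all frames and pixels."""
    return float(np.mean(np.abs(pred - target)))

# Synthetic stand-ins (hypothetical): normalized ground truth in [-1, 1]
# and a prediction that is the ground truth plus small noise.
rng = np.random.default_rng(0)
target = rng.uniform(-1.0, 1.0, size=(10, 1, 20, 64, 64))
pred = target + rng.normal(0.0, 0.05, size=target.shape)

scale = 5.5  # divisor used to normalize the data, taken from the post

model_err = mean_abs_error(pred, target)

# Baseline: always predict the dataset mean. A model whose error is
# not clearly below this has learned little about the data.
baseline_err = mean_abs_error(np.full_like(target, target.mean()), target)

# The same error expressed in the original, un-normalized units.
err_physical = model_err * scale

print(f"model={model_err:.4f}  baseline={baseline_err:.4f}  "
      f"physical={err_physical:.4f}")
```

With a divisor of 5.5, an error of 0.2 in normalized units corresponds to 0.2 × 5.5 = 1.1 in the original units, which can be easier to judge against the known physics of the data.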

alif-munim commented 1 year ago

Hi! I know this is an old issue, but I'm working with a similar dataset and running into very similar issues. Did you have any luck with your training?