raspstephan / nwp-downscale

MIT License

Improve GAN #46

Open raspstephan opened 3 years ago

raspstephan commented 3 years ago

Here is a new thread for discussing how to proceed with the GAN experiments. For the first results, see #29.

Current status:

Goal:

Comments/observations:

Next steps:

@annavaughan @HirtM I will focus on this over the next week. Feel free to also run experiments. If you do, could you quickly announce it in this thread so we don't duplicate work? I have been experimenting in notebooks in the Experiments folder. Feel free to copy this setup.

raspstephan commented 3 years ago

First hint: without the log transform, the GAN does not train at all. This suggests that data preprocessing is super important.
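For context, a minimal sketch of the kind of log transform typically applied to precipitation fields; the exact transform and epsilon used in this repo may differ, and the function names are illustrative:

```python
import numpy as np

def log_transform(precip, eps=0.001):
    """Compress the heavy-tailed precipitation distribution.

    log(x + eps) - log(eps) maps zero rain to exactly 0 and keeps all
    values non-negative. The actual transform in the repo may differ.
    """
    return np.log(precip + eps) - np.log(eps)

def inverse_log_transform(x, eps=0.001):
    """Map network outputs back to physical precipitation units."""
    return np.exp(x + np.log(eps)) - eps
```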

raspstephan commented 3 years ago

I am training our current setup with the pure super-resolution option on my branch stephans_gan, in notebook 07. Will update later on how it went.

raspstephan commented 3 years ago

Really weird: using the pure SR option with the same setup that previously produced somewhat reasonable results does not work. The losses explode and the generated images look like crap.

[image]

This emphasizes that I need to check the log-transform/preprocessing.
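One concrete thing worth checking here (a hypothetical debugging helper, not repo code): an unclamped log transform turns zero-rain pixels into -inf, and a single non-finite value in a batch is enough to make the losses explode.

```python
import numpy as np

def check_finite(x, name="batch"):
    """Debugging aid: flag NaN/inf values that an unclamped log
    transform (log(0) = -inf) would inject into the training data."""
    x = np.asarray(x)
    n_nan, n_inf = np.isnan(x).sum(), np.isinf(x).sum()
    print(f"{name}: nan={n_nan} inf={n_inf} "
          f"min={np.nanmin(x):.3f} max={np.nanmax(x):.3f}")
    assert n_nan == 0 and n_inf == 0, f"{name} contains non-finite values"
```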

annavaughan commented 3 years ago

@raspstephan I'm going to look at this now. Is the current best version the one in notebook 07?

raspstephan commented 3 years ago

Hi Anna, here's a quick update. 07 contains the close-to-Leinonen setup with the pure super-resolution approach. THIS DOES NOT WORK!? 08 contains the same setup with regular TIGGE to MRMS, which "works". I just tested this again in 09 and 10 with the new weighted sampling (not yet fully committed because it is still running). SAME outcome: TIGGE to MRMS "works", pure SR does not. I have no idea why. Maybe we could have a quick meeting tomorrow to look at this together?

annavaughan commented 3 years ago

Meeting tomorrow would be great. I'm free between 12pm and 2:30pm Munich time if some time in that window works. I'm very confused about why the pure SR doesn't work; I'm looking at the code now and it seems fine.

raspstephan commented 3 years ago

Great, here's a summary of my very confusing findings. BTW @annavaughan, I am working in the Experiments subdirectory of notebooks.

TIGGE-->MRMS sort of works, also with the new sampling method (notebook 09). The loss implodes. [images]

Pure SR (i.e. MRMS coarse-->MRMS fine) with the same setup does NOT work (notebook 10). [images]

Testing a different discriminator architecture with two heads, more similar to Leinonen, produces a really weird discriminator loss of almost exactly 10 (!?). This is caused by the gradient penalty. No idea why. [images]
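A loss pinned at almost exactly 10 is suspicious in a WGAN-GP setting: with the default penalty weight lambda = 10, a discriminator whose gradients on the interpolates collapse toward 0 contributes lambda * (0 - 1)^2 = 10 per sample. That is only a guess at what is happening here; for reference, a standard gradient-penalty sketch (assuming the setup follows Gulrajani et al. 2017; names are illustrative):

```python
import torch
from torch import nn

def gradient_penalty(disc, real, fake, lambda_gp=10.0):
    """Standard WGAN-GP penalty on interpolated samples.

    Note: if the discriminator's gradients on the interpolates collapse
    to ~0, each sample contributes lambda_gp * (0 - 1)**2 = lambda_gp,
    i.e. a loss pinned near 10 for the default weight. That would be
    consistent with the suspiciously constant value observed above
    (an assumption, not a confirmed diagnosis).
    """
    batch_size = real.size(0)
    alpha = torch.rand(batch_size, 1, 1, 1, device=real.device)  # assumes 4D images
    interp = (alpha * real + (1 - alpha) * fake.detach()).requires_grad_(True)
    d_interp = disc(interp)
    grads = torch.autograd.grad(
        outputs=d_interp, inputs=interp,
        grad_outputs=torch.ones_like(d_interp),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()

if __name__ == "__main__":
    # Smoke test with a throwaway discriminator.
    disc = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.LeakyReLU(0.2),
                         nn.Flatten(), nn.LazyLinear(1))
    real, fake = torch.rand(4, 1, 16, 16), torch.rand(4, 1, 16, 16)
    print(gradient_penalty(disc, real, fake))
```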

We need to figure out why the pure SR behaves so differently from the TIGGE-->MRMS setup. This makes no sense to me at the moment...
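One way to narrow this down could be to compare the two setups' inputs after preprocessing: if the coarse MRMS conditioning has a very different range or zero fraction than TIGGE, that alone can destabilize the discriminator. A hypothetical helper (the variable names in the usage comment are illustrative, not actual repo variables):

```python
import numpy as np

def summarize(name, batch):
    """Print basic statistics of a preprocessed input batch."""
    batch = np.asarray(batch)
    print(f"{name}: min={batch.min():.3f} max={batch.max():.3f} "
          f"mean={batch.mean():.3f} std={batch.std():.3f} "
          f"zeros={np.mean(batch == 0):.2%}")

# Hypothetical usage:
# summarize("TIGGE input", tigge_batch)
# summarize("MRMS-coarse input", mrms_coarse_batch)
```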

raspstephan commented 3 years ago

@annavaughan Update on my experiments with a lower learning rate. It started out encouraging but then wasn't so great after all.

First up, I ran the TIGGE-->MRMS setup that previously "worked" with a learning rate of 1e-4, now with 1e-5 (notebook 09). In the end the results look pretty similar to the original learning rate: maybe a little less filamenty, but still not realistic enough. [images]

Then I ran the pure super-resolution setup with 1e-5 (notebook 10). Unfortunately, I didn't save the training logs, but after about 8 epochs the losses exploded again and the images became increasingly unrealistic. [image]

This makes me wonder whether the LR was really at fault after all. Maybe it just delayed the inevitable... So no solution yet. How did you get on with your Leinonen setup? Happy to talk again tomorrow to debug.
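Since a global LR cut only seemed to delay the divergence, one alternative worth trying is decoupling the two learning rates (TTUR, Heusel et al. 2017): a faster discriminator and a slower generator. A minimal PyTorch sketch with placeholder networks (the repo's actual generator/discriminator classes will differ, and the LR values are just examples):

```python
import torch
from torch import nn

# Throwaway stand-ins for the repo's actual generator and discriminator.
generator = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2), nn.LeakyReLU(0.2),
                              nn.Flatten(), nn.LazyLinear(1))

# TTUR: the discriminator trains with a larger step size than the
# generator, instead of scaling both rates down together.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-5, betas=(0.0, 0.9))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-5, betas=(0.0, 0.9))
```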

raspstephan commented 3 years ago

@annavaughan So my MNIST test was a failure. I guess this at least tells us that it's not the data (which we already suspected). But then WHAT THE HELL IS IT!?!?!? It must be something in the networks or the training that we are both doing wrong. Well, that is, if your MNIST experiments also turn out to fail. Keep me posted :)

[images]
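For reference, one minimal way to set up such a pure-SR MNIST sanity test (a sketch assuming torchvision is available; it may differ from the notebook's actual setup): downsample the digits with average pooling and train the GAN to recover the originals.

```python
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms

# 28x28 MNIST digits as high-res targets, 7x7 average-pooled versions
# as low-res inputs: a synthetic 4x super-resolution task.
mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(mnist, batch_size=64, shuffle=True)

for hi_res, _ in loader:
    lo_res = F.avg_pool2d(hi_res, kernel_size=4)  # (B, 1, 28, 28) -> (B, 1, 7, 7)
    # ... feed (lo_res, hi_res) pairs into the usual GAN training step ...
    break
```

The point of the test: if even this toy task diverges, the precipitation data is off the hook and the suspect is the networks or training loop, which is exactly the conclusion above.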