raspstephan commented 3 years ago

To start our ML experiments, the easiest model we can build is probably a simple upsampling (=downscaling) CNN that just stacks a few convolutional and upsampling layers and is trained to minimize the RMSE. This won't be particularly good and will produce very washed-out images, but it should beat the interpolation baseline.
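A minimal sketch of what such a model and its comparison against interpolation could look like. The class name `SimpleUpsamplingCNN`, the layer sizes, the 4x scale factor and the random tensors are all assumptions for illustration, not the code in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleUpsamplingCNN(nn.Module):
    """Hypothetical minimal model: conv layers interleaved with upsampling."""

    def __init__(self, channels=1, hidden=16, n_upsample=2):
        super().__init__()
        layers = [nn.Conv2d(channels, hidden, kernel_size=3, padding=1), nn.ReLU()]
        for _ in range(n_upsample):
            # Each stage doubles the spatial resolution (2 stages -> 4x overall).
            layers += [
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
                nn.ReLU(),
            ]
        layers.append(nn.Conv2d(hidden, channels, kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


def rmse(pred, target):
    """Root-mean-square error over all elements."""
    return torch.sqrt(F.mse_loss(pred, target))


torch.manual_seed(0)
lr = torch.rand(4, 1, 8, 8)    # random stand-in for coarse input fields
hr = torch.rand(4, 1, 32, 32)  # random stand-in for high-resolution targets

model = SimpleUpsamplingCNN()
pred = model(lr)  # untrained, so the RMSE here is not meaningful yet
baseline = F.interpolate(lr, scale_factor=4, mode="bilinear", align_corners=False)
print(pred.shape, rmse(pred, hr).item(), rmse(baseline, hr).item())
```

After training, the model RMSE on a validation set would be compared against the `baseline` RMSE computed the same way.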

We can do this once we have a first version of the data loader (#10).

raspstephan commented 3 years ago

I started building a first crude upscaling CNN. I used PyTorch Lightning for the training loop etc.; coming from Keras, I didn't want to write the whole loop myself. The next step is to test it on a GPU VM.

raspstephan commented 3 years ago

I switched to pure PyTorch and wrote my own training loop inside a Trainer class, which lives in models.py. The model itself is not particularly good, but we should have a basic framework in place now.
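For readers who have not seen the file, a training loop wrapped in a class might look roughly like the sketch below. This is not the Trainer from models.py; the method names, the loss bookkeeping and the toy data are assumptions for illustration.

```python
import torch
import torch.nn as nn


class Trainer:
    """Hypothetical minimal trainer; not the class from models.py."""

    def __init__(self, model, optimizer, criterion, device="cpu"):
        self.model = model.to(device)
        self.optimizer = optimizer
        self.criterion = criterion
        self.device = device
        self.train_losses, self.valid_losses = [], []

    def _run_epoch(self, loader, train):
        self.model.train(train)
        total, n = 0.0, 0
        # Gradients are only tracked during training passes.
        with torch.set_grad_enabled(train):
            for x, y in loader:
                x, y = x.to(self.device), y.to(self.device)
                loss = self.criterion(self.model(x), y)
                if train:
                    self.optimizer.zero_grad()
                    loss.backward()
                    self.optimizer.step()
                total += loss.item() * x.size(0)
                n += x.size(0)
        return total / n  # sample-weighted mean loss for the epoch

    def fit(self, train_loader, valid_loader, epochs):
        for _ in range(epochs):
            self.train_losses.append(self._run_epoch(train_loader, train=True))
            self.valid_losses.append(self._run_epoch(valid_loader, train=False))


torch.manual_seed(0)
model = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # toy stand-in model
# A list of (input, target) batches stands in for a real DataLoader.
data = [(torch.rand(2, 1, 8, 8), torch.rand(2, 1, 8, 8)) for _ in range(3)]
trainer = Trainer(model, torch.optim.Adam(model.parameters(), lr=1e-3), nn.MSELoss())
trainer.fit(data, data, epochs=2)
print(trainer.train_losses, trainer.valid_losses)
```

Keeping per-epoch training and validation losses side by side makes it easy to spot the point where the two start to diverge.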

raspstephan commented 3 years ago

23

I implemented an upscaling/generator network following the original SRGAN paper but without the batch norm layers, which the ESRGAN paper argues degrade results in GAN training. This might of course not apply to plain MSE training, but whatever :p
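As a rough illustration, an SRGAN-style residual block with the batch norm layers dropped could be sketched as follows. The class name, the channel count and the use of PReLU are assumptions here, not necessarily what the network in this repo does.

```python
import torch
import torch.nn as nn


class ResidualBlockNoBN(nn.Module):
    """Hypothetical residual block: conv -> PReLU -> conv plus a skip connection."""

    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.PReLU()
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # No batch norm anywhere; the identity path is added to the conv path.
        return x + self.conv2(self.act(self.conv1(x)))


block = ResidualBlockNoBN(channels=8)
x = torch.rand(1, 8, 16, 16)
out = block(x)
print(out.shape)
```

The block preserves both the channel count and the spatial size, so any number of them can be stacked before the upsampling stages.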

I added a first experiment 001-Upscale where I simply trained this network with tp --> tp for a few epochs. The results are saved in /dataloader/saved_models/.

I also created a spreadsheet to log the experiments: https://docs.google.com/spreadsheets/d/1uSyeFesYIRLZzxfcNsje6DJpGd2XXWO6o7pAuFGjCvw/edit?usp=sharing

This is only a very first version but here are some things I noticed:

Looking at the output, there seems to be a sort of checkerboard pattern. I used PixelShuffle upscaling because that's what was used in the SRGAN paper. Maybe linear upscaling would be better?
The network overfits quite heavily. The training loss continues to go down, but the validation loss stays roughly flat during training or maybe gets slightly worse.
The current network architecture does not have any randomness: no dropout and no input noise. That is fine for MSE training, but for GANs we have to add it somehow.
When adding a ReLU right at the end, the network predicts slightly positive values everywhere!? (see fig below, you can also see the checkerboard pattern)
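To make the two upscaling options from the first point concrete, here is a small sketch of a PixelShuffle stage next to an interpolate-then-convolve stage. The helper names `pixel_shuffle_block` and `interp_block` and the channel counts are made up for illustration; whether the second variant actually removes the checkerboard pattern here would need to be tested.

```python
import torch
import torch.nn as nn


def pixel_shuffle_block(channels, scale=2):
    """Conv expands channels by scale**2, then PixelShuffle rearranges them into space."""
    return nn.Sequential(
        nn.Conv2d(channels, channels * scale**2, kernel_size=3, padding=1),
        nn.PixelShuffle(scale),
    )


def interp_block(channels, scale=2):
    """Bilinear interpolation first, followed by a conv at the target resolution."""
    return nn.Sequential(
        nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    )


x = torch.rand(1, 8, 16, 16)
a = pixel_shuffle_block(8)(x)
b = interp_block(8)(x)
print(a.shape, b.shape)
```

Both stages map a 16x16 input with 8 channels to a 32x32 output with 8 channels, so one can be swapped for the other without touching the rest of the network.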

@anna-184702 @HirtM Since we now have a first CNN version I will close this issue. We can continue in #24

raspstephan / nwp-downscale

Build a first upsampling CNN (no GAN) #12
