The architecture of the two models is the same for the discriminator; the difference is in the generators. For the EEGAN I used a dense block for feature extraction, and for the EESRGAN I replaced the dense blocks with RRDBs (Residual-in-Residual Dense Blocks) for both feature extraction and edge enhancement, which produced better results and helped more with vanishing gradients.
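A minimal PyTorch sketch of what such an RRDB looks like; the layer widths, growth rate, and residual scaling factor below are illustrative assumptions, not the exact values used in this repo:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Five 3x3 convolutions with dense connections (illustrative widths)."""
    def __init__(self, channels=64, growth=32, res_scale=0.2):
        super().__init__()
        self.res_scale = res_scale
        self.convs = nn.ModuleList([
            nn.Conv2d(channels + i * growth,
                      growth if i < 4 else channels,
                      kernel_size=3, padding=1)
            for i in range(5)
        ])
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        features = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(features, dim=1))
            if i < 4:
                out = self.act(out)
                features.append(out)
        # scaled residual connection helps against vanishing gradients
        return x + self.res_scale * out

class RRDB(nn.Module):
    """Residual-in-Residual Dense Block: dense blocks inside an outer residual."""
    def __init__(self, channels=64, res_scale=0.2):
        super().__init__()
        self.blocks = nn.Sequential(*[DenseBlock(channels) for _ in range(3)])
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.res_scale * self.blocks(x)
```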
Navigate to the directory where `requirements.txt` exists and run `pip install -r requirements.txt` to install the packages.
Note: You will also need an NVIDIA GPU + CUDA for training. Inference can be done on the CPU, but it will be slow.
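Assuming a PyTorch setup (as in the sketches below), a simple way to fall back to CPU inference when no GPU is available:

```python
import torch

# use the GPU when available, otherwise fall back to (slower) CPU inference
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```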
Metrics
- PSNR - Peak Signal-to-Noise Ratio
- SSIM - Structural Similarity Index
Note: As the images are multichannel, these measures are calculated per channel, summed, and then normalized by the number of channels.
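A minimal sketch of that per-channel computation, assuming NumPy for PSNR and scikit-image for SSIM (function names and the `data_range` choice are illustrative):

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr_multichannel(sr, hr, data_range=1.0):
    """Average PSNR over the channels of (H, W, C) images scaled to [0, data_range]."""
    values = []
    for c in range(sr.shape[-1]):
        mse = np.mean((sr[..., c] - hr[..., c]) ** 2)
        values.append(10 * np.log10(data_range ** 2 / mse))
    return np.mean(values)

def ssim_multichannel(sr, hr, data_range=1.0):
    """Average SSIM over the channels of (H, W, C) images."""
    values = [structural_similarity(sr[..., c], hr[..., c], data_range=data_range)
              for c in range(sr.shape[-1])]
    return np.mean(values)
```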
The dataset is from NARSS (National Authority for Remote Sensing and Space Sciences). It consists of a single image with resolution 4648 x 4242 x 4, where 4 is the number of channels (RGB and near-infrared). I cropped it into smaller patches of shape 256 x 256 x 4, then applied bicubic interpolation to reduce the resolution by a scale factor of 4, producing low-resolution images of shape 64 x 64 x 4 (a sketch of this step is shown after the scaling note below). The data is then scaled using this script:
import numpy as np

def scaleCCC(x):
    # replace NaNs, then scale using the 2nd and 98th percentiles
    x = np.where(np.isnan(x), 0, x)
    return (x - np.nanpercentile(x, 2)) / (np.nanpercentile(x, 98) - np.nanpercentile(x, 2))
Note: Using the 98th percentile instead of the max value for scaling clips the outliers, which could otherwise ruin the scaling and hence the training.
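A minimal sketch of the cropping and down-sampling step described above, assuming OpenCV for the bicubic interpolation (the function name and non-overlapping patch iteration are illustrative):

```python
import cv2  # cv2.resize supports 4-channel arrays

def make_pairs(image, hr_size=256, scale=4):
    """Crop non-overlapping HR patches and bicubic-downsample them into LR pairs.

    `image` is the full (H, W, 4) scene; returns lists of 256x256x4 HR patches
    and their 64x64x4 LR counterparts.
    """
    hr_patches, lr_patches = [], []
    h, w, _ = image.shape
    for y in range(0, h - hr_size + 1, hr_size):
        for x in range(0, w - hr_size + 1, hr_size):
            hr = image[y:y + hr_size, x:x + hr_size]
            lr = cv2.resize(hr, (hr_size // scale, hr_size // scale),
                            interpolation=cv2.INTER_CUBIC)
            hr_patches.append(hr)
            lr_patches.append(lr)
    return hr_patches, lr_patches
```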
I used a combination of multiple losses to produce the final objective, which consists of:
Content Loss
Applying the loss on the pixel space alone would be insufficient, as images are not only pixels but also the features those pixels constitute, so we need to bring the image into a feature space to get an effective loss. The content loss does this by propagating the image through the VGG network to extract its features, then applying the Charbonnier penalty P to the features of the generated image and the ground truth, where P(x) = (x^2 + ε^2)^(1/2).
Important: As our dataset consists of 4-band images and the VGG network accepts only RGB images, we applied PCA to reduce the channels from 4 to 3, so the images can be propagated through the VGG network without losing much information.
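A hedged PyTorch sketch of such a content loss, using torchvision's VGG19 as the feature extractor on the 3-channel (PCA-reduced) images; the feature layer index and ε are illustrative choices, not necessarily the ones used in this repo:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

def charbonnier(x, eps=1e-3):
    """Charbonnier penalty P(x) = sqrt(x^2 + eps^2), averaged over all elements."""
    return torch.mean(torch.sqrt(x ** 2 + eps ** 2))

class ContentLoss(nn.Module):
    def __init__(self, feature_layer=35):
        super().__init__()
        # frozen VGG19 feature extractor up to the chosen layer
        vgg = vgg19(pretrained=True).features[:feature_layer].eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg

    def forward(self, sr_rgb, hr_rgb):
        # sr_rgb / hr_rgb: 3-channel images obtained by PCA from the 4-band data
        return charbonnier(self.vgg(sr_rgb) - self.vgg(hr_rgb))
```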
Consistency Loss
The consistency loss is computed by applying the Charbonnier loss across the pixels.
Note: This is not to say that the pixel-wise loss is useless; it is important for keeping consistency between our target image and the ground truth.
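Using the same Charbonnier penalty as in the content-loss sketch above, the consistency term is simply its pixel-wise form (a sketch, with an illustrative ε):

```python
import torch

def consistency_loss(sr, hr, eps=1e-3):
    """Pixel-wise Charbonnier loss between the super-resolved image and the ground truth."""
    return torch.mean(torch.sqrt((sr - hr) ** 2 + eps ** 2))
```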
Adversarial Loss
The adversarial loss pushes the discriminator to distinguish fake images from real images, and hence pushes the generator to produce more realistic images. It can be computed as
Loss_adv(θ_D) = −log D(I_HR) − log(1 − D(G(I_LR)))
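A sketch of that adversarial term in PyTorch, assuming the discriminator outputs a probability (sigmoid output); the generator side shown here uses the common non-saturating form, and the small epsilon is only for numerical stability:

```python
import torch

def adversarial_losses(d_real, d_fake, eps=1e-8):
    """GAN losses given d_real = D(I_HR) and d_fake = D(G(I_LR)).

    Returns (discriminator loss, generator adversarial loss).
    """
    loss_d = -torch.mean(torch.log(d_real + eps) + torch.log(1 - d_fake + eps))
    loss_g = -torch.mean(torch.log(d_fake + eps))  # non-saturating generator loss
    return loss_d, loss_g
```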
So our final objective is produced by the formula
L(θ_G, θ_D) = Loss_cont(θ_G) + α Loss_adv(θ_G, θ_D) + λ Loss_cst(θ_G)
where α and λ are the weight parameters that balance the loss components. I empirically set α to 1 × 10^−3 and λ to 5.
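Putting the pieces together with those weights, a minimal sketch of the generator's total objective, reusing the hypothetical loss functions sketched above:

```python
# weights from the text: alpha = 1e-3, lambda = 5
ALPHA, LAMBDA = 1e-3, 5.0

def generator_objective(content_loss, adv_loss_g, consistency):
    """L(theta_G, theta_D) = L_cont + alpha * L_adv + lambda * L_cst."""
    return content_loss + ALPHA * adv_loss_g + LAMBDA * consistency
```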
To train the model, run `python train.py`.
Results: side-by-side comparisons of the low resolution and super resolved images, with PSNR and SSIM values for each sample.
The model produced good results after only 30 epochs of training, although it could likely improve further with more data and some data augmentation.
Find the EESRGAN paper here and the EEGAN paper here.