victorca25 / traiNNer

traiNNer: Deep learning framework for image and video super-resolution, restoration and image-to-image translation, for training and testing.

Feature request/bug fix: Perform scaling and other operations in linear light #59


awused commented 2 years ago

I was looking to try this out to train an upscaling model, but thought to run one of my test images through it first, and found that downscaling was being done in sRGB gamma. Most images are encoded as sRGB (where ~188 is half as bright as 255), but downscaling algorithms, where this is especially relevant, assume they're operating on linear RGB (where ~127 is half as bright as 255).
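
For reference, the standard sRGB transfer curves look like this (a small sketch; the function names here are illustrative, not traiNNer's):

    import numpy as np

    def srgb_to_linear(s):
        # decode sRGB-encoded values in [0, 1] to linear light
        return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(l):
        # encode linear-light values in [0, 1] back to sRGB
        return np.where(l <= 0.0031308, l * 12.92, 1.055 * l ** (1 / 2.4) - 0.055)

    print(srgb_to_linear(188 / 255))  # ~0.50: sRGB 188 is half brightness
    print(srgb_to_linear(127 / 255))  # ~0.21: sRGB 127 is much darker than half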

I used this image as my input (it isn't a good image for training an upscaler, but it does demonstrate the problem) and manually ran it through resize from imresize.py, the same way it is done in generate_mod_LR_bic.py. It's best to open this image in a program that does not perform any scaling, since your browser might be doing some.

[image: gamma]

What I got out of it at 1/4 scale was a uniform grey square.

[image: rlt_srgb]
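
If the input is the usual kind of gamma test pattern (fine black-and-white detail, which I'm assuming from the uniform result), the grey it collapses to is exactly what averaging encoded values predicts: pixels that should average to 50% linear brightness come out around encoded 127, far darker than the true 50% grey of ~188. A quick check of the arithmetic:

    # Averaging pure black (0) and white (255) in encoded sRGB gives 127.5,
    # but 50% linear brightness actually encodes to ~188:
    print(round((1.055 * 0.5 ** (1 / 2.4) - 0.055) * 255))  # 188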

But I can fix this by converting to and from linear RGB using the functions you already have in colors.py (note that the functions are named backwards: rgb2srgb should be srgb2rgb and vice versa):

    import cv2
    import numpy as np
    import torch
    import torchvision

    # traiNNer helpers (paths relative to codes/); names are swapped as noted above
    from dataops.colors import rgb2srgb, srgb2rgb
    from dataops.imresize import resize

    img = cv2.imread('gamma.jpg')          # BGR uint8
    img = img * 1.0 / 255
    img = torch.from_numpy(np.transpose(img[:, :, [2, 1, 0]], (2, 0, 1))).float()
    img = rgb2srgb(img)                    # despite the name: sRGB -> linear

    rlt = resize(img, 1/4)                 # downscale in linear light
    rlt = srgb2rgb(rlt)                    # linear -> sRGB

    torchvision.utils.save_image(
        (rlt * 255).round() / 255, 'rlt.png', nrow=1, padding=0, normalize=False)

This code snippet gives me the expected result:

[image: rlt]

While this is an artificial example that exaggerates the effect, the colour distortion will happen to a varying degree on any image that is transformed in non-linear gamma. I believe this decreases the accuracy of the trained models, since they will be learning to reverse this colour distortion, which can cause noticeable colour shifts when upscaling images that were not produced by sRGB downscaling.

victorca25 commented 2 years ago

Hello! Thanks for doing the tests and experimenting. I am aware of linear-space operations on images; the correctly named functions are used in iNNfer: https://github.com/victorca25/iNNfer/blob/09569a1e81cd9a72a1ece85dad73391389998d70/utils/colors.py#L29 https://github.com/victorca25/iNNfer/blob/09569a1e81cd9a72a1ece85dad73391389998d70/utils/colors.py#L49

Where a "linear_resize" is also implemented: https://github.com/victorca25/iNNfer/blob/09569a1e81cd9a72a1ece85dad73391389998d70/utils/utils.py#L267
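
In essence it just brackets the resample with the two conversions, something along these lines (a simplified self-contained sketch with cv2.resize as a stand-in resampler, not the exact iNNfer code):

    import cv2
    import numpy as np

    def srgb_to_linear(s):
        return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(l):
        return np.where(l <= 0.0031308, l * 12.92, 1.055 * l ** (1 / 2.4) - 0.055)

    def linear_resize(img, scale):
        # img: float HWC image in [0, 1]; decode, resample in linear light, re-encode
        lin = srgb_to_linear(img).astype(np.float32)
        h, w = lin.shape[:2]
        out = cv2.resize(lin, (round(w * scale), round(h * scale)),
                         interpolation=cv2.INTER_AREA)
        return linear_to_srgb(out)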

However, the conversions add considerable latency to the training process (every image in every batch has to be converted back and forth between sRGB and linear), and not all operations on images require being applied in linear space.

Additionally, the logic may be better implemented as a wrapper in https://github.com/victorca25/augmennt, but I haven't had time to evaluate the different implementation options and compare results between the current behaviour and doing the linear conversions. Considering that no current SOTA project converts sRGB to linear before doing the image operations, and their results are not impacted, the priority to implement this is relatively low in comparison to other WIP elements. However, if during your testing you find that results improve with the conversions, the priority can change.
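
If it ends up in augmennt, the wrapper could be as thin as decorating an existing transform, something like this hypothetical sketch (LinearLight is just an illustrative name, not part of augmennt):

    import numpy as np

    class LinearLight:
        # Hypothetical wrapper: run any float-image transform in linear light.
        # Expects float images in [0, 1].
        def __init__(self, transform):
            self.transform = transform

        def __call__(self, img):
            lin = np.where(img <= 0.04045, img / 12.92,
                           ((img + 0.055) / 1.055) ** 2.4)     # sRGB -> linear
            out = self.transform(lin)                          # e.g. a resize or blur
            return np.where(out <= 0.0031308, out * 12.92,
                            1.055 * out ** (1 / 2.4) - 0.055)  # linear -> sRGB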

awused commented 2 years ago

Looking at linear2srgb, it appears to be missing a rounding step before the conversion to uint8, so values get truncated (it should be np.around(srgb).astype(np.uint8)).
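
The difference is easy to demonstrate with values that land just under an integer:

    import numpy as np

    srgb = np.array([253.999, 1.999])
    print(srgb.astype(np.uint8))             # [253   1] -- truncated
    print(np.around(srgb).astype(np.uint8))  # [254   2] -- rounded as intended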

I did a little experiment. It's not much, but in the interest of time I ran 500 iterations on DIV2K (using the pre-trained 4xPSNR model as a starting point) with both sRGB and linear RGB downscaling (modifying MLResize in augmentations.py). With the default sRGB downscaling my first validation was:

    21-11-22 03:56:04.056 - INFO: <epoch: 4, iter: 500> PSNR: 24.702, SSIM: 0.68838, LPIPS: 0.12116

but with linear RGB downscaling I got:

    21-11-22 04:36:38.677 - INFO: <epoch: 4, iter: 500> PSNR: 26.169, SSIM: 0.7637, LPIPS: 0.10971

I repeated the run and got similar results. I used the basic train_sr.yml with no substantial changes.

Using these two models I took this original image (below) and produced the two output images that follow, which I've downscaled back to the original size to demonstrate the colour differences.

[image: original]

With no modifications to traiNNer (sRGB downscaling):

[image: srgb_trained]

With augmentations.py switched to use linear RGB downscaling:

[image: linear_rgb_trained]

While this was a very small test, I do think it demonstrates the impact of the downscaling colour space on training. Neither model was perfect within 500 iterations, but the one trained on linear RGB downscaling was much closer in colour. The point isn't that either of these two models is actually any good, but that the downscaling colour space can make a difference. This should be repeatable.

Even compared to one of the more well-regarded models (yandere neo xl), the results are roughly on par in terms of colour accuracy (I'd argue subjectively better, though marginally worse by PSNR/MAE) despite only 500 iterations.

[image: yandere_neo_xl]

> no current SOTA project converts sRGB to linear before doing the image operations, and their results are not impacted

I honestly believe this is incorrect. The results may not be obviously, visibly wrong most of the time, but I do believe they are impacted. The damage done to an image downscaled in sRGB colour space can be fairly subtle (shifting hues), and the models will be trained to guess how to reverse this process, which will have unpredictable effects on output colours.
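
To make the hue shift concrete: darkening a saturated colour in encoded space scales every channel by the same encoded factor, but by different linear factors, so the channel ratios (and therefore the hue) drift. A small worked example using the standard sRGB decode:

    import numpy as np

    def to_linear(s):
        return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

    orange = np.array([255, 128, 0]) / 255  # a saturated orange

    naive = to_linear(orange / 2)           # halved in encoded space
    correct = to_linear(orange) / 2         # halved in linear light

    print(naive[1] / naive[0])              # ~0.24: green/red ratio has shifted
    print(correct[1] / correct[0])          # ~0.22: original hue preserved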

awused commented 2 years ago

I've had extremely positive results in terms of colour accuracy from training some non-trivial models locally. While I wasn't entirely happy with this one for other reasons and killed it at 183k iterations, this 2x model shows no discernible colour distortion at all.

Here's the same image round-tripped through my model and then downscaled to the original size to show colour accuracy. It's not just this image, either: I can get noticeable colour distortion passing other images through all the other ESRGAN models I've tried (not all of them from the wiki, but a decent selection), but my model maintains colours just about perfectly.

[image: 183k_2x_dscale]

I do believe downscaling properly makes a difference, and I think at least some of the colour inaccuracies plaguing ESRGAN models can be attributed to downscaling in sRGB colour. Without correcting for gamma, downscaling causes hue shifts which the model has to learn to reverse. I may repeat the experiment for the same duration without linear RGB downscaling, but because it takes so long I'll leave that for later. The results have already convinced me to keep using this edit locally.