tensorlayer / SRGAN

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
https://github.com/tensorlayer/tensorlayerx

SRGAN super-resolution results from .jpg LR and .png LR are very different. #18

Open JustinhoCHN opened 7 years ago

JustinhoCHN commented 7 years ago

I've trained the SRGAN model for a week; my training config is:

I tested some pics. If I use the DIV2K images, it gives pretty good results:

from DIV2K data

but when I test with my own images, the result is not as good:

left: LR, right: generated

left: LR, right: generated

I started wondering whether that's because the image format is jpg.

As we all know, jpg is a lossy compression format, while png is a lossless one. jpg throws away a lot of information to save disk space, while png keeps all of it, so I did the following experiment:

I chose a jpg image as the high-resolution image, compressed it to both a jpg and a png low-resolution image, and fed these two LR images into the trained SRGAN model. Let's see what I got:

left: jpg LR, right: generated

left: png LR, right: generated

It's obvious that the png result is better than the jpg result; we can put them in one picture for comparison:

left: jpg, right : png
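For reference, the jpg-vs-png LR comparison can be reproduced with a short Pillow sketch. The synthetic image, 4x scale factor, and `quality=75` are illustrative assumptions, not the exact settings used above:

```python
import io

import numpy as np
from PIL import Image

# A synthetic HR image stands in for a real photo (illustrative only).
rng = np.random.default_rng(0)
hr = Image.fromarray(rng.integers(0, 256, (256, 256, 3), dtype=np.uint8))

# Downsample 4x, the scale factor used by SRGAN.
lr = hr.resize((hr.width // 4, hr.height // 4), Image.BICUBIC)

# Encode the same LR pixels losslessly (png) and lossily (jpg).
png_buf, jpg_buf = io.BytesIO(), io.BytesIO()
lr.save(png_buf, format="PNG")
lr.save(jpg_buf, format="JPEG", quality=75)

# png round-trips the pixels exactly; jpg does not.
png_back = np.asarray(Image.open(png_buf))
jpg_back = np.asarray(Image.open(jpg_buf))
```

Feeding `png_back` and `jpg_back` to the same model reproduces the comparison: the two inputs already differ before the network ever sees them.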

So the question is: can these SOTA super-resolution methods (including SRGAN) only work on png images? Are there any papers that also do well on jpg images?

Any ideas will be appreciated; I'm currently working on the jpg super-resolution problem.

gyangMedIA commented 7 years ago

Did you try converting a jpg to a png and then running a test? I'm not sure why the LR jpg gives such poor quality.

JustinhoCHN commented 7 years ago

@parisburn Of course I did. I used a jpg image as the high-resolution image, converted it to 1) a jpg LR image and 2) a png LR image, and fed them to the same model.

These two LR images come from the same jpg HR image.

Thanks for your reply.

zsdonghao commented 7 years ago

Hi, do you have human images in your training set? DIV2K doesn't have any human images.

JustinhoCHN commented 7 years ago

@zsdonghao Yes, there are a lot of humans in my training set. I added my own dataset, which consists of many human images, to the DIV2K dataset, but the quality is different: my HR images are about 300 KB each, while DIV2K HR images are about 4 MB each.

zsdonghao commented 7 years ago

I see. I think your training images are too small; I suggest you use images larger than 1000 pixels.

suke27 commented 6 years ago

Hi, do you have any new results on this problem? I also found there is a difference between jpg and png.

JustinhoCHN commented 6 years ago

@suke27 There's no way to solve this problem for now. jpg is lossy compression and throws away a lot of image information, while png preserves all of it.

kkose commented 6 years ago

As @JustinhoCHN said, the problem is most probably due to JPEG compression. But I would like to draw attention to another issue. JPEG is a block-based compression scheme: rather than compressing the whole image at once, it independently compresses square patches of the image. Because the patches are compressed independently, blocking artifacts in the resulting images become more apparent as the JPEG compression level increases. If the network is trained on PNG images, it most probably does not know how to deal with these blocking artifacts. Even worse, since the borders of the JPEG blocks look like edges, I suspect the network will try to enhance them and introduce lots of "ring-like" artifacts in the image. I suspect this is the reason for your results, @JustinhoCHN and @suke27.
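One way to see the blocking described above is to measure pixel jumps at JPEG's 8x8 block boundaries versus everywhere else; after heavy compression, the boundary jumps dominate. This is a rough sketch, where the noise image and `quality=10` are illustrative assumptions:

```python
import io

import numpy as np
from PIL import Image

def blockiness(gray):
    """Ratio of mean horizontal pixel jumps at 8x8 block boundaries
    to mean jumps elsewhere; roughly 1.0 for artifact-free images."""
    d = np.abs(np.diff(gray.astype(float), axis=1))
    at_boundary = d[:, 7::8].mean()                   # jumps across block edges
    elsewhere = np.delete(d, np.s_[7::8], axis=1).mean()
    return at_boundary / elsewhere

rng = np.random.default_rng(0)
original = rng.integers(0, 256, (128, 128), dtype=np.uint8)

# Heavy JPEG compression flattens block interiors but leaves jumps
# between independently compressed blocks.
buf = io.BytesIO()
Image.fromarray(original).save(buf, format="JPEG", quality=10)
compressed = np.asarray(Image.open(buf).convert("L"))
```

On the noise image, `blockiness(original)` stays near 1.0, while `blockiness(compressed)` is noticeably larger; that boundary structure is exactly what a PNG-trained network has never seen.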

splinter21 commented 6 years ago

Would some preprocessing be helpful? For example, adding jpeg artifacts to the LR (input) images of the training set, so the model learns to reduce these blocking artifacts?

JustinhoCHN commented 6 years ago

@splinter21 Thanks for the advice, but I've tried training with artifact-laden LR images; the network still doesn't learn to remove the artifacts, and even worse, it enhances them.

suke27 commented 6 years ago

Has anyone resolved this issue? I also tried many methods; none worked.

ontheway16 commented 6 years ago

@suke27 The only option I can think of is preprocessing the jpegs to clean up jpeg artifacts. I remember some software exists for this job; I'm pretty sure block artifacts are the primary target for such software.

DTennant commented 6 years ago

There is an ICCV 2017 paper, Deep Generative Adversarial Compression Artifact Removal (https://arxiv.org/abs/1704.02518), that deals with JPEG compression artifact removal.

JustinhoCHN commented 6 years ago

@DTennant Thanks a lot! I'll check this out and see what we can do.

ZhangDY827 commented 6 years ago

@JustinhoCHN Hello sir, I am new to super resolution and interested in it. In my opinion, the main purpose of super resolution is to increase the size of an image while maintaining its quality, such as visual perception. However, in the pictures you show, the LR and generated images have the same size (height and width). I wonder whether you upscaled the LR image with bicubic or another interpolation method for the comparison. (I know the input of SRGAN is actually 2x smaller than the output.)

JustinhoCHN commented 6 years ago

@CasdDesnDR Yes, you're right: for the comparison you have to resize the LR image to the HR size using bicubic or another method, because at their original sizes you can't visually compare the LR and HR images. And that's the point of super-resolution algorithms: if bicubic were good enough, why would we spend time looking for better algorithms?
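The resize step discussed here is just a bicubic upscale to the SR output size before placing the two images side by side. A minimal sketch, assuming a synthetic LR image and a 4x factor:

```python
import numpy as np
from PIL import Image

# Synthetic LR image in place of a real low-resolution photo (illustrative).
rng = np.random.default_rng(0)
lr = Image.fromarray(rng.integers(0, 256, (32, 32, 3), dtype=np.uint8))

# Upscale LR to the SR output size so both sides have identical dimensions.
bicubic = lr.resize((lr.width * 4, lr.height * 4), Image.BICUBIC)

# Side-by-side canvas: bicubic baseline on the left, model output on the right.
canvas = Image.new("RGB", (bicubic.width * 2, bicubic.height))
canvas.paste(bicubic, (0, 0))
# canvas.paste(sr_output, (bicubic.width, 0))  # sr_output: the SRGAN result
```

The bicubic image serves as the baseline; if the SRGAN half of the canvas doesn't look visibly better than it, the model has added nothing.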

Heermosi commented 6 years ago

Fine, I'll spend some time reading....

Heermosi commented 6 years ago

@JustinhoCHN Has there been any progress from applying artifact removal? I've tried the same route as you; I found a bilateral filter might help reduce some noise in the final result, though it did not help identification. My boss has now forgiven me for not being able to super-resolve and identify objects from jpg LR pics. I thought if that route works, maybe I'll have an alternative to test.

JustinhoCHN commented 6 years ago

@Heermosi There are two papers I'd like to recommend: Deep Generative Adversarial Compression Artifact Removal, which removes compression artifacts with a GAN, and Learning a Single Convolutional Super-Resolution Network for Multiple Degradations, which proposes that the generalization problem is caused by building the training dataset with a single fixed downsampling method; we should consider every possible degradation.

Heermosi commented 6 years ago

@JustinhoCHN Fine, I also suspect the current test approach is not applicable to real scenes. It's too tightly coupled to image quality, which means it's just a locally working engine.

Heermosi commented 6 years ago

@JustinhoCHN We've tested on raw images captured by a Canon camera; the picture quality is lower than that of pics downscaled from HR images. Maybe focus is the problem? I guess it's hard to take a picture with the same quality as the training LR images.

Heermosi commented 6 years ago

@JustinhoCHN We've found that the quality of real captured pictures differs from the quality of the LR pics used for training. The downscaled LR pics used for training have sharper edges for HR reconstruction. For example, if you take pictures from different distances, the original images show the same edge thickness, around 2 pixels, while in downscaled pics the edges are only 1 pixel or less. The optical system seems to have something to do with edge sharpness. Even if you thought it might be overcome with optical zoom, there is no effect: smaller objects still take 2 pixels per edge.

So I think it's not only a problem of picture formats; it also has something to do with the optical system.

neilthefrobot commented 3 years ago

The issue is definitely .jpg artifacts. I was able to get around this by taking my HR training set, downsampling 4x, then converting it to .jpg at low quality (high compression) and using that as my LR set. This way the network sees jpg artifacts as inputs and a non-jpg-artifact version as the target, and it learns to convert between them. It actually worked very well.
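That degradation pipeline can be sketched as a small helper. The function name, scale, and quality range are illustrative assumptions; in practice you'd pick a quality range matching the jpgs you expect at inference time:

```python
import io
import random

from PIL import Image

def make_lr(hr, scale=4, quality=(30, 70)):
    """Downsample an HR image, then re-encode it as a low-quality jpg so
    the network sees compression artifacts in its training inputs."""
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    buf = io.BytesIO()
    # Randomizing the quality covers a range of artifact strengths.
    lr.save(buf, format="JPEG", quality=random.randint(*quality))
    return Image.open(buf).convert("RGB")
```

Pairing each `make_lr(hr)` input with the clean `hr` as the target teaches the network deblocking and upscaling at the same time, which matches the result reported above.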