idearibosome / tf-vmnet

Official TensorFlow implementation of Volatile-nonvolatile Memory Network
Apache License 2.0

Why tf1 instead of tf2? #1

Open devalexqt opened 3 years ago

devalexqt commented 3 years ago

Hi! First of all, very good job and good results with less blur. But why not use TensorFlow 2?

idearibosome commented 3 years ago

Hi devalexqt, we worked on this project before stable tf2 versions were released. Unfortunately, because tf2 does not rely on a session-based coding structure, migrating to tf2 would require refactoring most of the code.

Thanks!

devalexqt commented 3 years ago

Hi! Do you have time to look at my tf2 implementation of VMNet? I have a couple of questions.

idearibosome commented 3 years ago

@devalexqt Sure, it would be great if I could help.

devalexqt commented 3 years ago

Give me some time to prepare the code.

idearibosome commented 3 years ago

@devalexqt No problem! Please just ping me when your code is ready. Also, if you want to keep your code private (instead of public), just add me to your repository as a collaborator, so I can check your code and discuss it further in your repo. :)

devalexqt commented 3 years ago

Can you please explain why we need to use an RGB mean of (127.5, 127.5, 127.5) instead of the full 0~255 range?

idearibosome commented 3 years ago

Can you please explain why we need to use an RGB mean of (127.5, 127.5, 127.5) instead of the full 0~255 range?

If you want to follow the same training procedure as in our paper, each pixel value needs to be adjusted to the range -127.5 ~ 127.5 by subtracting 127.5 from each pixel value (which originally ranges from 0 ~ 255).

https://github.com/idearibosome/tf-vmnet/blob/main/models/vmnet.py#L281

Also, don't forget to add 127.5 to each pixel value of the output, so that the final output image is back in the 0 ~ 255 range.

This so-called "mean shifting" process is common in various super-resolution models. But if you are writing your own training code, it is not mandatory (the performance difference can be marginal).
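
For illustration, a minimal sketch of this pre- and post-processing (hypothetical helper names, not the actual code from this repository):

import tensorflow as tf

def preprocess(images):
  # shift pixel values from the 0 ~ 255 range to -127.5 ~ 127.5
  return tf.cast(images, tf.float32) - 127.5

def postprocess(outputs):
  # shift the network output back to 0 ~ 255 and clip to the valid range
  return tf.clip_by_value(outputs + 127.5, 0.0, 255.0)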

devalexqt commented 3 years ago

So, basically, if we use mean shifting we operate in the -1...1 range instead of 0...1 (for 0...255), but why? Looks like I get similar results with my tf2 implementation. I will share the code soon.

idearibosome commented 3 years ago

So, basically, if we use mean shifting we operate in the -1...1 range instead of 0...1 (for 0...255), but why?

This is a common approach in many deep learning models. Some researchers found that rescaling the image pixel values to something like -1...1 gives slightly better performance than just using the 0...1 range, but I currently don't remember the papers that explain this. In any case, the effect may not be very big.

More precisely, some super-resolution models (like EDSR) subtract the mean R, G, B pixel values of the entire DIV2K dataset from the corresponding channels.

https://github.com/limbee/NTIRE2017/issues/19
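
For example, such channel-wise mean shifting could be sketched like this (illustrative only, with the means computed from your own training set rather than hard-coded DIV2K values):

import numpy as np

def compute_rgb_mean(images):
  # images: array of shape (N, H, W, 3) with pixel values in 0 ~ 255
  return images.reshape(-1, 3).mean(axis=0)

def subtract_rgb_mean(images, rgb_mean):
  # subtract the dataset-wide R, G, B means from every pixel
  return images.astype(np.float32) - rgb_mean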

Looks like I get similar results with my tf2 implementation. I will share the code soon.

Great news!

devalexqt commented 3 years ago

I just sent you an invite.

idearibosome commented 3 years ago

@devalexqt Got it. I accepted the invitation and looked into your code briefly (especially for train.py and vmnet.py). The overall structure of the VMNet looks good.

I found that there are two possible differences.

The first is about reusing the weights of the vmb_block. The VMB should be defined only once and then used multiple times.

In tf1, this can be handled by tf.variable_scope with the reuse argument like this: https://github.com/idearibosome/tf-vmnet/blob/main/models/vmnet.py#L308
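
A minimal tf1-style sketch of that pattern (illustrative; not the exact code from this repository):

import tensorflow as tf  # TensorFlow 1.x

def vmb(x, state, filters=64):
  # reuse=tf.AUTO_REUSE makes every call to vmb() share one set of variables
  # instead of creating new convolutions for each call
  with tf.variable_scope('vmb', reuse=tf.AUTO_REUSE):
    x = tf.concat([x, state], axis=-1)
    x = tf.layers.conv2d(x, filters * 2, 3, padding='same', activation=tf.nn.relu)
    x = tf.layers.conv2d(x, filters * 2, 3, padding='same', activation=None)
    x, state = tf.split(x, 2, axis=-1)  # the split into x/state here is just for illustration
  return x, state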

In your Keras-based implementation, this can be done by creating the layers.Conv2D instances first (e.g., in your vmnet() function), and then reusing them by calling the already created layers.Conv2D instances.

Reference: https://github.com/keras-team/keras/issues/10747

For example, change this code

def vmb_block(x, state):
  x=layers.Concatenate()([x, state])
  x=layers.Conv2D(filters*2, kernel_size, padding="same", activation="relu")(x)
  x=layers.Conv2D(filters*2, kernel_size, padding="same", activation=None)(x)
  ...

def vmnet(...):
  ...
  for i in range(num_vmb_blocks):
    x,state = vmb_block(x,state)
  ...

to something like this:

def vmb_block(x, state, conv_layers):
  x=layers.Concatenate()([x, state])
  x=conv_layers[0](x)
  x=conv_layers[1](x)
  ...

def vmnet(...):
  ...
  conv_layers = []
  conv_layers.append(layers.Conv2D(filters*2, kernel_size, padding="same", activation="relu"))
  conv_layers.append(layers.Conv2D(filters*2, kernel_size, padding="same", activation=None))
  ...

  for i in range(num_vmb_blocks):
    x,state = vmb_block(x, state, conv_layers)
  ...

I didn't check whether the above code is exactly right, so please verify that it "truly" creates the VMB part only once, e.g., by calling model.summary() or similar.
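
As a quick sanity check that a Keras layer really shares its weights across multiple calls, here is a minimal sketch (independent of your actual model):

import tensorflow as tf
from tensorflow.keras import layers

conv = layers.Conv2D(64, 3, padding="same", activation="relu")

x = tf.zeros([1, 32, 32, 64])
y1 = conv(x)   # first call builds the kernel and bias
y2 = conv(y1)  # second call reuses exactly the same variables

# a shared layer owns one kernel and one bias, no matter how many times it is called
print(len(conv.weights))  # 2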

The second is about obtaining output images during training. In our implementation, the model obtains output images from all VMB calls, resulting in a total of 16 output images.

https://github.com/idearibosome/tf-vmnet/blob/main/models/vmnet.py#L364


Then, these images are combined like this:

X = ((1 * X_1) + (2 * X_2) + (4 * X_3) + (8 * X_4) + ... + (32768 * X_16)) / (1 + 2 + 4 + 8 + ... + 32768)

where X_1 is the output image obtained from the first VMB call, X_2 is the output image obtained from the second VMB call, etc.

This process ensures that the optimizer considers the intermediate VMB outputs of the VMNet during training.

Note that after the training (during testing), the output is just X = X_16.
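
A rough sketch of that weighted combination (assuming a hypothetical list vmb_outputs = [X_1, ..., X_16] holding the intermediate output images):

import tensorflow as tf

def combine_vmb_outputs(vmb_outputs):
  # later outputs get exponentially larger weights: 1, 2, 4, ..., 32768
  weights = [2.0 ** i for i in range(len(vmb_outputs))]
  weighted_sum = tf.add_n([w * x for w, x in zip(weights, vmb_outputs)])
  return weighted_sum / sum(weights)

# during testing, only the final output is used: output = vmb_outputs[-1]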

Feel free to ask any other questions :)

devalexqt commented 3 years ago

Thanks for replying. I will change the code and let you know.

devalexqt commented 3 years ago

I changed the code and am now testing and validating... Can you tell me how much time the original training process takes? I'm trying to remove JPEG artifacts from images; they are removed, but the image is blurry, so is it possible to decrease the blur effect?

idearibosome commented 3 years ago

Can you tell me how much time the original training process takes?

It depends on the configuration and GPU type. In our experiment, with a patch size of 32 px (= 128 px in high resolution), it took about 3~4 days to run 1,000,000 iterations with a batch size of 8 (= about 10,000 epochs) on an NVIDIA GTX 1080 GPU.

I'm trying to remove JPEG artifacts from images; they are removed, but the image is blurry, so is it possible to decrease the blur effect?

Can you measure PSNR and SSIM values of the output images? Comparing PSNR or SSIM values can be a way to find out whether your model is sufficiently trained or not.
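
For example, both metrics can be measured with TensorFlow like this (assuming the output and ground-truth images are in the 0 ~ 255 range):

import tensorflow as tf

def measure_quality(output_image, ground_truth):
  # both images: tensors of shape (H, W, 3) with values in 0 ~ 255
  out = tf.cast(output_image, tf.float32)
  gt = tf.cast(ground_truth, tf.float32)
  psnr = tf.image.psnr(out, gt, max_val=255.0)
  ssim = tf.image.ssim(out, gt, max_val=255.0)
  return psnr, ssim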

The meaning of "blurry" can be subjective, but there are several tips in the literature to improve performance:

Reference: https://arxiv.org/abs/1809.00219

It is known that a larger patch size can improve super-resolution performance. But it may also increase the training time, so there may be some trade-offs between time complexity and output performance.

Our VMNet model is basically focused on minimizing the model size while preserving PSNR-oriented performance. That's why we designed our structure to use only one VMB instead of stacking multiple VMBs.

There are some other approaches to improve perceptual quality by employing so-called Generative Adversarial Networks (GANs). ESRGAN is a good example.

https://github.com/xinntao/ESRGAN

We also developed another super-resolution model that considers both perceptual and PSNR-oriented quality measures.

https://github.com/idearibosome/tf-perceptual-eusr

But please note that the above models are much larger than VMNet.

devalexqt commented 3 years ago

Thanks, I will try.

devalexqt commented 3 years ago

These are the before and after images (with JPEG artifacts): demo_640x360_q32 / demo_640x360_q32

devalexqt commented 3 years ago

I found an interesting paper (https://arxiv.org/pdf/1511.08861.pdf) about using PSNR + SSIM as a loss function. I need to dive deep into it and try to implement it.

idearibosome commented 3 years ago

These are the before and after images (with JPEG artifacts).

How many iterations (or epochs) did you run to get this result? Also, I think using training images from your own dataset instead of DIV2K may help you get better results for your target images.

Actually, I think this is a very interesting topic: increasing resolution along with removing JPEG artifacts. Very recently (just a few months ago), there was a challenge on solving image deblurring + super-resolution simultaneously: https://arxiv.org/abs/2104.14854

I hope this can also help you find another direction :)

devalexqt commented 3 years ago

For quick testing with heavily JPEG-compressed images, I used a 10-minute run (10 epochs with 1000 steps per epoch) with "mean shifting". Previously, without "mean shifting" and with such a short run time, the result was far off and would require 3+ days of training to get a similar result on a Quadro RTX 4000 GPU. Also, I'm thinking about how to embed "deblurring" inside the network to remove the blurry result, so thanks for the attached paper!

devalexqt commented 3 years ago

I found that "mean shifting" is a game changer for this task!

idearibosome commented 3 years ago

For quick testing with heavily JPEG-compressed images, I used a 10-minute run (10 epochs with 1000 steps per epoch) with "mean shifting". Previously, without "mean shifting" and with such a short run time, the result was far off and would require 3+ days of training to get a similar result on a Quadro RTX 4000 GPU. I found that "mean shifting" is a game changer for this task!

Oh really? That's very interesting!

devalexqt commented 3 years ago

Yeah! Without "mean shifting" and a 10-minute run, I see the following effect: in the light part of the test image the artifacts start to be removed, but in the dark part of the image the artifacts are not removed. I think it's because the light part is close to 1 but the dark part is close to 0 in the image. Please look at the top-left corner with the sky (light part) and the hand (dark part).

devalexqt commented 3 years ago

demo_640x360_q32_vmnet.jpg

devalexqt commented 3 years ago

Can you share the code for the paper?

idearibosome commented 3 years ago

Yeah! Without "mean shifting" and a 10-minute run, I see the following effect: in the light part of the test image the artifacts start to be removed, but in the dark part of the image the artifacts are not removed. I think it's because the light part is close to 1 but the dark part is close to 0 in the image. Please look at the top-left corner with the sky (light part) and the hand (dark part).

Ah, I see. This may be related to the ReLU functions inside the layers. However, I did not test our model without mean shifting, so I did not expect that much of a difference. :)

Can you share the code for the paper?

Do you mean https://arxiv.org/abs/2104.14854? This is not our work; it is a summary of that deblurring + super-resolution challenge: https://competitions.codalab.org/competitions/28073

Maybe this is the code for the 1st ranked solution? - https://github.com/zeyuxiao1997/EDPN

idearibosome commented 3 years ago

Unfortunately, almost all code for super-resolution is written in PyTorch, not TensorFlow...

devalexqt commented 3 years ago

Initially I tested your original model and got the same result as my code with mean shifting.

devalexqt commented 3 years ago

Thanks, I will read the papers.

devalexqt commented 3 years ago

I tried running with a modified loss function: mean absolute error + SSIM

    def call(self, y_true, y_pred):
        # L1 (mean absolute error) term
        l1 = tf.reduce_mean(tf.losses.mean_absolute_error(y_true, y_pred))
        # SSIM term; max_val=2.0 because the images are mean-shifted to the -1...1 range
        ssim = tf.reduce_mean(tf.clip_by_value(tf.image.ssim(y_true, y_pred, 2.0), 0, 1))
        loss = (1 - ssim) + l1
        return loss

For a 10-minute training run I got a more blurry result, but when I ran it longer, up to 12 h (800 epochs × 1000 steps), the result became less and less blurry. If I run it for even more time, it may outperform the plain "mean absolute error" loss.

Also, can you help me find a paper about "mean shifting"?

devalexqt commented 3 years ago

demo_640x360_q32

idearibosome commented 3 years ago

Thank you for sharing your results. Because SSIM considers perceptual quality slightly more than PSNR, the visual quality may be better.

Also, can you help me find a paper about "mean shifting"?

I still cannot find a paper that discusses mean shifting, but these links address that part.

Note that mean shifting is basically a very common approach to adjust input data in various deep learning models (beyond super-resolution models).

idearibosome commented 3 years ago

Also, DIV2K contains natural images, so training with your own dataset containing images much more similar to your testing images (maybe screenshots of a shooting game?) may be beneficial, because those images contain textures more related to your target than the natural textures captured by digital cameras.