mahaichuan / Versatile-Image-Compression

Lossy compression; Variable rate compression; Lossless compression;
56 stars 6 forks

Implementation Details #2

Closed uberkk closed 2 years ago

uberkk commented 3 years ago

Great effort overall. I am curious about your algorithm, so after reading your paper I tried to investigate the implementation to understand it better. I should say that I am new to implementing wavelets with CNNs (as is probably most of the community). There are some points I could not follow:

1) The structure in the paper shows h1 = xo - P1(xe) and l1 = xe + U1(h1), but in the implementation the equations become H = H + skip + L_net * 0.1, etc. I think this is because you need a 2D wavelet, but I could not understand how these end up with the same result, or why the output of the filter is multiplied by 0.1. Is there a reference for selecting this value?

2) In Pixel_CNN_2d.py there are parameters such as h, b, a that relate to the input image (label). Why are they split as h = tanh(x[..., 0:33]), etc.? Could you briefly explain the idea? In addition, in the same script, upper = label + 0.5 / static_QP and lower = label - 0.5 / static_QP. Why are parentheses not used, i.e. upper = (label + 0.5) / static_QP, and how is static_QP chosen correctly? There is more than one QP coefficient in the code, such as 4096 and 8192; why were those values selected?

3) In main_testRGB, bior4.4 is mentioned in the comments. I take this to be the biorthogonal 4.4 wavelet transform, but I could not understand how bior4.4 relates to the (CNN-based) algorithm.

4) There are wavelet coefficients written at the beginning of the code, and I could not find out why they are used in the implementation. Is this mentioned anywhere in the paper?

5) Are there any other scripts for training the network or modelling the CDF of the training-set images? Since the script is named main_testRGB, I suppose the training script is not in the repo. Could you please confirm?

Thank you for your answers.

mahaichuan commented 3 years ago

Thanks for your good questions! I will answer them one by one.

  1. "H = H + skip + L_net*0.1": if it were just "H = H + skip", this would be the traditional bior4.4 wavelet. In other words, we use bior4.4 as a shortcut connection for iWave, just like the skip connections in ResNet. This is a trick for training iWave.

We treat "skip + 0.1*L_net" as a single unit, i.e. P/U = skip + 0.1*L_net.

The well-known super-resolution network EDSR also multiplies the output of each ResBlock by 0.1, which can serve as a reference. However, we did not conduct ablation experiments for this value. Our original intention was simply to make the best use of bior4.4 to assist the training of iWave.
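If I read this answer correctly, one predict step of the lifting scheme can be sketched roughly as below. The names `bior_filter` and `cnn` are placeholders of mine, not identifiers from the repo:

```python
import numpy as np

def lifting_predict(x_even, x_odd, bior_filter, cnn):
    """One predict step of the lifting scheme with a CNN residual.

    skip        : output of the fixed bior4.4 predict filter (the shortcut)
    cnn(x_even) : learned correction, scaled by 0.1 so that the transform
                  stays close to plain bior4.4 early in training
    """
    skip = bior_filter(x_even)             # traditional bior4.4 prediction
    prediction = skip + 0.1 * cnn(x_even)  # P = skip + 0.1 * L_net
    h = x_odd - prediction                 # high-pass subband
    return h
```

With `cnn` returning zeros this collapses to the plain bior4.4 lifting step, which seems to be exactly the point of using the shortcut during training.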

  2. "h, b, a" are different channels output by the context model (a PixelCNN). For example, h is the first 33 channels.
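As a sketch of what this answer describes: the 33-channel slicing and the `tanh` on `h` come from the question above, while which group plays which role in the probability model, and whether `b` and `a` get their own nonlinearities, are my assumptions:

```python
import numpy as np

def split_context_output(x, n=33):
    """Split the context model's output tensor of shape (H, W, 3*n)
    into three parameter groups, mirroring h = tanh(x[..., 0:33])
    in Pixel_CNN_2d.py."""
    h = np.tanh(x[..., 0:n])   # first 33 channels, squashed by tanh
    b = x[..., n:2 * n]        # next 33 channels (assumed raw here)
    a = x[..., 2 * n:3 * n]    # last 33 channels (assumed raw here)
    return h, b, a
```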

Static_QP is a trainable parameter that is fixed after the training process. It directly controls the rate and distortion.

"label" has already been divided by Static_QP; see line 78 in Pixel_CNN_2d.py.

"4096, 8192" are extreme values obtained by analyzing traditional wavelet coefficients. They are used to normalize the wavelet coefficients as much as possible.
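Putting these two points together, the bin-probability computation presumably looks like the following. The `cdf` callable stands in for the learned CDF model; only the scaling logic comes from the answer:

```python
def bin_probability(label_scaled, cdf, static_qp):
    """Probability mass of one quantization bin.

    `label_scaled` is the coefficient already divided by static_QP,
    so the half-bin offset must be divided as well: 0.5 / static_QP.
    That is why the code reads `label + 0.5 / static_QP` without
    parentheses.  `cdf` is a placeholder for the learned CDF model.
    """
    upper = label_scaled + 0.5 / static_qp
    lower = label_scaled - 0.5 / static_qp
    return cdf(upper) - cdf(lower)
```

Because `label` has already been divided by static_QP, dividing the whole sum, as in `(label + 0.5) / static_QP`, would scale the coefficient twice.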

  3. In fact, this comment should be deleted. We just wanted to indicate that iWave uses bior4.4 as a shortcut connection.

  4. They are used as the shortcut connection for iWave, as mentioned in A1.

  5. The training code will be coming soon, as well as a PyTorch implementation of the project.

Hope that I have answered all your questions.

mahaichuan commented 3 years ago

By the way, in the computer vision community, this kind of wavelet-like transform is generally called a normalizing flow.

uberkk commented 3 years ago

Thank you for your detailed answers and fast response. To make sure I understand, let me ask a little further:

1) I understand that 0.1 is a coefficient used to decrease the effect of the main branch in Fig. 3 when L_net >> skip; otherwise the information fed back to the network might get lost or become less important.

2) When will you release the PyTorch implementation? Do you have an estimated release date? I plan to use PyTorch to compare against and build on your algorithm, so it is good news that you will release a PyTorch implementation. Otherwise, I will try to understand all the details and implement it in PyTorch myself.

3) In Figure 3 the first filter is 3x3x1, but I think in the implementation you convolve the even/odd parts of the image with a 3x1 filter whose initial weights are the bior4.4 coefficients. I could not understand why you chose this method. In other words, the implementation seems to initialize a 3x1x1 filter with bior4.4 coefficients, while the paper starts with 3x3x1. In addition, is there a particular reason to start from the bior4.4 coefficients?

4) Does rounding cause loss of gradient information (in equations 8-9)? Probably not, because otherwise the network would not be trainable end-to-end. So how do you preserve the gradients?
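On question 4: a common way to keep gradients flowing through rounding, which the authors have not confirmed using here, is the straight-through estimator, where the forward pass rounds but the backward pass treats rounding as the identity:

```python
import numpy as np

def stop_gradient(x):
    # Placeholder: tf.stop_gradient in TensorFlow, Tensor.detach() in PyTorch.
    return x

def round_ste(x):
    """Straight-through rounding: the forward value equals round(x),
    but the rounding residual is wrapped in stop_gradient, so autodiff
    only sees the identity term x and the gradient passes through the
    rounding unchanged."""
    return x + stop_gradient(np.round(x) - x)
```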

I know I am asking for a lot of detail, but I am trying to understand your work as a whole, and I appreciate your effort.

uberkk commented 3 years ago

Could you please respond to the previous comment?