micmic123 / QmapCompression

Official implementation of "Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform", ICCV 2021

Some questions about the paper #1

Open CaiShilv opened 3 years ago

CaiShilv commented 3 years ago

It's amazing to see this work, and I'd like to ask some questions. What do the dashed lines in Figure 6 (b) represent? Thanks very much!

micmic123 commented 3 years ago

The dashed lines indicate the top-5 accuracies of the classifier's predictions, i.e., whether the ground truth is among the 5 classes with the largest scores. For each color, the reconstructed images fed to the classifier were the same for the dashed lines (measuring top-5 accuracy) and the solid lines (measuring top-1 accuracy).
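For concreteness, top-k accuracy as described above can be computed like this (a minimal pure-Python sketch; the paper's actual evaluation code may differ):

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose ground-truth label is among the
    k classes with the largest predicted scores."""
    hits = 0
    for score_row, label in zip(scores, labels):
        # indices of the k largest scores for this sample
        topk = sorted(range(len(score_row)),
                      key=lambda i: score_row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)
```

With k=1 this gives the solid lines (top-1 accuracy) and with k=5 the dashed lines (top-5 accuracy) in Figure 6 (b).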

If you have any more questions, feel free to ask me again! Thank you.

CaiShilv commented 3 years ago

Thank you very much, I understand Figure 6 now. When running eval.py, does the computed PSNR ever show NaN values for you? When I run eval.py, NaN values appear in the reconstructed image x_hat; they seem to be produced by x = self.g_s5(x) in the g_s decoder. Is there something I missed?

micmic123 commented 3 years ago

In my case, when training the model from scratch with a learning rate of 1e-4, NaN outputs appeared occasionally. I tried to resolve this training instability, e.g., by improving the numerical stability of our model, but training remained unstable. It might be due to some problems in the library (compressai). Although it is not a fundamental solution, we recommend skipping the update of model parameters whenever NaN values appear in a training step. For example, in train.py: https://github.com/micmic123/QmapCompression/blob/8500f8b8e2d11d599ca6d1d77ce67cb72b217885/train.py#L117-L119 In fact, after decaying the learning rate to 1e-5 (at 1.4M iterations in our experiments), the NaN values soon disappeared. Meanwhile, you can test with the released pretrained model.
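The skip-on-NaN idea can be sketched as a training step that checks the loss before backpropagating (a hypothetical standalone helper, not the repo's exact code; see the linked lines in train.py for the actual implementation):

```python
import torch


def train_step(model, optimizer, x, criterion):
    """One training step that skips the parameter update
    when the loss is non-finite (NaN or inf)."""
    optimizer.zero_grad()
    out = model(x)
    loss = criterion(out, x)
    # Skip backward/step entirely for this batch if the loss blew up,
    # so NaN gradients never reach the parameters.
    if not torch.isfinite(loss):
        return None
    loss.backward()
    optimizer.step()
    return loss.item()
```

Returning None lets the caller count or log skipped steps; the model parameters are left untouched for that batch.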

CaiShilv commented 3 years ago

In compressai, the probability parameters fed to the entropy coder must be exactly the same at encoding and decoding time. The h_s network structure uses nn.ConvTranspose2d, which appears to be non-deterministic: for the same input, two runs can produce slightly different outputs, which is unreliable for the entropy coder. I'm not sure whether my understanding is correct. Would it be better to replace nn.ConvTranspose2d with nn.PixelShuffle?
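The suggested replacement could look like the following sub-pixel convolution block (a sketch of the general technique, not the repo's actual layer; the channel counts and kernel size are illustrative, and whether this actually removes the mismatch depends on where the non-determinism comes from, e.g. cuDNN algorithm selection):

```python
import torch
import torch.nn as nn


def subpixel_upsample(in_ch, out_ch, r=2):
    """Upsample by factor r using a stride-1 conv followed by
    nn.PixelShuffle, as an alternative to nn.ConvTranspose2d."""
    return nn.Sequential(
        # Produce r*r score maps per output channel...
        nn.Conv2d(in_ch, out_ch * r * r, kernel_size=3, padding=1),
        # ...then rearrange them into an r-times larger spatial grid.
        nn.PixelShuffle(r),
    )
```

A Conv2d + PixelShuffle pair has the same output shape as a stride-2 ConvTranspose2d, so it can usually be swapped in with only a change to the checkpoint's parameter names.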