The results of the comparative experiments in this paper may have some issues.

xiaogang00 / SMG-LLIE

This is the source code for CVPR paper "Low-Light Image Enhancement via Structure Modeling and Guidance"


The results of the comparative experiments in this paper may have some issues. #4

Open kiven111 opened 1 year ago

kiven111 commented 1 year ago

After running test_LOL_real.sh, three folders are generated under the results_image_generation_LOL_real directory: gt, input, and output.

Computing PSNR and SSIM between the generated output and gt folders gives results consistent with those reported in the paper:
===> Avg.PSNR: 24.6227 dB
===> Avg.SSIM: 0.8219

However, when I replaced the gt generated by your code with the ground truth provided by the official LOLv2 dataset and recalculated PSNR and SSIM, the metrics decreased significantly:
===> Avg.PSNR: 16.0604 dB
===> Avg.SSIM: 0.4606

Are the metrics evaluated in your paper considered objective and reasonable? Could you explain the inconsistency between the Ground Truth used in the code's testing metrics and the officially provided one? The author needs to provide a reasonable interpretation of the results in the paper.

flyoutLMF commented 1 year ago

https://github.com/xiaogang00/SMG-LLIE/blob/main/datasets/LOL_real.py#L36
https://github.com/xiaogang00/SMG-LLIE/blob/main/configs/transforms_config.py
Several transforms are applied to the gt, and they are not inverted after inference.

kiven111 commented 1 year ago

Thank you for your response. I added the sort function in the dataloader-related code to address the issue.

Arusa1 commented 1 year ago

Hi, have you figured out what's going on? It seems that the author modified the gt to get the results in the paper, which may be unfair to other methods.

kiven111 commented 1 year ago

> Hi, have you figured out what's going on? It seems that the author modified the gt to get the results in the paper, which may be unfair to other methods.

The output images of this model are limited to a resolution of 512x512, so to compute metrics like PSNR the Ground Truth (GT) has to be resized to the same resolution. Additionally, the data loading code is missing a sorting step such as sort(), which leads to a mismatch between the generated images and the images in the original folder. Therefore, new GT images have to be generated in order to calculate the evaluation metrics.
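
As a rough sketch of the pairing problem described above (not code from the repository): directory listings come back in arbitrary order, so pairing by index can match an output with the wrong GT. The helper name `paired_paths` is hypothetical, and the assumption that the output and gt folders use identical file names may not hold for every dataset layout.

```python
import os


def paired_paths(output_dir, gt_dir):
    """Pair output and GT images by file name rather than by listing order.

    Hypothetical helper: assumes both folders hold files with identical names.
    """
    # os.listdir gives no ordering guarantee, so sort both listings explicitly.
    outputs = sorted(os.listdir(output_dir))
    gts = sorted(os.listdir(gt_dir))
    if outputs != gts:
        raise ValueError("output and gt folders do not contain the same file names")
    # After sorting, index i in both lists refers to the same image name.
    return [
        (os.path.join(output_dir, name), os.path.join(gt_dir, name))
        for name in outputs
    ]
```

Raising on a name mismatch, instead of silently zipping the two lists, makes a wrong pairing fail loudly before any metric is computed.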

Arusa1 commented 1 year ago

Thanks for the explanation. I agree that the images should be resized and that the "sort()" function should be included so that corresponding images are matched. However, I notice that in https://github.com/xiaogang00/SMG-LLIE/blob/main/configs/transforms_config.py, the image is processed by transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]). I am wondering whether it is normal and reasonable to process the gt images with this transform. I think it would normalize the brightness of the gt and predicted images to a relatively fixed value, which would considerably improve the final performance, since low-light image enhancement partly aims to correct the brightness of low-light images to that of the gt ones.
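
One way to check this concern numerically is sketched below. It assumes the formula documented for torchvision's `transforms.Normalize`, `output = (input - mean) / std`, applied with the fixed constants 0.5 and 0.5. The pixel values are made up and the helper names `normalize` and `psnr` are hypothetical; this is not the repository's evaluation code. Under these assumptions the transform is a fixed affine map from [0, 1] to [-1, 1], so a brightness gap between prediction and gt is scaled rather than removed, and PSNR is unchanged as long as the data range is set to 2 to match.

```python
import math


def normalize(pixels, mean=0.5, std=0.5):
    # Per-value formula documented for transforms.Normalize: (x - mean) / std.
    # mean and std are fixed constants, not statistics of the image.
    return [(p - mean) / std for p in pixels]


def psnr(pred, gt, data_range):
    # PSNR = 10 * log10(data_range^2 / MSE)
    mse = sum((a - b) ** 2 for a, b in zip(pred, gt)) / len(gt)
    return 10.0 * math.log10(data_range ** 2 / mse)


# Made-up pixel values in [0, 1]: a bright gt and an under-exposed prediction.
gt = [0.60, 0.55, 0.70, 0.65]
pred = [0.20, 0.15, 0.30, 0.25]

raw = psnr(pred, gt, data_range=1.0)
# After normalization the values span [-1, 1], so the data range is 2.
norm = psnr(normalize(pred), normalize(gt), data_range=2.0)
# Keeping data_range=1 on normalized data gives a different (lower) number.
mismatched = psnr(normalize(pred), normalize(gt), data_range=1.0)

print(raw, norm, mismatched)
```

Whether the repository's metric code uses a data range consistent with its transforms is a separate question that this sketch does not answer.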

kiven111 commented 1 year ago

Your observation is quite astute, and I didn't notice this during the test. If possible, I look forward to the author's response to this question.

guanguanboy commented 7 months ago

Hi, guys. What would you suggest if I report the results of SMG-LLIE in a new paper: use the ones reported in this paper, or re-evaluate the results?