zsyOAOA / ResShift

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS 2023 Spotlight)

Discrepancies in CLIPIQA and MUSIQ Scores When Testing ResShift on RealSR65 #45

Open Guaishou74851 opened 4 months ago

Guaishou74851 commented 4 months ago

Hi @zsyOAOA,

I am seeing evaluation metrics that do not match the reported values when testing ResShift on the RealSet65 dataset. Below is a detailed description of my process and the issues encountered:

  1. Data Verification and Command Execution:

    • Confirmed the presence of the dataset in ./testdata/RealSet65.
    • Ran the ResShift inference using the following command:
      CUDA_VISIBLE_DEVICES=0 python inference_resshift.py -i testdata/RealSet65 -o result/RealSet65 --scale 4 --task realsrx4 --chop_size 512
  2. Evaluation Metrics Assessment:

    • Utilized IQA-PyTorch for computing CLIPIQA and MUSIQ metrics.
    • Obtained the following results for the RealSet65 dataset:
      CLIPIQA: 0.6418642669916153 (expected 0.6537)
      MUSIQ: 58.211212921142575 (expected 61.330)
    • Additionally, I observed these results for another subset of RealSR:
      CLIPIQA: 0.5409876523911953 (expected 0.5958)
      MUSIQ: 53.28555391311645 (expected 59.873)
  3. Issue and Inquiry:

    • Despite varying the random seed with the --seed option, the scores did not align with the reported values.
    • This discrepancy persists across different datasets and metrics, prompting me to question if a step was missed or executed incorrectly.
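For reference, the evaluation in step 2 can be sketched roughly as follows. This is a minimal sketch and not the exact script behind the numbers above: the helper names (`list_images`, `mean_score`, `relative_gap`, `pyiqa_scorer`) are hypothetical, and it assumes the default `clipiqa` and `musiq` configurations in IQA-PyTorch (`pyiqa`), which may differ from the settings used for the paper.

```python
from pathlib import Path
from typing import Callable, Iterable, List

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def list_images(folder: str) -> List[Path]:
    """Return the image files directly inside `folder`, sorted for a stable order."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTS)


def mean_score(paths: Iterable[Path], score_fn: Callable[[str], float]) -> float:
    """Average a per-image no-reference score over all given images."""
    scores = [float(score_fn(str(p))) for p in paths]
    if not scores:
        raise ValueError("no images to score")
    return sum(scores) / len(scores)


def relative_gap(obtained: float, reported: float) -> float:
    """Signed relative difference; negative means below the reported value."""
    return (obtained - reported) / reported


def pyiqa_scorer(metric_name: str, device: str = "cuda") -> Callable[[str], float]:
    """Wrap a pyiqa metric (default settings assumed) as a path -> float function."""
    import pyiqa  # imported lazily so the helpers above work without it

    metric = pyiqa.create_metric(metric_name, device=device)
    return lambda path: metric(path).item()


# Example usage (requires pyiqa and the inference outputs from step 1):
#   images = list_images("result/RealSet65")
#   clipiqa = mean_score(images, pyiqa_scorer("clipiqa"))
#   musiq = mean_score(images, pyiqa_scorer("musiq"))
#   print(relative_gap(clipiqa, 0.6537), relative_gap(musiq, 61.330))
```

With the RealSet65 numbers listed above, `relative_gap` gives roughly -1.8% for CLIPIQA and -5.1% for MUSIQ.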

Questions:

I am keen to understand and rectify these discrepancies and would greatly appreciate your insights.

Thank you for your assistance.

zsyOAOA commented 4 months ago

In this repo, I released an enhanced checkpoint that was trained for longer. This enhanced version gives better visual results, but its CLIPIQA and MUSIQ scores are slightly lower. After the ECCV deadline, I will upload the checkpoint that reproduces the results in our paper.

Guaishou74851 commented 4 months ago

Hello @zsyOAOA,

I recently conducted tests using the ResShift model on the ImageNet-Test dataset you provided here. I am pleased to share that the results closely align with the reported values, reinforcing the model's reliability. Below are the specific metrics I obtained:

These ImageNet-Test results are satisfactory and consistent with the reported figures.

Regarding your previous reply, I look forward to running the code with the paper checkpoint once it is uploaded, so that I can replicate the results reported in your paper. Your efforts to maintain transparency and reproducibility are much appreciated.

Best regards, Bin Chen