h4nwei / SPAQ

[CVPR2020] Official SPAQ & Implementation

Problem about the score #3

Closed. KleinXin closed this issue 4 years ago.

KleinXin commented 4 years ago

The score of the image below is 0.38 when testing with the 'BL_release.pt' model. I do not think this makes sense. Is there anything wrong with the model?

In fact, I tested 9800 images with this model (roughly as in the sketch below), and the scores are all around 0.45, which seems strange.

[image: the blurred test photo in question]
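
For context, the batch test can be run along these lines. This is a minimal sketch only: the preprocessing, image folder, and loading style are placeholders, and it assumes the checkpoint deserializes to a complete nn.Module, which may differ from the repo's actual demo script.

```python
import glob

import torch
from PIL import Image
from torchvision import transforms

# Placeholder preprocessing; the SPAQ demo defines its own pipeline.
preprocess = transforms.Compose([
    transforms.Resize((512, 384)),
    transforms.ToTensor(),
])

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Assumes the checkpoint stores a complete nn.Module, not just a state_dict.
model = torch.load("BL_release.pt", map_location=device)
model.eval()

scores = []
with torch.no_grad():
    for path in glob.glob("test_images/*.jpg"):  # hypothetical folder
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        scores.append(model(x).item())

print(min(scores), max(scores), sum(scores) / len(scores))
```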

h4nwei commented 4 years ago

Thanks for your interest in our work, @KleinXin.

  1. The zip file can be obtained at MEGA. If you want to download it via Baidu Yun instead, please let me know.
  2. For the image quality regression task, we pay more attention to the correlation between the predicted scores and the MOSs (see the sketch below). In addition, the proposed BIQA models do not constrain the predicted scores to a specified range such as [0, 1]. Therefore, a predicted score of 0.38 may reflect the worst image quality, 0.6 may reflect the best, and many fair images may fall close to 0.48. Hope this helps.
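
As an illustration of that correlation check, here is a minimal sketch, assuming you have aligned arrays of predicted scores and MOSs (the values below are made up):

```python
import numpy as np
from scipy import stats

# Illustrative values; replace with real model predictions and MOSs.
predicted = np.array([0.38, 0.45, 0.52, 0.60, 0.41])
mos = np.array([22.0, 48.0, 63.0, 81.0, 35.0])

plcc, _ = stats.pearsonr(predicted, mos)   # linear correlation
srcc, _ = stats.spearmanr(predicted, mos)  # rank-order correlation
print(f"PLCC = {plcc:.3f}, SRCC = {srcc:.3f}")
```

Both metrics are invariant to a linear rescaling of the predictions, which is why an unconstrained output range is not a problem for evaluation.
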
KleinXin commented 4 years ago

  1. I cannot access MEGA, so it would be better if you could provide a Baidu Yun link to the zip file.
  2. I also think the scores should be stretched according to the min and max values.

Thank you for your suggestions!

h4nwei commented 4 years ago

Hi @KleinXin,

I will upload the zip file to Baidu Yun. It may take many hours, so please keep an eye on our GitHub page over the next few days.

I agree that normalizing the predicted scores to a certain range may help them better reflect image quality, but it may also result in a worse correlation coefficient.

Best, Hanwei

KleinXin commented 4 years ago

Thank you very much! I also tested the same 9800 images with the Baidu Image Quality Evaluation API. The mean absolute difference between the two models' scores is 21.28 and the std is 13.74, with both sets of scores normalized to 0~100. That is a relatively large difference. Do you think the data annotation causes it, or could there be other reasons? Thanks.
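
For clarity, the difference statistics were computed along these lines (a minimal sketch; both arrays are assumed to hold per-image scores for the same 9800 images, already normalized to 0~100):

```python
import numpy as np

# Illustrative values; in practice these are the 9800 per-image scores
# from the SPAQ BL model and the Baidu API, both mapped to 0~100.
spaq_scores = np.array([45.0, 38.0, 52.0])
baidu_scores = np.array([10.0, 60.0, 30.0])

diff = np.abs(spaq_scores - baidu_scores)
print(f"mean abs diff = {diff.mean():.2f}, std = {diff.std():.2f}")
```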

h4nwei commented 4 years ago

Hi KleinXin, I do not quite understand the problem. Is the mean absolute difference between the BL model and the Baidu Image Quality Evaluation API 21.28, with a std of 13.74? If so, how does that difference relate to the data annotation?

KleinXin commented 4 years ago

Yes, I want to compare the scores that different models give to the same images, so I used SPAQ and Baidu. These two models show a very large difference.

I do not think one model should differ from another by such a large margin if both are state of the art, so I think the only explanation is that the data are different.

h4nwei commented 4 years ago

Hi KleinXin,

The zip file can be downloaded at https://pan.baidu.com/s/1JzwZxwSOpIqcc16cOliBVw (code: 8og5).

I think the difference is reasonable, for the following reasons:

  1. The Baidu IQA API may struggle to capture the realistic camera distortions in SPAQ. You can validate this with PLCC and SRCC rather than by computing the absolute difference with the BL model (see the sketch after this list).
  2. Our models do not normalize the predicted scores to [0, 100], so the normalization operation may introduce some errors; the same applies to the Baidu IQA API.
  3. As you mentioned, the BL model was trained on SPAQ only, and the Baidu IQA API may differ greatly from it.
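
Since the 9800 internet images have no MOSs, one way to apply point 1 is to measure rank agreement between the two models directly with SRCC, which ignores differences in scale (the arrays below are placeholders):

```python
import numpy as np
from scipy import stats

# Raw per-image predictions from the two models on the same image set.
# SRCC is invariant to any monotonic rescaling, so no normalization is
# needed before comparing (illustrative values only).
bl_scores = np.array([45.0, 38.0, 52.0, 60.0, 41.0])
baidu_scores = np.array([30.0, 12.0, 55.0, 70.0, 25.0])

srcc, _ = stats.spearmanr(bl_scores, baidu_scores)
print(f"rank agreement (SRCC) = {srcc:.3f}")
```
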
KleinXin commented 4 years ago

Thank you very much! I will read your paper again carefully and work out how to use the model for our requirements.

KleinXin commented 4 years ago

I carefully read your paper again. In Section 5.1, where the training process of the baseline model is described, you state that the l1-norm is used and that the ground truth q is the MOS, a continuous score in [0, 100] representing the overall quality of the image.

I used the BL_release.pt model to test the 9800 images. All the scores I got are around 45, with a minimum of 35 and a maximum of 60. The score of the blurred image at the top is 45, and I divided it by 100 to normalize the values to [0, 1]. I do not think that image should get such a high score. In fact, the score of that image from Baidu is 0.0025 after normalizing to [0, 1] by dividing by 100.

Then I normalized the scores of all the images using the min and max values. The formula is v_norm = (v - min) / (max - min), where v is the score from the inference of the BL_release.pt model; a sketch of this step follows below. Is anything wrong with this procedure?
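
In code, the stretching step looks like this (a minimal sketch; the scores array is a placeholder for the raw BL predictions):

```python
import numpy as np

# Raw predictions from BL_release.pt for all images (placeholder values).
scores = np.array([45.0, 35.0, 60.0, 48.0])

# Min-max stretch: v_norm = (v - min) / (max - min), mapping to [0, 1].
v_norm = (scores - scores.min()) / (scores.max() - scores.min())
print(v_norm)
```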

h4nwei commented 4 years ago

Hi KleinXin,

I think the normalization operation is a good attempt. Were the 9800 images sampled from the SPAQ database?

KleinXin commented 4 years ago

No, all the images were acquired from the internet.