zwx8981 / LIQE

[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
MIT License
197 stars 11 forks source link

About the evaluation on different datasets #7

Closed liuppboy closed 1 year ago

liuppboy commented 1 year ago

Hi,

Thanks for your great work. In your paper, you use the pairwise learning-to-rank training strategy to train one model on different dataset. In this case, the predicted MOS is a relative score, how do you obtain the absolute score on different datasets? If not aligned properly, the PLCC metric shall be very low. Thanks!

zwx8981 commented 1 year ago

Hi, thanks for your interest in our work!

We use the pairwise learning-to-rank method to train a model on multiple datasets simultaneously. As such, the quality scores predicted by the resultant model are dataset-agnostic. In other words, the resultant model is actually attempting to unify the perceptual scales of different datasets into a common space, where the images from different datasets can be compared according to the predicted scores. Therefore, the predicted MOS is not a relative score.

To compute the PLCC results, we follow the common practice to learn a 4-parameter logistic nonlinear mapping function for each dataset (see the compute_metrics function in BIQA_benchmark.py).