
UIQA


🏆 🥇 Winning solution of the ECCV AIM 2024 UHD-IQA Challenge: Pushing the Boundaries of Blind Photo Quality Assessment, held at the AIM 2024 workshop @ ECCV 2024

Official Code for Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency

Introduction

UHD images, typically with resolutions equal to or higher than 4K, pose a significant challenge for efficient image quality assessment (IQA) algorithms: adopting full-resolution images as inputs leads to overwhelming computational complexity, while commonly used pre-processing methods such as resizing or cropping may cause a substantial loss of detail.

To address this problem, we design a multi-branch deep neural network (DNN) to assess the quality of UHD images from three perspectives: global aesthetic characteristics, local technical distortions, and salient content perception. Specifically, aesthetic features are extracted from low-resolution images downsampled from the UHD ones, which lose high-frequency texture information but still preserve the global aesthetic characteristics. Technical distortions are measured using a fragment image composed of mini-patches cropped from UHD images based on the grid mini-patch sampling strategy. The salient content of UHD images is detected and cropped so that quality-aware features can be extracted from the salient regions. We adopt Swin Transformer Tiny as the backbone network to extract features from these three perspectives. The extracted features are concatenated and regressed into quality scores by a two-layer multi-layer perceptron (MLP) network. We employ the mean squared error (MSE) loss to optimize prediction accuracy and the fidelity loss to optimize prediction monotonicity.

Experimental results show that the proposed model achieves the best performance on the UHD-IQA dataset while maintaining the lowest computational complexity, demonstrating its effectiveness and efficiency. Moreover, the proposed model won first prize in the ECCV AIM 2024 UHD-IQA Challenge.
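The two objectives above can be sketched in code. A minimal sketch in PyTorch, assuming the pairwise fidelity-loss formulation used in LIQE (acknowledged below) with binary in-batch preferences; the weights are assumed to correspond to the --lr_weight_L2 and --lr_weight_pair training flags, and this is not the repository's exact code:

import math

import torch
from torch.distributions import Normal


def fidelity_loss(pred: torch.Tensor, mos: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """pred, mos: (B,) predicted scores and ground-truth MOS for one batch."""
    d_pred = pred.unsqueeze(1) - pred.unsqueeze(0)         # all pairwise score differences
    d_mos = mos.unsqueeze(1) - mos.unsqueeze(0)
    p_hat = Normal(0.0, 1.0).cdf(d_pred / math.sqrt(2.0))  # predicted preference probability
    p = (d_mos > 0).float()                                # binary ground-truth preference (ties -> 0)
    loss = 1 - torch.sqrt(p * p_hat + eps) - torch.sqrt((1 - p) * (1 - p_hat) + eps)
    return loss.mean()


def total_loss(pred, mos, w_mse=0.1, w_pair=1.0):
    # w_mse / w_pair are assumed to play the role of --lr_weight_L2 / --lr_weight_pair
    return w_mse * torch.nn.functional.mse_loss(pred, mos) + w_pair * fidelity_loss(pred, mos)

The MSE term anchors predictions to the MOS scale, while the fidelity term depends only on the ordering of predictions within a batch, which is what drives the monotonicity (SRCC/KRCC) metrics.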

Image Pre-processing

Image Pre-processing Figure

Different image pre-processing methods for UHD images. (a) The proposed method, which uses the resized image, the fragment image, and the salient patch to extract aesthetic, distortion, and salient-content features. (b) Sampling all non-overlapping image patches for feature extraction. (c) Selecting the three representative patches with the highest texture complexity for feature extraction.
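For illustration, the fragment image in (a) can be produced by grid mini-patch sampling in the spirit of FAST-VQA (acknowledged below). A minimal sketch, assuming a 15 x 15 grid of 32-pixel mini-patches (consistent with --n_fragment 15 and a 480 x 480 fragment); the repository's exact sampling may differ:

import torch


def sample_fragment(img: torch.Tensor, grid: int = 15, patch: int = 32) -> torch.Tensor:
    """img: (C, H, W) UHD image tensor; returns a (C, grid*patch, grid*patch) fragment image."""
    c, h, w = img.shape
    cell_h, cell_w = h // grid, w // grid  # for 4K input, each cell is far larger than a mini-patch
    rows = []
    for i in range(grid):
        cols = []
        for j in range(grid):
            # pick a random mini-patch inside grid cell (i, j), keeping local pixels contiguous
            y = i * cell_h + torch.randint(0, max(cell_h - patch, 1), (1,)).item()
            x = j * cell_w + torch.randint(0, max(cell_w - patch, 1), (1,)).item()
            cols.append(img[:, y:y + patch, x:x + patch])
        rows.append(torch.cat(cols, dim=2))
    return torch.cat(rows, dim=1)  # stitch mini-patches back into one small image

Because each mini-patch keeps its native resolution, local distortions survive in the fragment image, while the resized image preserves the global composition.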

Model

Model Figure

The diagram of the proposed model. It consists of three modules: the image pre-processing module, the feature extraction module, and the quality regression module. We assess the quality of UHD images from three perspectives: global aesthetic characteristics, local technical distortions, and salient content perception, which are evaluated by the aesthetic assessment branch, distortion measurement branch, and salient content perception branch, respectively.
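A minimal sketch of this three-branch layout, assuming timm Swin-T backbones and an illustrative MLP width; it is written for the backbone's native 224-pixel input for simplicity (the repository trains at a 480 crop) and is not the exact implementation:

import timm
import torch
import torch.nn as nn


def _swin_t() -> nn.Module:
    # num_classes=0 makes timm return pooled features instead of logits
    return timm.create_model('swin_tiny_patch4_window7_224', pretrained=False, num_classes=0)


class ThreeBranchIQA(nn.Module):
    def __init__(self):
        super().__init__()
        # one Swin Transformer Tiny per perspective
        self.aesthetic, self.distortion, self.saliency = _swin_t(), _swin_t(), _swin_t()
        dim = self.aesthetic.num_features  # 768 for Swin-T
        # two-layer MLP regressing the concatenated features into one quality score
        self.regressor = nn.Sequential(nn.Linear(3 * dim, 128), nn.GELU(), nn.Linear(128, 1))

    def forward(self, resized, fragment, salient):  # three (B, 3, 224, 224) inputs
        feats = torch.cat([self.aesthetic(resized),
                           self.distortion(fragment),
                           self.saliency(salient)], dim=1)
        return self.regressor(feats).squeeze(-1)  # (B,) quality scores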

Performance

Compared with state-of-the-art IQA methods

On the UHD-IQA validation set:

Methods          SRCC   PLCC   KRCC   RMSE   MAE
HyperIQA         0.524  0.182  0.359  0.087  0.055
Effnet-2C-MLSP   0.615  0.627  0.445  0.060  0.050
CONTRIQUE        0.716  0.712  0.521  0.049  0.038
ARNIQA           0.718  0.717  0.523  0.050  0.039
CLIP-IQA+        0.743  0.732  0.546  0.108  0.087
QualiCLIP        0.757  0.752  0.557  0.079  0.064
UIQA             0.817  0.823  0.625  0.040  0.032

On the UHD-IQA test set:

Methods          SRCC   PLCC   KRCC   RMSE   MAE
HyperIQA         0.553  0.103  0.389  0.118  0.070
Effnet-2C-MLSP   0.675  0.641  0.491  0.074  0.059
CONTRIQUE        0.732  0.678  0.532  0.073  0.052
ARNIQA           0.739  0.694  0.544  0.052  0.739
CLIP-IQA+        0.747  0.709  0.551  0.111  0.089
QualiCLIP        0.770  0.725  0.570  0.083  0.066
UIQA             0.846  0.798  0.657  0.061  0.042

Performance in the ECCV AIM 2024 UHD-IQA Challenge

Team               SRCC   PLCC   KRCC   RMSE   MAE
SJTU MMLab (ours)  0.846  0.798  0.657  0.061  0.042
CIPLAB             0.835  0.800  0.642  0.064  0.044
ZX AIE Vector      0.795  0.768  0.605  0.062  0.044
I2Group            0.788  0.756  0.598  0.066  0.046
Dominator          0.731  0.712  0.539  0.072  0.052
ICL                0.517  0.521  0.361  0.136  0.115

Usage

Environments

Dataset

Download the UHD-IQA dataset.

Train UIQA

Download the model pre-trained on the AVA dataset.

CUDA_VISIBLE_DEVICES=0,1 python -u train.py \
--num_epochs 100 \
--batch_size 12 \
--n_fragment 15 \
--resize 512 \
--crop_size 480 \
--salient_patch_dimension 480 \
--lr 0.00001 \
--lr_weight_L2 0.1 \
--lr_weight_pair 1 \
--decay_ratio 0.9 \
--decay_interval 10 \
--random_seed 1000 \
--snapshot ckpts \
--pretrained_path ckpts/Model_SwinT_AVA_size_480_epoch_10.pth \
--database_dir UHDIQA/challenge/training/ \
--model UIQA \
--multi_gpu True \
--print_samples 20 \
--database UHD_IQA \
>> logfiles/train_UIQA.log
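For reference, the --lr, --decay_ratio, and --decay_interval flags describe a step learning-rate schedule. A sketch of the scheduler they imply (assumed StepLR semantics, not confirmed against train.py):

import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # stand-in for the UIQA network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)                        # --lr 0.00001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.9)  # --decay_interval / --decay_ratio
for epoch in range(100):                                                         # --num_epochs 100
    # ... run one training epoch here ...
    scheduler.step()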

Test UIQA

Put your trained model in the ckpts folder, or download the provided model trained on the UHD-IQA dataset (model weights and quality-alignment profile file) into the ckpts folder.

CUDA_VISIBLE_DEVICES=0 python -u test_single_image.py \
--model_path ckpts/ \
--trained_model_file UIQA.pth \
--popt_file UIQA.npy \
--image_path demo/8.jpg \
--resize 512 \
--crop_size 480 \
--n_fragment 15 \
--salient_patch_dimension 480 \
--model UIQA
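The quality-alignment profile (the popt file) stores the parameters of a monotonic mapping, fitted with something like scipy.optimize.curve_fit (hence "popt"), that aligns raw model outputs to the MOS scale. A minimal sketch of how such a file could be applied, assuming the common four-parameter logistic; the repository's exact function may differ:

import numpy as np


def logistic_4p(x, beta1, beta2, beta3, beta4):
    # monotonic logistic mapping commonly used to align IQA predictions to MOS
    return (beta1 - beta2) / (1 + np.exp(-(x - beta3) / np.abs(beta4))) + beta2


popt = np.load('ckpts/UIQA.npy')         # fitted parameters saved at training time (assumed 4 entries)
raw_score = 0.63                         # example raw prediction from the model
aligned = logistic_4p(raw_score, *popt)  # score on the MOS scale
print(aligned)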

Citation

If you find this code useful for your research, please cite:

@article{sun2024assessing,
  title={Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency},
  author={Sun, Wei and Zhang, Weixia and Cao, Yuqin and Cao, Linhan and Jia, Jun and Chen, Zijian and Zhang, Zicheng and Min, Xiongkuo and Zhai, Guangtao},
  journal={arXiv preprint arXiv:2409.00749},
  year={2024}
}

Acknowledgement

  1. https://github.com/zwx8981/LIQE
  2. https://github.com/VQAssessment/FAST-VQA-and-FasterVQA
  3. https://github.com/imfing/ava_downloader