photosynthesis-team / piq

Measures and metrics for image-to-image tasks. PyTorch.
Apache License 2.0

Metric: CLIPIQA #348

Closed · snk4tr closed this 1 year ago

snk4tr commented 1 year ago

This PR implements the CLIP-IQA metric described in Wang et al. 2022. Closes #331

The main reason to implement CLIP-IQA here is to let users compute the metric without bringing additional dependencies (mmcv/mmedit) into the project.

Note that CLIP-IQA+ won't be implemented here because its CLIP weights were fine-tuned with mmcv and hence cannot be loaded and run without it.

Note that the values produced by this implementation match those of the official CLIP-IQA implementation. SRCC scores on public benchmarks may differ from the ones listed in the paper. We consider the official code and weights to be the ultimate source of truth and hence stick with them.
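
For reference, a minimal usage sketch of how the metric is intended to be called. It assumes the metric is exposed as `piq.CLIPIQA` from the new `piq/clip_iqa.py` and follows the usual piq no-reference metric conventions; the exact signature may differ from the final API.

```python
import torch
import piq

# Sketch only: the `data_range` argument and the output shape follow the
# usual piq no-reference metric conventions and may differ in the final API.
x = torch.rand(4, 3, 256, 256)  # batch of RGB images in [0, 1]

clip_iqa = piq.CLIPIQA(data_range=1.0)
score = clip_iqa(x)  # one quality score per image, higher is better
```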

Proposed Changes

Some decisions that may be questioned in the future

  1. We do not use sets of prompt pairs that (as reported in the original paper) may boost the performance of the original CLIP-IQA metric. The reason is that the original implementation does not use them. Hence, adding them would mean proposing a new metric (a modification of the original one), which we try to avoid here.
  2. We do not allow users to provide their own prompts for the same reason.
  3. Tokens for prompt pairs are pre-computed. This lets us avoid the additional dependencies brought in by the tokenizer code (see the sketch after this list).
  4. We had to raise the minimum torchvision version because CLIP models do not work with torchvision < 0.9.1. Also, tensors computed with newer versions cannot be loaded with the old ones.
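
To make point 3 concrete, here is a conceptual sketch of how the score can be computed from pre-computed prompt tokens for a single antonym pair (e.g. "Good photo." / "Bad photo."). The `clip_model`, `encode_image`/`encode_text` names and the helper itself are illustrative, not the actual piq internals.

```python
import torch
import torch.nn.functional as F

def clip_iqa_score(clip_model, image: torch.Tensor, prompt_tokens: torch.Tensor) -> torch.Tensor:
    # `prompt_tokens` holds pre-computed token ids for the two prompts
    # ("good" first, "bad" second), which is what lets us skip the tokenizer.
    image_features = F.normalize(clip_model.encode_image(image), dim=-1)        # (N, D)
    text_features = F.normalize(clip_model.encode_text(prompt_tokens), dim=-1)  # (2, D)

    # Cosine similarities to both prompts, softmaxed over the pair;
    # the probability assigned to the "good" prompt is the quality score.
    logits = 100.0 * image_features @ text_features.t()                         # (N, 2)
    return logits.softmax(dim=-1)[:, 0]
```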
sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 7 (rating A)

Coverage: no coverage information
Duplication: 0.0%

codecov[bot] commented 1 year ago

Codecov Report

Merging #348 (2fd9b4d) into master (f8be57b) will decrease coverage by 1.41%. The diff coverage is 83.91%.


@@            Coverage Diff             @@
##           master     #348      +/-   ##
==========================================
- Coverage   92.02%   90.62%   -1.41%     
==========================================
  Files          34       36       +2     
  Lines        2496     2869     +373     
==========================================
+ Hits         2297     2600     +303     
- Misses        199      269      +70     
Flag        Coverage Δ
unittests   90.62% <83.91%> (-1.41%) ↓

Flags with carried forward coverage won't be shown.

Impacted Files                    Coverage Δ
piq/feature_extractors/clip.py    80.83% <80.83%> (ø)
piq/__init__.py                   100.00% <100.00%> (ø)
piq/clip_iqa.py                   100.00% <100.00%> (ø)
piq/utils/common.py               97.01% <100.00%> (+1.36%) ↑

... and 3 files with indirect coverage changes

snk4tr commented 1 year ago

Ready for re-review.

snk4tr commented 1 year ago

For some reason I cannot reply directly to this comment, so I'll do it here. @denproc nice catch, and this one actually lets me guess with high probability why the original implementation had this type conversion. It turns out that:

  1. Conv2d for half precision (i.e. float16) is implemented only for CUDA operations. Hence, we cannot really support it here, because we want to allow both CPU and GPU computation of all our metrics.
  2. There were initially two type conversions, and if I remove both of them (including the one @denproc just mentioned), the absolute difference in metric values per sample becomes noticeable (~1e-3).

As a result, I think it is fair to allow computation of the metric only in float32. However, nothing really stops us from working on a copy of the input tensor so that the dtype of the tensor passed by the user is not changed. I will add a comment about float32 to the code as well.
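
A minimal sketch of that idea (the helper name is made up and is not the exact code in this PR; it only converts when needed, so the caller's tensor keeps its dtype):

```python
import torch

def _prepare_input(x: torch.Tensor) -> torch.Tensor:
    # Conv2d in half precision is only implemented for CUDA, so the metric
    # is computed in float32 on both CPU and GPU. `.to(torch.float32)`
    # returns a new tensor when the dtype differs, so the tensor passed
    # by the user is left unchanged.
    return x if x.dtype == torch.float32 else x.to(torch.float32)
```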

denproc commented 1 year ago

In addition, we have to add CLIP-IQA to the documentation. This also makes me wonder whether all of our metrics are covered in the documentation. I might check it later. UPD: Added #366

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 5 (rating A)

Coverage: no coverage information
Duplication: 0.0%