That per-scale score weighting tweak hinted that perhaps the original scale weights were non-optimal. I've adjusted them, and that gave a decent improvement.
but why does it have .powf(1/2^n)?
My tweak. Initially it was .powf(0.5) (MAD is used in the MDSI metric, and they set it to 0.25; EDIT: I forgot, it's a different parameter in MDSI), but then I found out that when comparing only at the first scale (plain SSIM), the best accuracy is with .powf(1) (almost on par with VIF, try it), so I tested it with each scale separately. It's called mean absolute deviation, but you can also use the median or mode, or your own weighted mean.
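For reference, the per-scale pooling being discussed boils down to something like this minimal NumPy sketch (the function and argument names are mine, not the actual dssim code):

import numpy as np

def mad_pool(ssim_map, n):
    # Anchor point: the map's mean raised to 1/2^n. At n=0 this is the plain
    # mean (classic MAD); at deeper scales the exponent shrinks and the anchor
    # drifts towards 1.0.
    anchor = np.mean(ssim_map) ** (0.5 ** n)
    # Pooled score for this scale: mean absolute deviation from that anchor.
    return np.mean(np.abs(ssim_map - anchor))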
I've adjusted them, and that gave a decent improvement.
I wouldn't change the default scale weights based on only one dataset. Also, the Full accuracy on TID2013 is so high with these weights because of the Exotic group (which is pretty much irrelevant).
Finally, according to the TID2013 paper, SSIM is among the best metrics for "Good quality" (Table 6), so maybe add an -ssim switch to calculate SSIM instead of MS-SSIM.
Afaik, this is how these scale weights are calculated:
cpd represents cycles per degree of visual angle, which is determined by the viewing resolution factor.
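The exact derivation depends on the linked code, but CSF-based scale weights generally look something like this sketch (the Mannos-Sakrison CSF and the 16 cpd starting frequency are illustrative assumptions, not values from that code):

import numpy as np

def csf(f):
    # Mannos-Sakrison contrast sensitivity function, f in cycles per degree
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

# Hypothetical peak frequency seen by each scale: every 2x downsample halves
# the cycles per degree, and the starting value depends on viewing resolution.
cpd = 16.0 / 2 ** np.arange(5)
weights = csf(cpd) / csf(cpd).sum()
print(weights.round(4))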
Yes, indeed I'm worried about overfitting to TID2013. Can you recommend other similar datasets that I could use to verify accuracy?
The most similar is KADID-10k. Only they made a mistake and placed reference images instead of first-level distorted images for 3 distortion types (1st, 3rd and 8th). The most popular are LIVE and CSIQ. ESPL is for some reason not very popular, but seems good.
With my max+avg pooling method my thinking was:
if there's a small but significant distortion in the image (e.g. a single bad pixel), then it actually affects a larger area than itself, because it's likely spoiling the entire object/area it's in. This is why there's a "max" pooling step.
But an image likely contains multiple objects, and one flaw is not disqualifying the whole image, which is why there's an averaging step too.
That solution is not mathematically elegant, but I like it, because it intuitively makes sense to me.
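Concretely, that intuition could look something like this sketch (the block size and the min-then-mean details are a guess for illustration, not necessarily what dssim actually does):

import numpy as np

def max_avg_pool(ssim_map, block=8):
    h, w = ssim_map.shape
    h, w = h - h % block, w - w % block
    blocks = ssim_map[:h, :w].reshape(h // block, block, w // block, block)
    # "max" pooling of the error: keep the worst (lowest) SSIM in each block,
    # so a single bad pixel drags down its whole neighbourhood...
    worst_per_block = blocks.min(axis=(1, 3))
    # ...then average, so one flawed object doesn't disqualify the whole image.
    return worst_per_block.mean()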
I've read how MAD works, but I'm not quite sure why this particular formula, especially with the "tweak", happens to work for pooling SSIM.
My hypothesis is:
For small n in pow(1/2^n), MAD behaves like standard deviation, so it gives worse scores to images with multiple bad areas, but is insensitive to uniformly bad images.
For bigger n, it makes avg approach 1, which makes MAD behave more like a simple average, which lets the other layers still catch uniformly bad images.
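A quick numeric check of that intuition (just arithmetic, nothing from the repo): for a uniformly bad map, the anchor mean(x)**(0.5**n) drifts towards 1 as n grows, so the deviation score approaches 1 - mean(x), i.e. a plain average error:

import numpy as np

x = np.full(1000, 0.7)  # a uniformly "bad" per-pixel ssim map
for n in range(5):
    anchor = np.mean(x) ** (0.5 ** n)
    print(n, round(anchor, 3), round(np.mean(np.abs(x - anchor)), 3))
# n=0: anchor 0.7, score 0.0    (uniform badness is invisible, std-dev-like)
# n=4: anchor ~0.978, score ~0.278  (close to 1 - 0.7 = 0.3, average-like)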
That makes me wonder whether this tweaked MAD just happens to be another way of mixing "worst" vs "average" error pooling, just not within each scale, but across scales.
And I wonder whether the proportion from pow(1/2^n) is ideal, or maybe it should be based on scale weights. This is something to experiment with.
It doesn't behave like std dev (I edited my comment above about MAD in MDSI) and it always makes avg approach 1 (that's the idea), except at the 1st scale. The 1/2^n in pow(1/2^n) is just shorthand for [1, 0.5, 0.25, 0.125, 0.0625][n]; it has no relation to scale weights, and its impact on accuracy and ssim scores is minimal (vs. for example [1.0, 0.5, 0.5, 0.5, 0.5][n]).
At first, after the arithmetic mean, I tried geometric and harmonic means. They made accuracy worse (harmonic was the worst). From Wikipedia: "For all positive data sets containing at least one pair of nonequal values, the harmonic mean is always the least of the three means, while the arithmetic mean is always the greatest of the three and the geometric mean is always in between." So I added that exponent to bias avg towards good quality scores. The bigger the downsampling factor, the higher and less spread out the per-pixel ssim scores are (except for some Exotic distortion types, probably), so the exponent value must be lower to be effective.
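A tiny sanity check of that ordering, and of what the exponent does instead (illustrative numbers only):

import numpy as np
from scipy.stats import gmean, hmean

x = np.array([0.6, 0.9, 0.95, 0.99])  # ssim-like scores in (0, 1]
print(x.mean(), gmean(x), hmean(x))   # arithmetic >= geometric >= harmonic
# The exponent pulls the other way: mean(x)**e with e < 1 biases the anchor
# towards 1, i.e. towards "good quality" scores.
for e in [1, 0.5, 0.25, 0.125]:
    print(e, x.mean() ** e)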
That makes me wonder whether this tweaked MAD just happens to be another way of mixing "worst" vs "average" error pooling, just not within each scale, but across scales.
It's still within each scale, not across scales (cross scale pooling is in this block).
And I wonder whether the proportion from pow(1/2^n) is ideal
Maybe not ideal, but I tried different constants (like 1.5, 1.75, 2.25, 2.5) and also the inverse square law formula (1/n^2).
Tested on KADID-10k using scripts from here (I only modified kadid10k.ROCC.py to show per-type and per-level accuracy).
The 1st column is max+avg, the 2nd is MAD with the non-custom scale weights.
Full SROCC: -0.8525045482307988 -0.8562234657395806
Per type:
1: -0.9482709952684674 -0.9524144401242203
2: -0.888920125560901 -0.9036008877848952
3: -0.9361601718850355 -0.942732720555493
4: -0.9035452008410343 -0.9054443006074244
5: -0.7505439463819739 -0.7721170099865738
6: -0.877401725595138 -0.8779770694488326
7: -0.6083002027988245 -0.5992010191419064
8: -0.9202345152541264 -0.9192436523541861
9: -0.9316652477020476 -0.9389697724114447
10: -0.8725177456252896 -0.8922033461740347
11: -0.8973088936601757 -0.8921109535136115
12: -0.9312734837315821 -0.9300191542963452
13: -0.8081662575202928 -0.8028804197128389
14: -0.940544646168691 -0.9413759571962214
15: -0.8984371485985785 -0.8985430184742703
16: -0.9182612634610633 -0.9197175157873034
17: -0.8913491941768678 -0.8932455426471694
18: -0.7350325589908322 -0.7442556609995762
19: -0.9281146136611702 -0.931734634444742
20: -0.36244064860224723 -0.37055703484238806
21: -0.4742315253111955 -0.48728508844891577
22: -0.8585757413960478 -0.8672451350219674
23: -0.5884619420813891 -0.5820224099228524
24: -0.8857312844251738 -0.8900615092376575
25: -0.75146916282513 -0.7677897943743471
Per level:
1: -0.7594226220260838 -0.7568424979839856
2: -0.7859239032568683 -0.7905589936285138
3: -0.7645387478162078 -0.7726706920108036
4: -0.6711213371339377 -0.6899222144730359
5: -0.598736695180533 -0.6224275615532653
kadid10k.ROCC.py:
import pandas as pd

# dssim1 = max+avg pooling scores, dssim2 = MAD pooling scores
df = pd.read_csv('kadid10k.dssim1.csv')
df2 = pd.read_csv('kadid10k.dssim2.csv')

print('Full SROCC:')
print(df[['dmos', 'DSSIM']].corr('spearman').loc["DSSIM"][0])
print(df2[['dmos', 'DSSIM']].corr('spearman').loc["DSSIM"][0])

print('\nFull KROCC:')
print(df[['dmos', 'DSSIM']].corr('kendall').loc["DSSIM"][0])
print(df2[['dmos', 'DSSIM']].corr('kendall').loc["DSSIM"][0])

print("\nPer type:")
for i in range(25):
    # distortion type is the 2nd "_"-separated field of the distorted image name
    a = df['dist_img'].str.split("_", expand=True)[:][1].astype(int).isin([i+1])
    print("{}:".format(i+1))
    print(df[['dmos', 'DSSIM']][a].corr('spearman').loc["DSSIM"][0])
    print(df2[['dmos', 'DSSIM']][a].corr('spearman').loc["DSSIM"][0])
print("\n------------------")

print("\nPer level:")
for i in range(5):
    # distortion level is the 3rd "_"-separated field, before the file extension
    a = df['dist_img'].str.split("_", expand=True)[:][2].str.split(".", expand=True)[:][0].astype(int).isin([i+1])
    print(df[['dmos', 'DSSIM']][a].corr('spearman').loc["DSSIM"][0])
    print(df2[['dmos', 'DSSIM']][a].corr('spearman').loc["DSSIM"][0])
print("\n------------------")
Btw, here is my implementation of MDSI in Python. I removed it from my repo because I didn't like that metric very much. But maybe it's actually good?
Thanks for the data. It looks good.
It makes DSSIM output scores that have a different magnitude than previously, so I'll probably add some rescaling fudge in order to avoid breaking users too much.
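The kind of rescaling fudge meant here could be as simple as this sketch (the constants are made-up placeholders, not actual dssim values):

# Hypothetical: measure a typical score under the old and the new pooling on
# some corpus, then multiply so existing user thresholds keep roughly working.
OLD_TYPICAL, NEW_TYPICAL = 0.0042, 0.0011  # placeholders only

def rescale(new_score):
    return new_score * (OLD_TYPICAL / NEW_TYPICAL)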
Thinking a 5x5 binomial / Gaussian (std=1) down-sampling filter might work better with MAD than the 2x2 average.
I guess it could help a little. IIRC, the change from a 2x box blur to a proper Gaussian kernel in the SSIM blurring helped a little too.
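For reference, a 5x5 binomial downsample could be tried with something like this sketch (not what the script below uses; that one uses gaussian_filter with sigma 1.08):

import numpy as np
from scipy.ndimage import convolve

def binomial_downsample(img):
    # Separable 5-tap binomial kernel [1, 4, 6, 4, 1] / 16 (std = 1, close to
    # a small Gaussian): low-pass filter, then decimate by 2 in each direction.
    k = np.array([1., 4., 6., 4., 1.]) / 16.0
    return convolve(img, np.outer(k, k), mode='nearest')[::2, ::2]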
Made a Python version of this metric (grayscale only) for testing, with an alternative version of MAD.
import sys
from PIL import Image
import numpy as np
from scipy.ndimage import gaussian_filter

# Per-scale weights; all but the first are commented out to test a single scale (SSIM)
WEIGHTS = [0.0448]#, 0.2856, 0.3001, 0.2363, 0.1333]

def msssim(file1, file2):
    img1 = Image.open(file1).convert('RGB')
    img2 = Image.open(file2).convert('RGB')
    width, height = img1.size
    img1 = np.frombuffer(img1.tobytes(), dtype=np.uint8).reshape(height, width, 3) / 255
    img2 = np.frombuffer(img2.tobytes(), dtype=np.uint8).reshape(height, width, 3) / 255
    # sRGB -> linear light
    img1 = np.where(img1 > 0.04045, np.power((img1 + 0.055) / 1.055, 2.4), img1 / 12.92)
    img2 = np.where(img2 > 0.04045, np.power((img2 + 0.055) / 1.055, 2.4), img2 / 12.92)
    # Rec. 709 luminance (grayscale only)
    img1 = 0.2126 * img1[:,:,0] + 0.7152 * img1[:,:,1] + 0.0722 * img1[:,:,2]
    img2 = 0.2126 * img2[:,:,0] + 0.7152 * img2[:,:,1] + 0.0722 * img2[:,:,2]
    mssim = []
    for i in range(len(WEIGHTS)):
        # Full SSIM at the last scale, contrast/structure map at the other scales,
        # computed on gamma-encoded (1/2.2) luma
        mssim.append(ssim(pow(img1, 1./2.2), pow(img2, 1./2.2), i, i < len(WEIGHTS) - 1))
        # Downsample 2x for the next scale (Gaussian low-pass instead of 2x2 average)
        #img1 = avgpooling(img1, 2)
        #img2 = avgpooling(img2, 2)
        img1 = gaussian_filter(img1, 1.08, truncate=1.5)[::2,::2]
        img2 = gaussian_filter(img2, 1.08, truncate=1.5)[::2,::2]
    return np.sum(np.multiply(np.stack(mssim), WEIGHTS)) / np.sum(WEIGHTS)

def mad(x, l):
    # Mean absolute deviation around mean(x) ** (1/2^l)
    return np.mean(np.absolute(x - np.power(np.mean(x), np.power(.5, l)))) # np.mean(np.absolute(x - np.mean(x if l==0 else np.sort(x, axis=None)[-int(x.size//1.5):])))

def ssim(L1, L2, lvl, cs_map):
    C1 = (0.01)**2
    C2 = (0.03)**2
    sd, t = 1.5, 3 # kernel radius = round(sd * truncate)
    mu1 = gaussian_filter(L1, sd, truncate=t)
    mu2 = gaussian_filter(L2, sd, truncate=t)
    mu1_sq = mu1 * mu1
    mu2_sq = mu2 * mu2
    mu1_mu2 = mu1 * mu2
    sigma1_sq = gaussian_filter(L1 * L1, sd, truncate=t) - mu1_sq
    sigma2_sq = gaussian_filter(L2 * L2, sd, truncate=t) - mu2_sq
    sigma12 = gaussian_filter(L1 * L2, sd, truncate=t) - mu1_mu2
    if cs_map:
        value = (2.0*sigma12 + C2)/(sigma1_sq + sigma2_sq + C2)
    else:
        value = ((2.0*mu1_mu2 + C1)*(2.0*sigma12 + C2))/((mu1_sq + mu2_sq + C1)*
                (sigma1_sq + sigma2_sq + C2))
    return mad(value, lvl)

def avgpooling(mat, ksize):
    m, n = mat.shape[:2]
    ny = m // ksize
    nx = n // ksize
    mat_pad = mat[:ny*ksize, :nx*ksize, ...]
    new_shape = (ny, ksize, nx, ksize) + mat.shape[2:]
    result = np.nanmean(mat_pad.reshape(new_shape), axis=(1,3))
    return result

def main():
    # First argument is the reference image, the rest are distorted images
    for arg in sys.argv[2:]:
        score = msssim(sys.argv[1], arg)
        print(str(score) + "\t" + arg)

if __name__ == '__main__':
    main()
That works great, but why does it have .powf(1/2^n)?