mapbox / pixelmatch

The smallest, simplest and fastest JavaScript pixel-level image comparison library
ISC License
6.2k stars 307 forks source link

Potential inlining/strength reduction optimization in colorDelta #80

Closed JobLeonard closed 4 years ago

JobLeonard commented 4 years ago

Recently I adapted the colorDelta function from the code in this repo in one of my Observable notebooks, and more or less on autopilot inlined rgb2y, rgb2i, and rgb2q:

  // function rgb2y(r, g, b) { return r * 0.29889531 + g * 0.58662247 + b * 0.11448223; }
  // function rgb2i(r, g, b) { return r * 0.59597799 - g * 0.27417610 - b * 0.32180189; }
  // function rgb2q(r, g, b) { return r * 0.21147017 - g * 0.52261711 + b * 0.31114694; }

  // const y = rgb2y(r1, g1, b1) - rgb2y(r2, g2, b2);
  // const i = rgb2i(r1, g1, b1) - rgb2i(r2, g2, b2);
  // const q = rgb2q(r1, g1, b1) - rgb2q(r2, g2, b2);
  const r = r1 - r2;
  const g = g1 - g2;
  const b = b1 - b2;
  const y = r * 0.29889531 + g * 0.58662247 + b * 0.11448223
  const i = r * 0.59597799 - g * 0.27417610 - b * 0.32180189
  const q = r * 0.21147017 - g * 0.52261711 + b * 0.31114694
  return 0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q;

This removes six function calls, six multiplications and three additions. Since this is a hot function I'm sure that the function calls are probably inlined anyway, but the JIT compiler can't perform the other optimization for us ;). Since colorDelta is the heart of this algorithm I wonder if that small change adds up or not.

Also, now that I'm writing this out it got me thinking: barring rounding errors the following should be equivalent:

  0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q
  // Math.sqrt(0.5053) === 0.7108445681019163
  // Math.sqrt(0.299) === 0.5468089245796927
  // Math.sqrt(0.1957) === 0.4423799272118933

  // Therefore this can be rewriten as:
  0.7108445681019163 * 0.7108445681019163 * y * y + 
  0.5468089245796927 * 0.5468089245796927 * i * i + 
  0.4423799272118933 * 0.4423799272118933 *q * q

 // or:
  (0.7108445681019163 * y)**2 + (0.5468089245796927 * i)**2 + (0.4423799272118933 *q)**2

 // Therefore, this should also work: (except for the `yOnly` case obviously)

  const y = 0.7108445681019163 * (r * 0.29889531 + g * 0.58662247 + b * 0.11448223)
  const i = 0.5468089245796927 * (r * 0.59597799 - g * 0.27417610 - b * 0.32180189)
  const q = 0.4423799272118933 * (r * 0.21147017 - g * 0.52261711 + b * 0.31114694)
  return y*y + i*i + q*q

 // Which can be pre-computed to:

  const y = r * 0.2124681075446384 + g * 0.4169973963260294 + b * 0.08137907133969426;
  const i = r * 0.3258860837850668 - g * 0.14992193838645426 - b * 0.17596414539861255;
  const q = r * 0.0935501584120867 - g * 0.23119531908149002 + b * 0.13764516066940333
  return y*y + i*i + q*q

The latter form would save another three multiplications, for a total of nine (going from 21 down to 12).

So I haven't benchmarked any of this, so I don't know the actual impact on performance it has, but technically it should be less work for the CPU at least. I guess we should also check if the results are the same - I did add the change described in this comment to my notebook and it seems to work fine, but that's not a pixel-to-pixel comparison ;)

The actual code would probably look something like:

  const r = r1 - r2;
  const g = g1 - g2;
  const b = b1 - b2;
  if (yOnly) return r * 0.29889531 + g * 0.58662247 + b * 0.11448223;

  // inlined rgb2y, rgb2i and rgb2q, with strength-reduced constants
  const y = r * 0.2124681075446384 + g * 0.4169973963260294 + b * 0.08137907133969426;
  const i = r * 0.3258860837850668 - g * 0.14992193838645426 - b * 0.17596414539861255;
  const q = r * 0.0935501584120867 - g * 0.23119531908149002 + b * 0.13764516066940333;
  return y*y + i*i + q*q;
mourner commented 4 years ago

Really nice suggestion! I got curious so I made a quick benchmark running on 1000+ render tests in Mapbox GL JS, commenting out the "identical images" fast path — this change doesn't seem to have any noticeable performance gain, so probably not worth sacrificing code clarity for. The low impact is probably due to the fact that for every full-color colorDelta call, we do 16 brightness-only ones, which benefit from the micro-optimization less.

Also an off-topic fun fact from the bench: it takes 18 times as long to png-decode the fixtures compared to pixelmatch running time. 😅

JobLeonard commented 4 years ago

The low impact is probably due to the fact that for every full-color colorDelta call, we do 16 brightness-only ones, which benefit from the micro-optimization less.

I mean, that's also a reduction from six to three multiplications, so percentage-wise the difference is even bigger ;)

Also an off-topic fun fact from the bench: it takes 18 times as long to png-decode the fixtures compared to pixelmatch running time. :sweat_smile:

Ouch! Just shows how important it is do benchmark properly ;)

JobLeonard commented 4 years ago

Anyway, thanks for checking! It was a fun thought-exercise at least :)