Fix bug in coeff gen on more than 3x downsamples with w and h equal

Charles reported a slight darkening of alpha on one output scanline of a 2052 to 256 resample (instead of a 1.0 expected output, it would be 0.99681 across one scanline).

The summary is that this was a bug in the pivot code (we always build the coefficients in gather mode, and then pivot them for scatter mode). Basically, an extra aligning zero coefficient in the gather mode list could sometimes cause the pivoter to overwrite the next scatter coefficient's list weight. Since this overwrote the smallest coefficient, the effect was subtle, and I missed it in my test thresholds.

The bug was tricky to track down, since: 1) We only use scatter on vertical scaling 2) We only use scatter when the vertical scaling is more than 3x 3) The extra aligning coefficient only gets inserted on the horizontal coefficients when we have lumpy sampling (when each gather list alternates between 3 and 4 coeffs, for example) 4) But the horizontal coefficients are unrelated to the verticals, i imagine you asking. Except sometimes they aren't - when we notice the vertical and horizontal sampling parameters are exactly the same, we just pivot from the horizontal gathers right into the vertical scatters.

Whew. Anyway, fixed, and there is an assert to make sure we don't accidentally have that happen again.

Fallout: at identical horizonal/vertical downscales (at more than 3x), with non-integral scaling (we need to be aligning the final coeff list of one of the horizontals) - one weighting coeff would be zeroed. This coeff is the left most edge one so the effect is usually tiny. I tested all downsamples from 4096 to (4096...1), and about 2% of them would exhibit the problem and about 25% of those would be enough to pull at byte from 255 to 254 with rounding. It never effected more than one scanline in a single scaling ratio (just how it worked out).

So, this is a determinism difference from the previous version, but it only affects a small-ish percentage of downsamples.

HOWEVER, I also changed the coeff generation to be done in doubles by default for a bit more accuracy. This will cause a complete change from the previous version. If you just want the small fix, you can define STBIR_RENORMALIZE_IN_FLOAT to keep the coeff generation in float, and the rescale differences from version to version will remain tiny.

nothings / stb

Fix bug in coeff gen on more than 3x downsamples with w and h equal #1609