[Bug] Systematic rounding errors in BC1 decoding

RunDevelopment commented 2 years ago

While I discovered and will explain the problem with BC1, this is a general problem that will likely also affect other decoders.

For a while now, I noticed that Pfim-decoded BC1 images tended to have a slightly green tint that was not present in the original DDS. I know that the green tint wasn't present in the original because:

The image is pure grayscale (this is dictated by the material model of the game I'm working with), so apart from dithered compression artifacts, I expected no systematic discolorations. This is indeed the case in game, where my graphics card correctly decodes the image as gray.
I can look at the image in Paint.Net and there is no green tint. Paint.Net uses DirectX to decode DDS files, so its implementation is most likely correct. It also agrees with the in-game texture.
I can convert the DDS to PNG using TexConv and there is no green tint.

After doing the calculations for one block by hand, I can say with certainty that the cause is systematic rounding errors. Pfim does round every operation correctly, but the rounding errors add up and cause the differences I saw.

Example

This is the simplest example I found. The texture is a 256x256 BC1_UNORM_SRGB-encoded image with a constant color of RGB=47 47 47 (8bit). Every block in the image has the following values:

ushort color0 = 0b00110_001100_00110 // RGB565 = 6 12 6
ushort color1 = 0b00101_001011_00101 // RGB565 = 5 11 5
uint bitmap = 0b10101010101010101010101010101010

Note: As you can see, all indexes in the bitmap are 2, so I will only explain how color2 is calculated, since every pixel in the image is identical and uses it.

DirectX

DirectX uses floating point values to represent colors. First, it converts the RGB565 color to a floating point vector and normalizes it. This yields the following colors (here multiplied by 255 to make the comparison with Pfim's 8bit colors easier).

color0 = [ 49.354838709677416, 48.57142857142857, 49.354838709677416 ]
color1 = [ 41.12903225806451, 44.52380952380952, 41.12903225806451 ]

color2 is calculated using a simple floating point linear interpolation:

color2 = [ 46.61290322580644, 47.22222222222222, 46.61290322580644 ]

Rounding those floating point numbers to get the 8bit RGB, we get the color RGB = 47 47 47.

Pfim

Pfim uses bitshifts to convert RGB565 to 8bit RGB. The result is as follows:

color0 = [ 49, 49, 49 ]
color1 = [ 41, 45, 41 ]

color2 is then calculated as (2*color0 + 1*color1) / 3 (rounded down, because integer division).

color2 = [ 46, 47, 46 ]

The final color is RGB = 46 47 46.

Comparison

As we can see, DirectX's rounded fp numbers for color0 and color1 are the saem as Pfim's. However, due to a lack of precision, Pfim does not calculate color2 correctly. This error seems to manifest as a green tint in many images, because the R and B channels are one too low.

Rounding to the nearest integer for the / 3 won't solve the issue. The decimal result of Pfim's (2*color0+color1)/3 is [ 46.333, 47.666, 46.333 ], so rounding to the nearest int will yield RGB = 46 48 46. While this would somewhat even out the loss in average brightness, it would make the green tint even worse.

Fix

The easiest solution would be to use floating point numbers, of course. This would trivially solve the rounding errors. The implementation would also be easy to verify for correctness since it has to mirror the DirectX implementation. However, this will likely negatively impact performance as (AFAIK) we do not have (easy) access to the pointer-cast magic and SSE instructions that make DirectX's implementation fast.

One other idea would be to use fixed point numbers (basically just ints with a statically defined maximum). Using a bit of clever arithmetic, we can guarantee that all operations are rounded correctly.

I actually implemented both versions and tested their performance using Pfim.Benchmark. Maybe I did something wrong, but I was not able to measure any significant performance difference between the two. In fact, I was not able to measure a significant difference between the fixed versions and the current version.

Code: Floating point number

```c# namespace Pfim { public class Dxt1Dds : CompressedDds { private const int PIXEL_DEPTH = 4; private const int DIV_SIZE = 4; public Dxt1Dds(DdsHeader header, PfimConfig config) : base(header, config) { } protected override byte PixelDepthBytes => PIXEL_DEPTH; protected override byte DivSize => DIV_SIZE; protected override byte CompressedBytesPerBlock => 8; public override ImageFormat Format => ImageFormat.Rgba32; public override int BitsPerPixel => 8 * PIXEL_DEPTH; private readonly Color8888[] colors = new Color8888[4]; private struct Vec3 { private float r; private float g; private float b; public static void FromRgb565(ushort rgb565, out Vec3 result) { const float F5 = 255f / 31f; const float F6 = 255f / 63f; result.r = ((rgb565 & 0x1f)) * F5; result.g = ((rgb565 & 0x7E0) >> 5) * F6; result.b = ((rgb565 & 0xF800) >> 11) * F5; } public static void Half(ref Vec3 a, ref Vec3 b, out Vec3 result) { result.r = (a.r + b.r) * 0.5f; result.g = (a.g + b.g) * 0.5f; result.b = (a.b + b.b) * 0.5f; } public static void Third(ref Vec3 large, ref Vec3 small, out Vec3 result) { const float THIRD = 1f / 3f; result.r = (2 * large.r + small.r) * THIRD; result.g = (2 * large.g + small.g) * THIRD; result.b = (2 * large.b + small.b) * THIRD; } public void As8bit(ref Color8888 result) { result.r = (byte)(r + 0.5f); result.g = (byte)(g + 0.5f); result.b = (byte)(b + 0.5f); result.a = 255; } } protected override int Decode(byte[] stream, byte[] data, int streamIndex, uint dataIndex, uint stride) { // Colors are stored in a pair of 16 bits ushort color0 = stream[streamIndex++]; color0 |= (ushort)(stream[streamIndex++] << 8); ushort color1 = (stream[streamIndex++]); color1 |= (ushort)(stream[streamIndex++] << 8); // Extract R5G6B5 Vec3.FromRgb565(color0, out var c0); Vec3.FromRgb565(color1, out var c1); c0.As8bit(ref colors[0]); c1.As8bit(ref colors[1]); // Used the two extracted colors to create two new colors that are // slightly different. if (color0 > color1) { Vec3.Third(ref c0, ref c1, out var c2); c2.As8bit(ref colors[2]); Vec3.Third(ref c1, ref c0, out var c3); c3.As8bit(ref colors[3]); } else { Vec3.Half(ref c0, ref c1, out var c2); c2.As8bit(ref colors[2]); colors[3].r = 0; colors[3].g = 0; colors[3].b = 0; colors[3].a = 0; } for (int i = 0; i < 4; i++) { // Every 2 bit is a code [0-3] and represent what color the // current pixel is // Read in a byte and thus 4 colors byte rowVal = stream[streamIndex++]; for (int j = 0; j < 8; j += 2) { // Extract code by shifting the row byte so that we can // AND it with 3 and get a value [0-3] var col = colors[(rowVal >> j) & 0x03]; data[dataIndex++] = col.r; data[dataIndex++] = col.g; data[dataIndex++] = col.b; data[dataIndex++] = col.a; } // Jump down a row and start at the beginning of the row dataIndex += PIXEL_DEPTH * (stride - DIV_SIZE); } // Reset position to start of block return streamIndex; } } } ```

Code: Fixed point ints

```c# namespace Pfim { public class Dxt1Dds : CompressedDds { private const int PIXEL_DEPTH = 4; private const int DIV_SIZE = 4; public Dxt1Dds(DdsHeader header, PfimConfig config) : base(header, config) { } protected override byte PixelDepthBytes => PIXEL_DEPTH; protected override byte DivSize => DIV_SIZE; protected override byte CompressedBytesPerBlock => 8; public override ImageFormat Format => ImageFormat.Rgba32; public override int BitsPerPixel => 8 * PIXEL_DEPTH; private readonly Color8888[] colors = new Color8888[4]; struct FixedPointRGB { uint r; uint g; uint b; const uint MAX = 255 * 31 * 63 * 2 * 3; public static void FromRgb565(ushort rgb565, out FixedPointRGB result) { const uint F5 = MAX / 31; const uint F6 = MAX / 63; result.r = ((rgb565 & 0x1fu)) * F5; result.g = ((rgb565 & 0x7E0u) >> 5) * F6; result.b = ((rgb565 & 0xF800u) >> 11) * F5; } public static void Half(ref FixedPointRGB a, ref FixedPointRGB b, out FixedPointRGB result) { result.r = (a.r + b.r) / 2; result.g = (a.g + b.g) / 2; result.b = (a.b + b.b) / 2; } public static void Third(ref FixedPointRGB large, ref FixedPointRGB small, out FixedPointRGB result) { result.r = (2 * large.r + small.r) / 3; result.g = (2 * large.g + small.g) / 3; result.b = (2 * large.b + small.b) / 3; } public void As8bit(ref Color8888 result) { // This is the equivlent of 0.5. // Since int division always rounds down, we can add 0.5 to round to the nearest int. const uint OFFSET = MAX / 255 / 2; const uint D = MAX / 255; result.r = (byte)((r + OFFSET) / D); result.g = (byte)((g + OFFSET) / D); result.b = (byte)((b + OFFSET) / D); result.a = 255; } } protected override int Decode(byte[] stream, byte[] data, int streamIndex, uint dataIndex, uint stride) { // Colors are stored in a pair of 16 bits ushort color0 = stream[streamIndex++]; color0 |= (ushort)(stream[streamIndex++] << 8); ushort color1 = (stream[streamIndex++]); color1 |= (ushort)(stream[streamIndex++] << 8); // Extract R5G6B5 FixedPointRGB.FromRgb565(color0, out var c0); FixedPointRGB.FromRgb565(color1, out var c1); c0.As8bit(ref colors[0]); c1.As8bit(ref colors[1]); // Used the two extracted colors to create two new colors that are // slightly different. if (color0 > color1) { FixedPointRGB.Third(ref c0, ref c1, out var c2); c2.As8bit(ref colors[2]); FixedPointRGB.Third(ref c1, ref c0, out var c3); c3.As8bit(ref colors[3]); } else { FixedPointRGB.Half(ref c0, ref c1, out var c2); c2.As8bit(ref colors[2]); colors[3].r = 0; colors[3].g = 0; colors[3].b = 0; colors[3].a = 0; } for (int i = 0; i < 4; i++) { // Every 2 bit is a code [0-3] and represent what color the // current pixel is // Read in a byte and thus 4 colors byte rowVal = stream[streamIndex++]; for (int j = 0; j < 8; j += 2) { // Extract code by shifting the row byte so that we can // AND it with 3 and get a value [0-3] var col = colors[(rowVal >> j) & 0x03]; data[dataIndex++] = col.r; data[dataIndex++] = col.g; data[dataIndex++] = col.b; data[dataIndex++] = col.a; } // Jump down a row and start at the beginning of the row dataIndex += PIXEL_DEPTH * (stride - DIV_SIZE); } // Reset position to start of block return streamIndex; } } } ```

Note: I do not know much about how C#'s runtime JIT-optimizes code, so I wrote what I think optimized C# code is. The code is pretty terrible is what I'm saying.

While both of these implementations are correct accoring to a little testing, I very much perfer the floating point one, because there is less magic going on.

Reference image: BC1_UNORM_SRGB-47.zip

RunDevelopment commented 2 years ago

I'm willing to make a PR (based on the floating point implementation), but I want to hear your opinion on my coding style (or rather lack there of) first. If other decoders or also affected by this (e.g. Dxt3 is), then I would like to fix them as well, but I want to know whether you approve of my general approach first.

nickbabcock commented 2 years ago

Oh wow, love your investigation.

I'll take a deeper look at this when I'm not traveling and back at my main dev machine, but I think you're right and the solution you are proposing looks sound. The dxt3 and dxt5 use a similar bit shifting strategy, so I'm expecting that they'd need to be updated too. I'm not too concerned with coding style, as it's something that can be iterated later on.

nickbabcock / Pfim