Thank you, this looks very valuable. Have you tested these changes on a large amount of content to verify that the output encoding hasn't changed? That's my primary concern initially.
I'll try to find the time to throw a few thousand greyscale and normal maps at it and verify the error metrics and times. The encoded output may differ (e.g. two-value BC4 will always use selectors `0` and `1` instead of a single endpoint plus an interpolation) but the decoded output at 8-bit should be the same, so I could hash the decoded PNG. I don't think I'll be able to do that in the next few weeks though.
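For the hash check, something like this minimal sketch is what I have in mind, assuming the decoded blocks get written back out as PNGs (it uses Pillow and hashlib, and the file names are just placeholders):

```python
import hashlib
from PIL import Image

def decoded_hash(png_path):
    # Hash the raw 8-bit pixel data rather than the PNG file itself, so
    # encoder output that differs but decodes identically still matches.
    pixels = Image.open(png_path).convert("L").tobytes()
    return hashlib.sha256(pixels).hexdigest()

# Placeholder names: the same source encoded by both code paths, then decoded.
assert decoded_hash("decoded-original.png") == decoded_hash("decoded-changed.png")
```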
I have some initial results. I wrote this (rather sprawling) test runner to verify everything:
https://gist.github.com/cwoffenden/98780e9009a2d4f62433ea9f77ef4113
You can give it a directory of PNGs and it'll compress them then collect the metrics in a CSV file. For example:
```
./runbc7enc.py -b 4 -o /Volumes/Temp -x ./bc7enc -l nfproj-grey-orig.csv -t /Volumes/Work/Assets/Test/Numfum/grey
```
This ran the BC4 encoder on 450-ish greyscale files and recorded the max error, RMSE and PSNR (but ignored the time, just so I could do a quick diff). Here are the results for the original and the changed code.
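For reference, the metrics the runner records are the usual per-pixel comparisons. A rough sketch, assuming 8-bit greyscale and NumPy (the function name is mine):

```python
import math
import numpy as np
from PIL import Image

def metrics(orig_path, decoded_path):
    a = np.asarray(Image.open(orig_path).convert("L"), dtype=np.float64)
    b = np.asarray(Image.open(decoded_path).convert("L"), dtype=np.float64)
    diff = np.abs(a - b)
    max_err = diff.max()
    rmse = math.sqrt((diff * diff).mean())
    # PSNR against the 8-bit peak; infinite when the images are identical.
    psnr = math.inf if rmse == 0 else 20 * math.log10(255.0 / rmse)
    return max_err, rmse, psnr
```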
The RMSE and PSNR don't change (probably not enough recorded digits to show a difference) but the max error does, in an interesting way. There are five differences in this set of files, with four of the five having a lower-by-one max error in the new code. It's interesting because it highlights a potential accidental improvement which I'll look at this week (better selection of the best block).
I'll cover the processing time later when I've thrown more files at it (short version: it's faster, about 20% on average when fed hundreds of normal maps). On Mac it doesn't build with OpenMP (it isn't supported out of the box), so I want to wait until I'm back at work to test on other OSes.
I can share the test files with you so you can verify, if you like? I have a classifier that goes through internal projects and pulls out different texture types.
I ran the same test on approximately 1400 other greyscale files; out of all of them there were two more cases where the max error is lower in the changed code. CSV files here.
It's totally accidental that it swings this way, since I've seen a few normal maps where the lower max error is in the original code. It's down to taking the summed error and calling the lowest value the best, rather than looking at which of the equal summed errors have lower averages or maximums. Specifically, `trial_err` here needs further refinement.
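The refinement I have in mind is roughly this (a Python sketch of the idea only, not the actual C++ code; the tuple layout is mine):

```python
# Illustrative only: rank candidate blocks by summed error first, then break
# ties on the per-pixel maximum, so two trials with equal totals don't keep
# the one with the worse outlier.
def pick_best(trials):
    # Each trial is a (total_err, max_err, block) tuple; min() compares the
    # totals first and falls back to the max error only when they're equal.
    return min(trials, key=lambda t: (t[0], t[1]))
```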
This fixes #17 but goes further:
Lots of text snipped; jump down to the next paragraph. Originally this expanded the internal endpoints to 14-bit, but in testing the RMSE and PSNR were always slightly worse even though the max error was reduced. These errors were higher due to being calculated from the 8-bit PNG file, not the hardware's representation. Ryg's blog entry has a good explanation of the hardware.
I simplified this commit to address the main issue, which was blocks with two (or few) values having errors in hardware due to one endpoint always being interpolated (which doesn't occur with an 8-bit software decoder). This is achieved by starting the search radius at zero and working outwards (0, -1, 1, -2, 2, etc.). Further, once we have zero error we take this block as the best available and exit early.
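The search-order change is easiest to see in isolation. A Python sketch of the idea only (the real code is C++, and `trial_error` here stands in for the real per-block error evaluation):

```python
def radius_offsets(radius):
    # Yield offsets working outwards from zero: 0, -1, 1, -2, 2, ...
    yield 0
    for r in range(1, radius + 1):
        yield -r
        yield r

def search_endpoints(lo, hi, radius, trial_error):
    # trial_error(lo, hi) is a stand-in for evaluating a candidate block.
    best = None
    for dl in radius_offsets(radius):
        for dh in radius_offsets(radius):
            err = trial_error(lo + dl, hi + dh)
            if best is None or err < best[0]:
                best = (err, lo + dl, hi + dh)
            if err == 0:
                # Zero error: take this block as the best and exit early.
                return best
    return best
```

Because the offsets grow outwards, a zero-error hit is found at the smallest deviation from the starting endpoints, which is exactly the two-value case that was going wrong before.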
This fixes the original issue, keeps the max error, RMSE and PSNR exactly the same, and improves performance. Some timings, using the default `-hr5` radius:

Original code:

This commit:
All timings were from the best of four runs. The biggest improvement was in normal maps, since they have large areas with 2-3 values hovering around 127, and because the search radius now grows outwards these are found early on.