veluca93 closed this pull request 2 years ago
Comparing my latest version (column V), fpnge4 (column Y), animeTosho's filter search (column AD), and this change (column AI).
This change achieves the same speedup on photographic images as my heuristics, at a slightly higher but still <1% density increase.
On the synthetic portion, however, it has trouble with several images: the same speedup, but ~10% density losses instead of ~3%.
fpnge4

| | speedup% | size% |
| --- | --- | --- |
| aggregate | +54.24 | +2.43 |
| photographic | +44.68 | +0.34 |
| synthetic | +63.8 | +4.52 |
My heuristics

| | speedup% | size% |
| --- | --- | --- |
| aggregate | +32.55 | +1.50 |
| photographic | +35.95 | +0.05 |
| synthetic | +29.15 | +2.95 |
These heuristics

| | speedup% | size% |
| --- | --- | --- |
| aggregate | +46.36 | +5.89 |
| photographic | +43.34 | +0.86 |
| synthetic | +49.37 | +10.92 |
Ehhhh, doesn't seem very useful. I'll try a couple more things and see what I can get, but as it stands I don't see a reason for these heuristics at all.
> animeTosho's filter search (column AD)
Weird that there's no density change at all - my tests show something very different. Are you sure it's actually enabled? (`FPNGE_SAD_PREDICTOR` needs to be defined.)
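For readers following along: the SAD search being discussed picks, per row, the PNG filter whose residuals have the smallest sum of absolute differences, treating each residual byte as signed. A minimal Python sketch assuming the standard PNG filter semantics from the spec (fpnge's real code is SIMD C++, so this is only illustrative):

```python
def paeth(a, b, c):
    # Paeth predictor from the PNG spec: a = left, b = above, c = upper-left.
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def filter_row(row, prev, bpp, ftype):
    # Apply PNG filter `ftype` (0=None, 1=Sub, 2=Up, 3=Average, 4=Paeth).
    out = []
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0   # left neighbour
        b = prev[i]                           # above
        c = prev[i - bpp] if i >= bpp else 0  # upper-left
        pred = [0, a, b, (a + b) // 2, paeth(a, b, c)][ftype]
        out.append((x - pred) & 0xFF)
    return out

def sad(residuals):
    # Treat each residual byte as signed; small magnitudes compress well.
    return sum(r if r < 128 else 256 - r for r in residuals)

def pick_filter_sad(row, prev, bpp=3):
    return min(range(5), key=lambda f: sad(filter_row(row, prev, bpp, f)))
```

On a smooth horizontal gradient, for instance, Sub and Paeth both leave tiny constant residuals, so the SAD search picks one of them over None/Up/Average.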
@animetosho
```
$ git remote show origin
* remote origin
  Fetch URL: https://github.com/animetosho/fpnge/
$ git branch
  main
* sadpred
$ ./build.sh -FPNGE_SAD_PREDICTOR
```
Ah, it should be compiled with `-DFPNGE_SAD_PREDICTOR`; the `-D` prefix is what actually defines the macro.
That's more like it!
Corrected:
animetosho filter search

| | speedup% | size% |
| --- | --- | --- |
| aggregate | +34.81 | +2.59 |
| photographic | +33.58 | +1.10 |
| synthetic | +36.04 | +4.08 |
That's more in line with what I'm seeing, thanks.
Out of interest, what CPU are you testing on?
An i5-10210U. So AVX2, but no AVX-512.
@animetosho Given the predictable Huffman table fpnge uses, would it be possible to have a slightly more accurate cost estimation than SAD? Or would any extra instructions there kill off any performance gain?
It's likely possible, but I haven't tried it.
> would it be possible to have a slightly more accurate cost estimation than SAD?
I gave it a try; this replaces SAD with a cost estimate more tuned to how fpnge does Huffman coding. Compile with `-DFPNGE_SAD_PREDICTOR=2` to enable.
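To illustrate why a cost estimate can beat SAD: two residual rows can have the same SAD while one compresses much better, because entropy coding charges roughly by symbol magnitude, not by its linear value. The cost model below (1 bit plus the bit length of the signed magnitude) is a made-up stand-in for fpnge's actual table, sketched only to show the disagreement:

```python
def signed_mag(r):
    # Interpret a residual byte as signed and take its magnitude.
    return r if r < 128 else 256 - r

def sad(residuals):
    return sum(signed_mag(r) for r in residuals)

def approx_bits(residuals):
    # Small magnitudes get short codes; zeros are nearly free.
    return sum(1 + signed_mag(r).bit_length() for r in residuals)
```

For example, eight residuals of 4 and the row `[0]*7 + [32]` both have SAD 32, but the bit-cost estimate strongly prefers the spiky row full of zeros, which is the behaviour you want from a filter-selection metric.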
@animetosho That seems to obliterate the density losses almost completely!
animetosho filter search 2

| | speedup% | size% |
| --- | --- | --- |
| aggregate | +35.79 | +0.12 |
| photographic | +36.62 | +0.02 |
| synthetic | +35.97 | +0.22 |
> @animetosho That seems to obliterate the density losses almost completely!
That's quite nice! Do you think you could also try sampling a random 10% or so of the row? That should be faster and probably not significantly worse.
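The row-sampling suggestion above could look something like the following sketch; the function names and the random-sampling scheme are illustrative, not fpnge's code:

```python
import random

def signed_mag(r):
    # Residual byte as a signed magnitude.
    return r if r < 128 else 256 - r

def sampled_sad(residuals, fraction=0.1, rng=None):
    # Score a filter on only ~`fraction` of the row's residuals.
    rng = rng or random.Random(0)  # fixed seed for a reproducible choice
    n = max(1, int(len(residuals) * fraction))
    picks = rng.sample(range(len(residuals)), n)
    return sum(signed_mag(residuals[i]) for i in picks)
```

In a SIMD encoder a strided sample (say, every 16th byte) would likely be cheaper than random indexing while giving a similar estimate, but that is speculation.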
Another potentially interesting idea could be to limit the set of tested predictors to just the 2 best on every 16th row or so, similarly to what @hohMiyazawa was doing.
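The "re-rank every 16th row" idea could be sketched like this: run the full five-filter search periodically, and restrict the rows in between to the best few candidates. Here `row_scores[y][f]` stands in for the cost of filter `f` on row `y`; a real encoder would compute scores lazily per row rather than up front:

```python
def select_filters(row_scores, refresh_every=16, keep=2):
    chosen, candidates = [], list(range(5))
    for y, scores in enumerate(row_scores):
        if y % refresh_every == 0:
            # Full search: rank all five filters, keep the cheapest `keep`.
            candidates = sorted(range(5), key=lambda f: scores[f])[:keep]
        # Cheap search: only trial the surviving candidates.
        chosen.append(min(candidates, key=lambda f: scores[f]))
    return chosen
```

The tradeoff is that a restricted row can miss a filter that suddenly became best, which is presumably where the density losses would come from.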
Getting a ~2% speedup (and obviously density losses) by completely disabling predictors 1 and 2, so I don't see much more to gain from heuristics at this point.
A conservative heuristic that falls back to Paeth-only when it is really, really sure that the current image is photographic ought to give a 5% speed boost for those, but that's about it.
That doesn't seem worth it indeed. Perhaps reducing the fraction of the row that gets analyzed, but even then probably not.
This estimation technique should be slower than SAD - I'm guessing your speed comparison is varying a bit too much?
I get the following on a 12700K (I do get some run-to-run variance, as the frequency isn't locked):
(benchmark chart not captured here; 'animetosho filter search 2' is labelled 'Estimate' in it)
So I do suspect you could get something out of skipping filter checks. Though unless you skip selection entirely (i.e. pick a predictor without trialing any), it's probably only worth skipping Paeth, as the others should be fairly quick to compute.
Sampling part of the row might also help, but I haven't tried.
Since SAD2 is more expensive than SAD1, I'm getting a slight speed boost by switching to SAD1 after consecutive runs of paeth, and otherwise using SAD2.
Skipping predictor 3 when doing SAD1 search after paeth runs seems to have very little impact on compression and claws back some more speed, but I don't trust that there aren't images where this hurts compression.
It could be worth using a less accurate predictor by skipping this, if you want to investigate.
Unfortunately, this is likely most beneficial with Paeth, but being less accurate here means that the prediction data can't be reused during the actual encoding pass.
@veluca93 assuming we want filter selection heuristics, how do you want to support it as a selectable feature?
Would you prefer to stick with preprocessor toggles, or have some sort of 'compression/speed' setting at runtime?
If the latter, what would you prefer in the `FPNGEEncode` API - a 'level' parameter (similar to zlib), or perhaps some 'options struct' which gives better flexibility at the expense of a more complicated API?
> @veluca93 assuming we want filter selection heuristics, how do you want to support it as a selectable feature? Would you prefer to stick with preprocessor toggles, or have some sort of 'compression/speed' setting at runtime? If the latter, what would you prefer in the `FPNGEEncode` API - a 'level' parameter (similar to zlib) or perhaps some 'options struct' which gives better flexibility at the expense of a more complicated API?
That's a good question, and I don't think I have a very strong opinion -- perhaps an options struct would be better, but it's also more work (i.e. command line parsing). OTOH, it would allow future extension to, say, LZ77 search...
> That's a good question, and I don't think I have a very strong opinion -- perhaps an options struct would be better, but it's also more work (i.e. command line parsing). OTOH, it would allow future extension to, say, LZ77 search...
On the other other hand, maybe a level parameter is simple, because it means one needs to implement fewer combinations...
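To make the tradeoff concrete, here is a hypothetical sketch of the two shapes (this is not fpnge's actual interface, and the field names are invented): an options struct keeps knobs independent and extensible, while a zlib-style level can still be layered on top as thin presets.

```python
from dataclasses import dataclass

@dataclass
class EncodeOptions:
    # Hypothetical knobs; values and names are illustrative only.
    predictor_search: int = 2   # 0 = fixed Paeth, 1 = SAD, 2 = bit-cost
    lz77_search: bool = False   # room for the future extension mentioned

def options_from_level(level):
    # A zlib-style 'level' offered as a preset layer over the struct.
    presets = {
        1: EncodeOptions(predictor_search=0),   # fastest
        2: EncodeOptions(predictor_search=1),
        3: EncodeOptions(predictor_search=2),   # densest
    }
    return presets[level]
```

With this shape, a level keeps the common case to one integer while the struct remains available for callers who want finer control, so the two options aren't mutually exclusive.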
https://github.com/veluca93/fpnge/pull/19 is just better :)
@hohMiyazawa could you test on your image set how well this change does?
In particular, it'd be useful to compare with yours, and with just always using Paeth (i.e. compiling with `-DFPNGE_FIXED_PREDICTOR=4`).