veluca93 / fpnge

Demo of a fast PNG encoder.
Apache License 2.0
81 stars 8 forks source link

Re-use Paeth predicted data from TryPredictor #11

Closed animetosho closed 2 years ago

animetosho commented 2 years ago

Most of PNG's predictors are fairly trivial to compute, however Paeth is relatively slow. It also happens to be the most often chosen predictor. This results in Paeth being computed twice for many rows.

The idea of this change is to save the Paeth predicted data during TryPredictor, then re-use this during actual encoding.
It's likely not worth doing this for the other predictors, due to being fast to compute and being less likely chosen.

Comparison on a 12700K:

Old code - image 1
   378.738 MP/s
    10.787 bits/pixel
Old code - image 2
   414.767 MP/s
    16.240 bits/pixel

New code - image 1
   447.864 MP/s
    10.787 bits/pixel
New code - image 2
   457.003 MP/s
    16.240 bits/pixel

CLA response: I release these changes to the public domain subject to the CC0 license (https://creativecommons.org/publicdomain/zero/1.0/).

google-cla[bot] commented 2 years ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

animetosho commented 2 years ago

On a different topic: I've got a change for supporting CRC without PCLMUL being required (uses slice-by-8 technique for CRC). The change splits the code into separate files, which breaks the current 'one source file' layout of the code.
The idea of the file split is to prepare support for runtime dispatch of code paths (if fpnge goes there). Compilers don't work too well with supporting different ISAs in the same source file, so splitting these across files allows for different compiler options to be used across them.

Does the idea seem good to you?

veluca93 commented 2 years ago

On a different topic: I've got a change for supporting CRC without PCLMUL being required (uses slice-by-8 technique for CRC). The change splits the code into separate files, which breaks the current 'one source file' layout of the code. The idea of the file split is to prepare support for runtime dispatch of code paths (if fpnge goes there). Compilers don't work too well with supporting different ISAs in the same source file, so splitting these across files allows for different compiler options to be used across them.

Does the idea seem good to you?

I was thinking that given the length of the code, splitting things up in multiple files is probably necessary eventually, but I'd like if at all possible to be able to "concatenate" everything in a single file in a somewhat automatic way (if perhaps losing some speed/features). WDYT?

On a somewhat unrelated note, do you have any way I could contact you in a somewhat less public way than GitHub PRs? :)

animetosho commented 2 years ago

I was thinking that given the length of the code, splitting things up in multiple files is probably necessary eventually, but I'd like if at all possible to be able to "concatenate" everything in a single file in a somewhat automatic way (if perhaps losing some speed/features). WDYT?

I don't particularly have any strong opinion over how code should be formatted (if you hadn't already guessed) and actually have no objection to storing everything in a single file.
The splitting is less about organisation and more about being able to use different compiler switches across the code.

This is somewhat more critical if you want to support runtime CPU detection. Compiling with -mavx2 means that the compiler can use AVX2 anywhere in the file - this is problematic if you want, for example, both an SSE4.1 and a AVX2 code path in the same binary, because you don't want the compiler generating AVX2 instructions in your SSE4.1 code. But trying to compile without -mavx2 means that the AVX2 code won't compile.

GCC/Clang does have __attribute__((target("..."))) as a workaround, but there's no MSVC equivalent. Having said that, MSVC doesn't have strict targeting of ISAs anyway, so you could probably do it without splitting files, but I guess it's less "correct" (presumably, 128-bit instructions in AVX2 code paths would use SSE encoding instead of VEX, that you'd get with /arch:AVX2).

On a somewhat unrelated note, do you have any way I could contact you in a somewhat less public way than GitHub PRs? :)

I can send you an email if your Git commit email (@google.com) is fine?

veluca93 commented 2 years ago

I was thinking that given the length of the code, splitting things up in multiple files is probably necessary eventually, but I'd like if at all possible to be able to "concatenate" everything in a single file in a somewhat automatic way (if perhaps losing some speed/features). WDYT?

I don't particularly have any strong opinion over how code should be formatted (if you hadn't already guessed) and actually have no objection to storing everything in a single file. The splitting is less about organisation and more about being able to use different compiler switches across the code.

This is somewhat more critical if you want to support runtime CPU detection. Compiling with -mavx2 means that the compiler can use AVX2 anywhere in the file - this is problematic if you want, for example, both an SSE4.1 and a AVX2 code path in the same binary, because you don't want the compiler generating AVX2 instructions in your SSE4.1 code. But trying to compile without -mavx2 means that the AVX2 code won't compile.

GCC/Clang does have __attribute__((target("..."))) as a workaround, but there's no MSVC equivalent. Having said that, MSVC doesn't have strict targeting of ISAs anyway, so you could probably do it without splitting files, but I guess it's less "correct" (presumably, 128-bit instructions in AVX2 code paths would use SSE encoding instead of VEX, that you'd get with /arch:AVX2).

Anything works for me as long as it allows a reasonably simple way to get a simple "drop in" version for people that care about that usecase :)

On a somewhat unrelated note, do you have any way I could contact you in a somewhat less public way than GitHub PRs? :)

I can send you an email if your Git commit email (@google.com) is fine?

Absolutely!

animetosho commented 2 years ago

it allows a reasonably simple way to get a simple "drop in" version for people that care about that usecase

Well it already isn't exactly that, as compiler flags usually need to be set, but fair enough, I'll aim to keep that goal in mind.

Absolutely!

I sent an email, but I'm guessing it probably got blocked/filtered considering that Google's AI seems hellbent on labeling me a spammer.
If you have a Reddit account, you could message me here, otherwise I don't really have any other ideas.

veluca93 commented 2 years ago

it allows a reasonably simple way to get a simple "drop in" version for people that care about that usecase

Well it already isn't exactly that, as compiler flags usually need to be set, but fair enough, I'll aim to keep that goal in mind.

Absolutely!

I sent an email, but I'm guessing it probably got blocked/filtered considering that Google's AI seems hellbent on labeling me a spammer. If you have a Reddit account, you could message me here, otherwise I don't really have any other ideas.

I think I wrote you there :)