Closed animetosho closed 2 years ago
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
On a different topic: I've got a change for supporting CRC without PCLMUL being required (uses slice-by-8 technique for CRC). The change splits the code into separate files, which breaks the current 'one source file' layout of the code.
The idea of the file split is to prepare support for runtime dispatch of code paths (if fpnge goes there). Compilers don't work too well with supporting different ISAs in the same source file, so splitting these across files allows for different compiler options to be used across them.
Does the idea seem good to you?
On a different topic: I've got a change for supporting CRC without PCLMUL being required (uses slice-by-8 technique for CRC). The change splits the code into separate files, which breaks the current 'one source file' layout of the code. The idea of the file split is to prepare support for runtime dispatch of code paths (if fpnge goes there). Compilers don't work too well with supporting different ISAs in the same source file, so splitting these across files allows for different compiler options to be used across them.
Does the idea seem good to you?
I was thinking that given the length of the code, splitting things up in multiple files is probably necessary eventually, but I'd like if at all possible to be able to "concatenate" everything in a single file in a somewhat automatic way (if perhaps losing some speed/features). WDYT?
On a somewhat unrelated note, do you have any way I could contact you in a somewhat less public way than GitHub PRs? :)
I was thinking that given the length of the code, splitting things up in multiple files is probably necessary eventually, but I'd like if at all possible to be able to "concatenate" everything in a single file in a somewhat automatic way (if perhaps losing some speed/features). WDYT?
I don't particularly have any strong opinion over how code should be formatted (if you hadn't already guessed) and actually have no objection to storing everything in a single file.
The splitting is less about organisation and more about being able to use different compiler switches across the code.
This is somewhat more critical if you want to support runtime CPU detection. Compiling with -mavx2
means that the compiler can use AVX2 anywhere in the file - this is problematic if you want, for example, both an SSE4.1 and a AVX2 code path in the same binary, because you don't want the compiler generating AVX2 instructions in your SSE4.1 code. But trying to compile without -mavx2
means that the AVX2 code won't compile.
GCC/Clang does have __attribute__((target("...")))
as a workaround, but there's no MSVC equivalent. Having said that, MSVC doesn't have strict targeting of ISAs anyway, so you could probably do it without splitting files, but I guess it's less "correct" (presumably, 128-bit instructions in AVX2 code paths would use SSE encoding instead of VEX, that you'd get with /arch:AVX2
).
On a somewhat unrelated note, do you have any way I could contact you in a somewhat less public way than GitHub PRs? :)
I can send you an email if your Git commit email (@google.com) is fine?
I was thinking that given the length of the code, splitting things up in multiple files is probably necessary eventually, but I'd like if at all possible to be able to "concatenate" everything in a single file in a somewhat automatic way (if perhaps losing some speed/features). WDYT?
I don't particularly have any strong opinion over how code should be formatted (if you hadn't already guessed) and actually have no objection to storing everything in a single file. The splitting is less about organisation and more about being able to use different compiler switches across the code.
This is somewhat more critical if you want to support runtime CPU detection. Compiling with
-mavx2
means that the compiler can use AVX2 anywhere in the file - this is problematic if you want, for example, both an SSE4.1 and a AVX2 code path in the same binary, because you don't want the compiler generating AVX2 instructions in your SSE4.1 code. But trying to compile without-mavx2
means that the AVX2 code won't compile.GCC/Clang does have
__attribute__((target("...")))
as a workaround, but there's no MSVC equivalent. Having said that, MSVC doesn't have strict targeting of ISAs anyway, so you could probably do it without splitting files, but I guess it's less "correct" (presumably, 128-bit instructions in AVX2 code paths would use SSE encoding instead of VEX, that you'd get with/arch:AVX2
).
Anything works for me as long as it allows a reasonably simple way to get a simple "drop in" version for people that care about that usecase :)
On a somewhat unrelated note, do you have any way I could contact you in a somewhat less public way than GitHub PRs? :)
I can send you an email if your Git commit email (@google.com) is fine?
Absolutely!
it allows a reasonably simple way to get a simple "drop in" version for people that care about that usecase
Well it already isn't exactly that, as compiler flags usually need to be set, but fair enough, I'll aim to keep that goal in mind.
Absolutely!
I sent an email, but I'm guessing it probably got blocked/filtered considering that Google's AI seems hellbent on labeling me a spammer.
If you have a Reddit account, you could message me here, otherwise I don't really have any other ideas.
it allows a reasonably simple way to get a simple "drop in" version for people that care about that usecase
Well it already isn't exactly that, as compiler flags usually need to be set, but fair enough, I'll aim to keep that goal in mind.
Absolutely!
I sent an email, but I'm guessing it probably got blocked/filtered considering that Google's AI seems hellbent on labeling me a spammer. If you have a Reddit account, you could message me here, otherwise I don't really have any other ideas.
I think I wrote you there :)
Most of PNG's predictors are fairly trivial to compute, however Paeth is relatively slow. It also happens to be the most often chosen predictor. This results in Paeth being computed twice for many rows.
The idea of this change is to save the Paeth predicted data during
TryPredictor
, then re-use this during actual encoding.It's likely not worth doing this for the other predictors, due to being fast to compute and being less likely chosen.
Comparison on a 12700K:
CLA response: I release these changes to the public domain subject to the CC0 license (https://creativecommons.org/publicdomain/zero/1.0/).