google / android-riscv64

Issues and discussions around RISC-V support in AOSP.
Apache License 2.0
198 stars 13 forks source link

external/zlib optimization #49

Open xboxfanj opened 1 year ago

xboxfanj commented 1 year ago

external/zlib (upstream: https://chromium.googlesource.com/chromium/src/third_party/zlib/) has SSSE3 and NEON intrinsics for adler32 among other things. Should RISC-V have similar changes?

enh-google commented 1 year ago

yes. hilarious that i missed filing this one, despite the fact that i spend more time on dealing with zlib optimizations myself than with any of the other projects. (though that's probably why i missed this --- never seemed worth writing down because it was so obvious!)

but that also reminds me --- i need to hassle @pirama-arumuga-nainar about why the coverage build breaks with https://android-review.git.corp.google.com/c/platform/external/zlib/+/2359720 which would let us remove external/angle's duplicate copy of zlib which currently doesn't build for riscv64 at all...

nick-knight commented 1 year ago

Is it necessary to get patches into zlib? My understanding is that zlib-ng is a much friendlier target for arch-specific (e.g., RISC-V vector) optimizations.

enh-google commented 1 year ago

chromium is the upstream[1] for both Chrome (unsurprisingly) and Android (so "we" [=Google] only have to deal with these optimizations once).

they in turn do sync with zlib but -- as you say -- the fact it's so hard to get arch-specific optimizations into zlib that's the cause of all these forks, so it's probably way easier to get it into chromium zlib directly. that was certainly true for the various arm64 optimizations that were the reason i moved Android over to chromium zlib.


  1. you can check by looking at the METADATA file in an external/* project. here's zlib's for example: https://cs.android.com/android/platform/superproject/+/master:external/zlib/METADATA ... feel free to hassle me personally any time you find one that's incomplete. it's being worked on as part of SBOM concerns, but it's still a work in progress for most projects.
paul-walmsley-sifive commented 1 year ago

@enh-google Any hope of getting Chromium to switch over to zlib-ng as their upstream?

enh-google commented 1 year ago

not likely, no. see https://aws.amazon.com/blogs/opensource/improving-zlib-cloudflare-and-comparing-performance-with-other-zlib-forks and https://dougallj.wordpress.com/2022/08/20/faster-zlib-deflate-decompression-on-the-apple-m1-and-x86/ for example.

@Adenilson has forgotten more than i ever knew about this.

nick-knight commented 1 year ago

Ideally local changes should have a merge request featured in either:

https://cs.android.com/android/platform/superproject/+/master:external/zlib/patches/README;l=41-43;drc=03559ccddad582c1e43af419d69d7a6f0639f8ca

This suggests to me that doing the work at zlib/-ng is a prerequisite.

adenilsoncavalcanti commented 1 year ago

Hello there, thanks for the add @enh-google.

It is a long story that happened along the last 6 years and to spare time the current position is: a) We will keep maintaining Chromium zlib for the foreseeable future.

b) Feel free to send patches, the contribution workflow is through the Chromium process (https://chromium.googlesource.com/chromium/src/+/main/docs/contributing.md).

c) There is no hard requirement to submit patches to other forks.

adenilsoncavalcanti commented 1 year ago

A couple weeks ago, I've added a cmake build system to Chromium zlib, so it is possible today to quickly experiment (i.e. no need to checkout whole Chromium repository).

To checkout and build: a) git clone https://chromium.googlesource.com/chromium/src/third_party/zlib/ chromium-zlib b) cd chromium-zlib && mkdir build && cd build c) cmake -DENABLE_SIMD_OPTIMIZATIONS=1 -DCMAKE_BUILD_TYPE=RELEASE .. d) make -j

You should have at the end of it an executable named 'zlib_bench' that is used to collect performance data.

We generally use the snappy data corpus (https://github.com/google/snappy/tree/main/testdata).

For now the cmake build enables the optimizations only for linux@x86.

A few questions: a) I wasn't aware that RISCV featured SIMD extensions. I wonder if our optimizations could be somehow ported to their vector instructions? b) Is there anyone actually allocated to work on it? c) Is there hardware available in the market to run code with such optimizations?

The 'c' part is particularly important if there is a plan to merge and keep maintaining the code in Chromium zlib.

To learn more about the optimizations, check (https://docs.google.com/presentation/d/1_XWLhF4FcTz_HGc4QqvVapkxMj2-QTOZNdyXvpy2oYU/edit?usp=sharing).

The reported gains are a bit outdated but the slide deck offers a high level overview on the subject.

nick-knight commented 1 year ago

Hi @adenilsoncavalcanti, nice work; thanks for the references.

a) I wasn't aware that RISCV featured SIMD extensions. I wonder if our optimizations could be somehow ported to their vector instructions?

The RISC-V vector extension (V-extension a.k.a. RVV) was ratified in November 2021: https://github.com/riscv/riscv-v-spec (It will be merged into the main ISA spec once the LaTeX -> AsciiDoc conversion is finished.) We look forward to leveraging RVV in projects like Zlib.

Further uplifts for CRC-like things are anticipated from an RVV add-on (separate extension): https://github.com/riscv/riscv-crypto/releases/tag/v20230206 The vclmul(h) instructions are of particular interest. This is still work-in-progress (not ratified), but we are confident the project is nearing completion.

b) Is there anyone actually allocated to work on it?

My team (at SiFive) is allocating resources to Zlib: we think it's important to adoption of RISC-V. Please don't hesitate to reach out over email for more information on logistics.

c) Is there hardware available in the market to run code with such optimizations?

To the best of my knowledge, there is not yet any publicly available silicon with RVV. (I recall that Alibaba/T-head released an SoC with a custom vector extension derived from a draft RVV, but I believe it's incompatible with what was eventually ratified.) My team does our performance work in (RTL/FPGA) simulation. Upstream QEMU has supported RVV for quite a while, so that's another option for developers who just care about functionality (vs. cycle accuracy).

adenilsoncavalcanti commented 1 year ago

Pretty cool, thanks for sharing the information, I really appreciate it.

LaTeX, I have sweet memories of it (perfect for math formulas, I loved it!). :-)

About vclmul: if the idea is to implement CRC-32 with folding and exploring the Barrett reduction (like in PCLMULQDQ@x86 and PMULL@Armv8), latencies are key.

The PMULL based CRC-32 we ship since last year is fast on the M1 because its latency/throughput is excellent, it takes 3 cycles in a Firestorm core (https://dougallj.github.io/applecpu/firestorm-simd.html). Newer Arm Cortex cores also benefited of it.

If you can't guarantee that, a serial calculation using a dedicated instruction (i.e. crc32b, crc32w, crc32x) will be slightly faster.

Mark Adler (original creator of canonical zlib, maintainer in the last 28 years) has added a Kadatch & Jenkins based CRC-32 (braided and parallel algorithm) on zlib 1.2.12 and it is remarkably fast as long you have enough ALUs. It came with a price of using bigger tables.

I decided to add it to Chromium zlib in April 2022 because that would help Armv7 and RISCV that lacks crypto extensions. Plus it helped a bit x86 (+1%) since we still fallback to scalar code to handle the tail of data inputs.

You can check the numbers on x86 here: https://bugs.chromium.org/p/chromium/issues/detail?id=1032721#c61

adenilsoncavalcanti commented 1 year ago

@paul-walmsley-sifive since you asked about zlib-ng, you may want to check this: https://github.com/zlib-ng/zlib-ng/issues/1217#issuecomment-1085339594

And this: https://github.com/zlib-ng/zlib-ng/issues/1217#issuecomment-1085326629

alexsifivetw commented 1 year ago

SiFive upstream some optimized kernels in zlib-ng. https://github.com/zlib-ng/zlib-ng/tree/develop/arch/riscv

Neustradamus commented 11 months ago

To follow this ticket, because there are 2 products, important to have the good name: