luben / zstd-jni

JNI binding for Zstd

[PR] Add support for native memory compression and decompression #311

Open VladRodionov opened 5 months ago

VladRodionov commented 5 months ago

This PR introduces support for handling native memory buffers that are allocated using the sun.misc.Unsafe.allocateMemory API. With this update, it is now possible to compress and decompress data between two native memory buffers, as well as transfer data from a byte array to native memory and vice versa.
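For illustration, here is a minimal sketch (mine, not taken from the PR) of how such off-heap buffers are typically obtained and filled with sun.misc.Unsafe. The zstd-jni call is shown as a commented placeholder, since the exact method names and signatures added by this PR are not spelled out in this thread.

```java
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import sun.misc.Unsafe;

// Minimal sketch: allocate two native buffers, copy a heap array into one of them,
// and (hypothetically) compress between the two native addresses.
public class NativeBufferSketch {

    // sun.misc.Unsafe has no public constructor; the singleton is fetched via reflection.
    private static Unsafe unsafe() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    public static void main(String[] args) throws Exception {
        Unsafe u = unsafe();
        byte[] input = "hello zstd".getBytes(StandardCharsets.UTF_8);

        long srcAddr = u.allocateMemory(input.length);
        long dstCapacity = input.length + 512;   // generous bound for this tiny input
        long dstAddr = u.allocateMemory(dstCapacity);
        try {
            // byte[] -> native memory
            u.copyMemory(input, Unsafe.ARRAY_BYTE_BASE_OFFSET, null, srcAddr, input.length);

            // Hypothetical call shape (placeholder name, not confirmed by this thread):
            // long compressedSize = Zstd.compressUnsafe(dstAddr, dstCapacity, srcAddr, input.length, 3);
        } finally {
            u.freeMemory(srcAddr);
            u.freeMemory(dstAddr);
        }
    }
}
```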

VladRodionov commented 5 months ago

This feature is essential for any application that works with off-heap memory directly.

VladRodionov commented 5 months ago

Will add unit tests.

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 0% with 54 lines in your changes missing coverage. Please review.

Project coverage is 57.88%. Comparing base (c76455c) to head (b904897). Report is 10 commits behind head on master.

:exclamation: Current head b904897 differs from pull request most recent head c75f02f

Please upload reports for the commit c75f02f to get more accurate results.

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master     #311      +/-   ##
============================================
- Coverage     60.01%   57.88%    -2.13%
- Complexity      308      312        +4
============================================
  Files            26       26
  Lines          1473     1541       +68
  Branches        170      186       +16
============================================
+ Hits            884      892        +8
- Misses          434      494       +60
  Partials        155      155
```
VladRodionov commented 5 months ago

Sure, will add tests this weekend. Thank you for the review, @luben.

luben commented 3 months ago

These binary files should not be checked into git; I rebuild them on each supported platform for each release.

VladRodionov commented 3 months ago

Files have been removed.

joakime commented 1 month ago

Basing anything on sun.misc.Unsafe behavior is not a good idea anymore. It has been deprecated since 2006. There are two active JEPs whose implementations and rollout in OpenJDK are almost done.

VladRodionov commented 1 month ago

It's a long way until all Java code with direct sun.misc.Unsafe access is ported to JDK 21+ (Java FFM); meanwhile we need to support at least JDK 11+. Performance-wise, Unsafe is still the champion, at least for direct memory access.
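To make the comparison concrete, here is an illustrative sketch (mine, not from this PR, and no performance claim implied) of the same off-heap write/read done the Unsafe way and the FFM way; the FFM half needs JDK 22+, which is exactly the version gap being discussed.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative comparison only: an 8-byte off-heap write/read done with
// sun.misc.Unsafe (JDK 8+) and with the FFM API (finalized in JDK 22).
public class OffHeapComparison {
    public static void main(String[] args) throws Exception {
        // sun.misc.Unsafe: raw address, manual lifetime management.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe u = (Unsafe) f.get(null);
        long addr = u.allocateMemory(8);
        u.putLong(addr, 42L);
        long viaUnsafe = u.getLong(addr);
        u.freeMemory(addr);

        // FFM: bounds-checked MemorySegment with a scoped (auto-freed) arena.
        long viaFfm;
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_LONG);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);
            viaFfm = seg.get(ValueLayout.JAVA_LONG, 0);
        }
        System.out.println(viaUnsafe + " / " + viaFfm);
    }
}
```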

joakime commented 1 month ago

Performance-wise, Unsafe no longer wins. Eclipse Jetty removed Unsafe a few years ago, and the various performance metrics have improved.

VladRodionov commented 1 month ago

Jetty? Can it handle 500K+ RPS out of the box? I really doubt it :). FFM is finally on par with JNI, or slightly better, but for direct memory access and manipulation of bits and bytes outside the Java heap, Unsafe is the champ. And you missed my requirements: JDK 11+ support (actually Java 8+). The Java 2024 report shows almost 30% still on Java 8, with the rest on Java 11 and Java 17, all of which lack FFM support.

joakime commented 1 month ago

500K+ requests per second is not hard to do. You have to be mindful of network saturation with regard to request/response size and optional HTTP details.

This has been done on official releases of Eclipse Jetty 10, Jetty 11, and Jetty 12 servers (none of which have Unsafe operations anymore).

The setup is as follows ...

This setup results in sub-50-byte requests and sub-200-byte responses (or about 120 bytes for a request on the network, and 280 bytes for a response on the network), which is only really useful for load testing the server for requests-per-second and latency metrics.

When I monitor with 1 client (with something like Wireshark) to confirm the setup, I'm looking at the total bytes on the network, wanting something under 400 bytes per request/response exchange and no FIN (we should be using persistent connections).

Hitting 510k requests per second is very attainable on a 10GbE network against a Jetty server with a decent network interface (some crappy 10GbE interfaces cannot get close to even 20% saturation).
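As a rough back-of-the-envelope check (my arithmetic, using the ~400 bytes per exchange figure above): 510,000 exchanges/s × 400 bytes ≈ 204 MB/s ≈ 1.6 Gb/s, i.e. roughly 16% of a 10GbE link, so at those request/response sizes the network itself is not the limit.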

Java 8 has gone EOSL in many contexts already (e.g., Google Cloud dropped it in Jan 2024). Many Java 11 providers have it going EOSL at the end of this year too (e.g., Red Hat in October, Google in December).

VladRodionov commented 1 month ago

https://medium.com/deno-the-complete-reference/netty-vs-jetty-hello-world-performance-e9ce990a9294

Far from 510K RPS. Maybe it's attainable, maybe it's not. Not, I presume. Any Java network server that relies on thread pool executors will be handicapped by significant thread context-switch overhead. Feel free to share links confirming that 510K RPS is attainable for Jetty; I have not managed to find any proof of that claim. Quite the contrary, I found many benchmarks with abysmal performance and latency numbers.

I am the developer of Memcarrot, a memcached-compatible caching server written in Java with a heavy dose of sun.misc.Unsafe. All memory management is manual (malloc()/free()). The server can run in less than 100MB of Java heap while storing hundreds of millions of cached objects. Below are yesterday's test results (the standard testing tool memtier_benchmark was used):

```
parallels@ubuntu-linux-22-04-02-desktop:~$ memtier_benchmark -p 11211 -P memcache_text --test-time=100
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 100 secs]  0 threads:    53594005 ops,  535766 (avg:  535922) ops/sec, 15.35MB/sec (avg: 15.31MB/sec),  0.37 (avg:  0.37) msec latency

4         Threads
50        Connections per thread
100       Seconds

ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        48721.12          ---          ---         0.37491         0.35900         0.66300         0.93500      3325.15
Gets       487201.32       634.00    486567.32         0.37314         0.35900         0.66300         0.94300     12355.39
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     535922.44       634.00    486567.32         0.37331         0.35900         0.66300         0.94300     15680.54
```

This is 535K RPS with p99.9 latency under 1ms. These numbers are within 5% of native memcached. The tests were run on a Mac Studio M1 (64GB RAM).

Other benchmark results (memory consumption, surprise, surprise) are here: https://github.com/carrotdata/membench

Memcarrot will be released next week. sun.misc.Unsafe made it possible. This is why we need direct access to off-heap memory, and I am not sure the code can be rewritten with the FFM API.

joakime commented 1 month ago

> https://medium.com/deno-the-complete-reference/netty-vs-jetty-hello-world-performance-e9ce990a9294

With an unconfigured Jetty and testing on the same machine, that person just tested the performance of their localhost network stack, nothing else. That is a poor set of tests and doesn't measure Jetty's performance. They used jetty-maven-plugin:run, whose configuration is geared toward developer needs, not performance. The configuration they used also did nothing to tune the HTTP exchange. I bet their Jetty server was barely being used; they simply couldn't generate enough load (a very common scenario when attempting to load test on the same machine).

> Any Java network server that relies on thread pool executors will be handicapped by significant thread context-switch overhead.

Jetty doesn't use the JDK's thread pool executors; it has its own, plus an EatWhatYouKill scheduling model that minimizes thread context switching. We even see improvements in CPU cache behavior with this model.

When we participated in the TechEmpower benchmarks years ago (back in the Jetty 10.0.0 days), we were consistently in the top 5%, and once we learned the tricks of those above us we could easily get into the top 3%, but those tricks did not represent real-world scenarios.

joakime commented 1 month ago

> I am the developer of Memcarrot, a memcached-compatible caching server written in Java with a heavy dose of sun.misc.Unsafe. All memory management is manual (malloc()/free()). The server can run in less than 100MB of Java heap while storing hundreds of millions of cached objects. Below are yesterday's test results (the standard testing tool memtier_benchmark was used):

Congrats, that's a really fantastic outcome.

Anyway, this has devolved into a totally different set of arguments. Do what you want; it is your repo after all.

Eclipse Jetty just has to monitor how the new JVMs react to our usage of the current state of zstd-jni. (So far it looks like we have to, at a minimum, document the demands that zstd-jni puts on the ByteBufferPool implementation, and the JVM command-line switches necessary to allow zstd-jni to function.)
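For what it's worth, a sketch of the kind of switches I'd expect that documentation to cover on recent JDKs. The flag names below come from JEP 498 and JEP 472; whether zstd-jni actually needs them, and on which JDK versions, is an assumption on my part, not something established in this thread, and my-app.jar is just a placeholder.

```
# Keep sun.misc.Unsafe memory-access methods usable without warnings/errors
# (flag added alongside the deprecation JEPs; default tightened to "warn" in JDK 24):
#   --sun-misc-unsafe-memory-access=allow
# Recent JDKs also warn about restricted native access (JNI); for classpath code:
#   --enable-native-access=ALL-UNNAMED
java --sun-misc-unsafe-memory-access=allow --enable-native-access=ALL-UNNAMED -jar my-app.jar
```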