luben / zstd-jni

JNI binding for Zstd

The first time zstd compress was executed, it took an unusually long time #256

Closed livelyRyan closed 1 year ago

livelyRyan commented 1 year ago

The reproduction method is as follows:

    @SneakyThrows
    public static void main(String[] args) {
        test(FileUtil.readAsString(new File("D:\\idea-projects\\tmp\\demo1\\src\\main\\resources\\3")));
        test(FileUtil.readAsString(new File("D:\\idea-projects\\tmp\\demo1\\src\\main\\resources\\1")));
        test(FileUtil.readAsString(new File("D:\\idea-projects\\tmp\\demo1\\src\\main\\resources\\2")));
    }

    @SneakyThrows
    public static void test(String json) {
        byte[] bs = json.getBytes();

        long start = System.currentTimeMillis();
        int oldLen = LZFEncoder.encode(bs).length;
        long end = System.currentTimeMillis();
        System.out.println("LZFEncoder cost: " + (end - start));

        start = System.currentTimeMillis();
        int newLen = Zstd.compress(bs).length;
        end = System.currentTimeMillis();
        System.out.println("ZstdUtil cost: " + (end - start));

        System.out.println("oldLen: " + oldLen + ", newLen: " + newLen + ", rate: " + (oldLen * 1.0 / newLen));
    }

(Don't care about the code, it's just a test)

The files contain JSON strings of about 1 MB each. The result of the run is as follows:

[image: screenshot of the run output]

The compression ratio of zstd is excellent. But the first zstd compression took a very long time to complete.

What is the reason for this? Is there any way to avoid it? (I suspect the first execution has to load additional libraries, and that this causes the delay.)

luben commented 1 year ago

I think this is mostly due to the need to extract and link the native library into the process - on first invocation, we extract the right DLL into temp storage and ask the JVM to load and link it. So there are multiple potential bottlenecks: writing the extracted DLL to temp storage, and the JVM loading and linking it.

Another way to test whether extracting and writing the DLL is the problem is to extract it manually somewhere and pass its path in the ZstdNativePath system property, e.g. java -DZstdNativePath=/path/to/my/zstd-jni-1.5.5-1.dll ...
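To try the manual-extraction route, the DLL first has to be copied out of the jar. Here is a minimal sketch using only the JDK. The class name ExtractNativeLib and the example resource path mentioned below are assumptions for illustration, not part of zstd-jni, so check the real entry name with jar tf before relying on it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a classpath resource (e.g. a bundled DLL) to a fixed file.
final class ExtractNativeLib {

    /** Copies the given classpath resource to target and returns the number of bytes written. */
    static long extract(Class<?> anchor, String resource, Path target) throws IOException {
        try (InputStream in = anchor.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("resource not found on classpath: " + resource);
            }
            Path parent = target.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent); // make sure the destination directory exists
            }
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ExtractNativeLib <resource-path-in-jar> <target-file>");
            return;
        }
        Path target = Paths.get(args[1]);
        long bytes = extract(ExtractNativeLib.class, args[0], target);
        System.out.println("extracted " + bytes + " bytes");
        // The flag below is the property named in the comment above.
        System.out.println("run with: -DZstdNativePath=" + target.toAbsolutePath());
    }
}
```

Run it with the zstd-jni jar on the classpath, passing a resource path such as /win/amd64/libzstd-jni-1.5.5-1.dll (an assumed layout). Then start the real application with the printed flag and compare the first-call time.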

Another thing you can test is to call com.github.luben.zstd.util.Native.load() first and then measure the Zstd.compress(...) call - this separates the load time from the compression time. If compression still takes longer on the first call, then maybe it's due to allocating buffers on the first call that don't need to be re-allocated on later calls.

I am interested to know what you find.

livelyRyan commented 1 year ago

New code and result:

    @SneakyThrows
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        Native.load();
        System.out.println("load cost: " + (System.currentTimeMillis() - start));

        test(FileUtil.readAsString(new File("D:\\idea-projects\\tmp\\demo1\\src\\main\\resources\\3")));
        test(FileUtil.readAsString(new File("D:\\idea-projects\\tmp\\demo1\\src\\main\\resources\\1")));
        test(FileUtil.readAsString(new File("D:\\idea-projects\\tmp\\demo1\\src\\main\\resources\\2")));
    }

    load cost: 1453
    LZFEncoder cost: 73
    ZstdUtil cost: 26
    oldLen: 242346, newLen: 17180, rate: 14.10628637951106
    LZFEncoder cost: 17
    ZstdUtil cost: 3
    oldLen: 320748, newLen: 20805, rate: 15.416870944484499
    LZFEncoder cost: 8
    ZstdUtil cost: 5
    oldLen: 399262, newLen: 21431, rate: 18.63011525360459

thx!

luben commented 1 year ago

Thanks for confirming!
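The measurements in this thread isolate a one-time load cost from the per-call cost. The same pattern can be sketched with only the JDK, using System.nanoTime(), a monotonic clock that is better suited to elapsed-time measurement than System.currentTimeMillis(). The class FirstCallTiming and the sleeping stand-ins are hypothetical. With zstd-jni the two steps would be Native.load() and Zstd.compress(bytes).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: time a one-time setup step separately from the calls that depend on it.
final class FirstCallTiming {

    /** Runs the call once and returns its elapsed wall-clock time in milliseconds. */
    static long millis(Runnable call) {
        long start = System.nanoTime();
        call.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Stand-ins (assumptions): "load" simulates an expensive one-time setup,
        // "compress" simulates a cheap call that triggers the setup if it has not run yet.
        boolean[] loaded = {false};
        Runnable load = () -> {
            if (!loaded[0]) {
                sleep(200);
                loaded[0] = true;
            }
        };
        Runnable compress = () -> {
            load.run();
            sleep(5);
        };

        // Loading eagerly (e.g. at application startup) moves the one-time cost
        // out of the first compress call, so both compress timings are comparable.
        System.out.println("load cost: " + millis(load));
        System.out.println("first compress cost: " + millis(compress));
        System.out.println("second compress cost: " + millis(compress));
    }
}
```

Calling the load step once during application startup is one way to keep the first real compression from paying the loading cost.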