I guess the overhead comes from allocating a big array and later copying just the decompressed part when the originalSize turns out to be wrong. I don't think this is a failure of Zstd itself, but rather of supplying a wrong originalSize.
Using Zstd.decompressedSize is interesting, but may also be limited, as not all zstd-compressed payloads have their original size embedded in the header, e.g. if the payload was compressed with the ZstdOutputStream.
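For illustration, a minimal sketch of that difference, assuming the com.github.luben.zstd classes Zstd and ZstdOutputStream (as far as I know, decompressedSize returns 0 when the frame header carries no size):

import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FrameSizeDemo {
    public static void main(String[] args) throws IOException {
        byte[] original = new byte[100_000];

        // One-shot compression embeds the original size in the frame header.
        byte[] oneShot = Zstd.compress(original);
        System.out.println(Zstd.decompressedSize(oneShot));   // 100000

        // Streaming compression doesn't know the total size up front,
        // so the frame header carries no content size.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZstdOutputStream zos = new ZstdOutputStream(bos)) {
            zos.write(original);
        }
        byte[] streamed = bos.toByteArray();
        System.out.println(Zstd.decompressedSize(streamed));  // 0: size unknown
    }
}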
What about best practice for my case? I think it's a typical use case.
I have some compressed data. The typical size is about 100 KB, but sometimes it is several megabytes. I want to decompress all this data, and I want some zip-bomb protection: if there is more than 20 MB, I don't need the data, and an exception or null is fine.
I wrote something like this:
public byte[] decompress(byte[] data) {
    long size = -1;
    try {
        // possible exception, I guess, on data without the size in the header
        size = Zstd.decompressedSize(data);
    } catch (Exception ex) {
        return Zstd.decompress(data, 20_000_000);
    }
    // fall back to the 20 MB cap when the header size is missing or too big
    if (size < 1 || size >= 20_000_000)
        return Zstd.decompress(data, 20_000_000);
    else
        return Zstd.decompress(data, (int) size);
}
Maybe add this code to the library? For people who don't know the size and simply want to decompress with zip-bomb protection.
I think the best practice is to include the exact size of the uncompressed payload along with the compressed data, and then apply that size when decompressing. You probably also want a checksum on both the compressed payload and the uncompressed data, to avoid issues with broken payloads. I think you want to do this "outside" of the zstd library, in your own application.
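As an illustration only, a minimal sketch of such an envelope; the layout (8-byte size, 8-byte CRC32 of the compressed payload) is my own assumption, not anything defined by zstd-jni:

import com.github.luben.zstd.Zstd;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class Envelope {
    // Hypothetical layout: [8-byte original size][8-byte CRC32 of payload][payload]
    public static byte[] wrap(byte[] original) {
        byte[] payload = Zstd.compress(original);
        CRC32 crc = new CRC32();
        crc.update(payload);
        return ByteBuffer.allocate(16 + payload.length)
                .putLong(original.length)
                .putLong(crc.getValue())
                .put(payload)
                .array();
    }

    public static byte[] unwrap(byte[] envelope, long maxSize) {
        ByteBuffer buf = ByteBuffer.wrap(envelope);
        long size = buf.getLong();
        long expectedCrc = buf.getLong();
        // Zip-bomb protection: reject before allocating anything large.
        if (size < 0 || size > maxSize)
            throw new IllegalArgumentException("declared size " + size + " exceeds limit");
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        if (crc.getValue() != expectedCrc)
            throw new IllegalArgumentException("corrupted payload");
        // The exact size is trusted only after the size and checksum pass.
        return Zstd.decompress(payload, (int) size);
    }
}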
Thanks. I'll change my data storage scheme.
Maybe add some comments to the library code? Increasing originalSize far beyond the real uncompressed data size dramatically drops performance. It's unintuitive behavior.
Yes, I think a warning in the API docs would be helpful.
Example: get some random data, then run benchmarks decompressing it with the exact originalSize and with a much larger one (a sketch follows).
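Something like the following sketch shows the shape of such a benchmark; the data size, seed, and iteration count here are my own choices:

import com.github.luben.zstd.Zstd;
import java.util.Random;

public class OriginalSizeBench {
    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        new Random(42).nextBytes(data);          // random data barely compresses
        byte[] compressed = Zstd.compress(data);

        time("exact size   ", compressed, data.length);
        time("inflated size", compressed, 20_000_000);
    }

    static void time(String label, byte[] compressed, int originalSize) {
        long start = System.nanoTime();
        for (int i = 0; i < 1_000; i++) {
            Zstd.decompress(compressed, originalSize);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + " ms");
    }
}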
Result: decompression with the oversized originalSize is dramatically slower.
Maybe fix it with something like this?
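For example, a hypothetical helper (a sketch only, not actual zstd-jni API or the patch proposed here) that treats the caller's size as an upper bound and allocates from the frame header when it is available:

// Hypothetical helper, not actual zstd-jni API: treat originalSize as an
// upper bound and prefer the exact size from the frame header when present.
public static byte[] decompressBounded(byte[] src, int maxOriginalSize) {
    long headerSize = Zstd.decompressedSize(src); // 0 when no size is embedded
    if (headerSize > maxOriginalSize) {
        throw new IllegalArgumentException("declared size exceeds limit");
    }
    int allocSize = headerSize > 0 ? (int) headerSize : maxOriginalSize;
    return Zstd.decompress(src, allocSize);
}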