Closed: mychaow closed this issue 4 years ago.
Thanks for the reply. In my tests, Brunsli is about 2x faster than Lepton single-threaded, but Brunsli's decompression is about 3x-4.7x slower than Lepton's multithreaded decompression, so if Brunsli supported multithreading, it ought to be faster than Lepton.
It would be great to see Brunsli support multithreaded compression/decompression soon.
Lepton's code is at https://github.com/dropbox/lepton.
BTW, what does the following sentence mean?
We have an experimental "groups" encoding that splits the image into 256x256 px pieces and processes them in parallel. This gives a 3x speedup. Hopefully, we will publish it soon.
Is it encoded with multithreading or SIMD? If it is multithreading, shouldn't it scale in proportion to the number of threads, e.g. 4 threads giving a 4x speedup?
PR #48 contains an experimental / POC mode that allows parallel encoding / decoding. It does not include parallel JPEG serialization / deserialization, so that could be a bottleneck. Give it a try to see if it improves the situation.
Warning: "groups" file format is not compatible with vanilla Brunsli.
Thanks for the reply.
I have tested the "groups" encode/decode; the default version gives a 2x speedup. After changing the thread pool from 4 to 8 threads, it gives a 3x speedup. With 8 threads, "groups" encoding is faster than Lepton, by about 1.4x, but decoding is slower than Lepton, by about 10%. Looking forward to seeing further optimization.
It's great to see the "groups" encode/decode arrive so soon. Thank you!
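These sub-linear numbers (2x on 4 threads, 3x on 8) are consistent with a serial bottleneck such as the single-threaded JPEG serialization mentioned in this thread. A minimal sketch using Amdahl's law, with an illustrative serial fraction back-solved from the 4-thread measurement (the fraction is an assumption, not a measured Brunsli figure):

```python
# Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n),
# where s is the serial fraction of the work and n is the thread count.
def speedup(serial_fraction, threads):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)

# If 4 threads give ~2x, solving 1 / (s + (1 - s) / 4) = 2 yields s = 1/3.
s = 1.0 / 3.0
print(round(speedup(s, 4), 2))  # -> 2.0
print(round(speedup(s, 8), 2))  # -> 2.4
```

Under that assumed serial third, 8 threads can only reach about 2.4x, so the reported 3x suggests the serial share is somewhat smaller in practice, but the qualitative point stands: shrinking the serial part matters more than adding threads.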
As I've mentioned before, serialisation of JPEG is single-threaded. It could be parallelised if the "restart interval" has a proper value. Unfortunately, out of 100k random internet JPEGs, only 30k have it set...
(PS: if byte-wise JPEG reconstruction is a non-goal, the "restart interval" could be set during encoding.)
Haha, the "groups" encode/decode speed is acceptable.
It is quite possible to parallelise the most CPU-consuming parts of encoding without changing the format. Unfortunately, the same trick would not work for the decoder (because of long calculation dependency chains). We have an experimental "groups" encoding that splits the image into 256x256 px pieces and processes them in parallel. This gives a 3x speedup. Hopefully, we will publish it soon.
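To make the tiling concrete, here is a small sketch of how an image can be cut into 256x256 groups and farmed out to a thread pool. The `encode_group` body is a hypothetical placeholder, not Brunsli's coder (which is C++); this only illustrates the structure that breaks the decoder's dependency chains:

```python
from concurrent.futures import ThreadPoolExecutor

GROUP = 256  # group side in pixels, as in the experimental "groups" mode

def tile_rects(width, height, size=GROUP):
    """Enumerate (x, y, w, h) rectangles covering the image with
    size x size groups; groups on the right/bottom edge may be smaller."""
    return [(x, y, min(size, width - x), min(size, height - y))
            for y in range(0, height, size)
            for x in range(0, width, size)]

def encode_group(rect):
    # Stand-in for per-group entropy coding (hypothetical placeholder).
    x, y, w, h = rect
    return b"group-%dx%d" % (w, h)

def encode_parallel(width, height, threads=4):
    rects = tile_rects(width, height)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each group has no dependency on its neighbours, so they
        # can be encoded (or decoded) concurrently.
        return list(pool.map(encode_group, rects))

# A 600x300 image splits into 3 x 2 = 6 groups.
print(len(encode_parallel(600, 300)))  # -> 6
```

Because each group is self-contained, the format change is what buys the decoder-side speedup, which is also why the "groups" bitstream is not compatible with vanilla Brunsli.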
Could you please elaborate on how much "much slower than lepton" actually is?