google / brunsli

Practical JPEG Repacker
MIT License

please support multithread compress/decompress, the speed is much slow than lepton #46

Closed mychaow closed 4 years ago

eustas commented 4 years ago

It is quite possible to parallelise the most CPU-consuming parts of encoding without changing the format. Unfortunately, the same trick would not work for the decoder (because of long calculation dependency chains). We have an experimental "groups" encoding that splits the image into 256x256 px pieces and processes them in parallel. This allows a 3x speedup. Hopefully, we will publish it soon.
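The "groups" idea can be sketched roughly like this (a minimal Python illustration with hypothetical helper names, not Brunsli's actual C++ code): tile the image into 256x256 px rectangles and hand each tile to a worker pool, since each group is coded independently.

```python
from concurrent.futures import ThreadPoolExecutor

GROUP_DIM = 256  # px, as in the experimental "groups" mode

def split_into_groups(width, height, dim=GROUP_DIM):
    """Yield (x, y, w, h) rectangles covering the image."""
    for y in range(0, height, dim):
        for x in range(0, width, dim):
            yield (x, y, min(dim, width - x), min(dim, height - y))

def encode_group(rect):
    # Placeholder for per-group entropy coding; because groups are
    # independent, each one can run on a separate thread.
    x, y, w, h = rect
    return b"group:%dx%d@%d,%d" % (w, h, x, y)

def parallel_encode(width, height, workers=4):
    groups = list(split_into_groups(width, height))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_group, groups))
```

For a 512x300 px image this yields four groups (2 columns x 2 rows), each processed concurrently.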

Could you please elaborate how much is "much slow than lepton"?

mychaow commented 4 years ago

Thanks for the reply. In my tests, Brunsli is about 2x faster than lepton single-threaded, but Brunsli's decompression is about 3x~4.7x slower than lepton with multiple threads. So if Brunsli supported multithreading, it should be faster than lepton.

It would be a pleasure to see Brunsli support multithreaded compression/decompression soon.

Lepton's code is at https://github.com/dropbox/lepton.

mychaow commented 4 years ago

BTW, what does the following sentence mean?

We have experimental "groups" encoding that splits image into 256x256 px pieces and processes them in parallel. This allows 3x speedup. Hopefully, we will publish it soon.

Is it encoded with multithreading or SIMD? If it's multithreading, it should speed up in proportion to the number of threads; for example, 4 threads should allow a 4x speedup.
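Speedup rarely scales linearly with thread count, because some stages of the pipeline stay serial. Amdahl's law gives the upper bound; a quick sketch (the 25% serial fraction is an assumed, purely illustrative figure):

```python
def amdahl_speedup(serial_fraction, threads):
    """Upper bound on speedup when a fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)

# With ~25% of the pipeline serial (an assumed figure), 4 threads give
# at most ~2.3x, and 8 threads only ~2.9x -- not 4x or 8x.
```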

eustas commented 4 years ago

PR #48 contains an experimental / POC mode that allows parallel encoding / decoding. This does not include parallel JPEG serialization / deserialization, so that could be a bottleneck. Give it a try to see if it improves the situation.

Warning: "groups" file format is not compatible with vanilla Brunsli.

mychaow commented 4 years ago

Thanks for the reply.

I have tested the "groups" encode/decode; the default version allows a 2x speedup. After changing the thread pool from 4 to 8 threads, it allows a 3x speedup. With 8 threads, "groups" encoding is about 1.4x faster than lepton's, but decoding is about 10% slower. Looking forward to seeing further optimization.

It's a pleasure to see the "groups" encode/decode so soon. Thank you!

eustas commented 4 years ago

As I've mentioned before, serialisation of JPEG is single-threaded. This could be parallelised if the "reset interval" has a proper value. Unfortunately, out of 100k random internet JPEGs only 30k have it set...
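For context, the "reset interval" refers to JPEG's DRI/RSTn restart markers: bytes 0xFFD0..0xFFD7 cut the entropy-coded scan into independently decodable chunks, so a serializer could hand each chunk to a different thread. A rough sketch of locating those split points (assumes a single baseline scan; stuffed 0xFF00 bytes are simply skipped over):

```python
def restart_marker_offsets(scan_bytes):
    """Return offsets of RST0-RST7 markers (0xFFD0..0xFFD7) in an
    entropy-coded JPEG scan. Each marker starts an independently
    decodable chunk that could be processed on its own thread."""
    offsets = []
    i = 0
    while i + 1 < len(scan_bytes):
        if scan_bytes[i] == 0xFF and 0xD0 <= scan_bytes[i + 1] <= 0xD7:
            offsets.append(i)
            i += 2  # skip the two-byte marker
        else:
            i += 1  # plain data, or a stuffed 0xFF00 pair
    return offsets
```

Without restart markers there are no such cut points, which is why files lacking a reset interval cannot be serialized in parallel.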

eustas commented 4 years ago

(PS: if byte-wise JPEG reconstruction is a non-goal, the "reset interval" could be set during encoding.)

mychaow commented 4 years ago

Haha, the "groups" encode/decode speed is acceptable.