pete4abw / lrzip-next

Long Range Zip. Updated and Enhanced version of ckolivas' lrzip project. Lots of new features. Better compression. Actively maintained.
https://github.com/pete4abw/lrzip-next
GNU General Public License v2.0
50 stars 10 forks source link

BZIP3 Wrapper functions and state locking not necessary #89

Closed pete4abw closed 1 year ago

pete4abw commented 2 years ago

Testing the removal of all the BZIP3 wrapper functions and variables in stream.c. Threads are themselves locked when calling compression/decompression functions. Locking all threads and bzip3 variables a second time is unnecessary and a waste of memory, Calling bz3_new and bz3_free and calling bz3_encode_block and bz3_decode_block from within the bz3 compression/decompression functions is enough. @kspalaiologos , interested in your review.

Changes pushed to bzip3_poc tree.

kspalaiologos commented 2 years ago

The entire point of the state locking is to reduce the amount of allocations to the bare minimum, improving performance. If you have noticed that the performance gain is not significant enough to argue for the complexity, then I think that your patches are a good idea.

pete4abw commented 2 years ago

The entire point of the state locking is to reduce the amount of allocations to the bare minimum, improving performance. If you have noticed that the performance gain is not significant enough to argue for the complexity, then I think that your patches are a good idea.

I just noticed that threads were locked already and locking the bzip3 wrapper variables separately was unnecessary. While I did not measure performance, I cannot imagine it to be any different. An argument could be made that by not calling the wrapper functions and calling bz3_encode_block and bz3_decode_block directly , there is a potential performance improvement.

Pending your review, I'm going to freeze this now and leave it for review and testing.

kspalaiologos commented 2 years ago

You seem to have misunderstood me. The bz3_new call allocates memory, quite a bit of it too. After your patch, the memory will be allocated each and every time a block is compressed. With my old version, memory allocation was (ideally?) performed once per the entire compression process. Provided that malloc called in bz3_new tends to be pretty slow, taking up to 10'000 cycles per call, it's preferable to avoid calling it without a good reason.

An argument could be made that by not calling the wrapper functions and calling bz3_encode_block and bz3_decode_block directly, there is a potential performance improvement.

Any decent compiler would've inlined these nonetheless.

I will test the changes soon and report back.

pete4abw commented 2 years ago

snip... Provided that malloc called in bz3_new tends to be pretty slow, taking up to 10'000 cycles per call, it's preferable to avoid calling it without a good reason.

Because of the improved logic implemented in allocating compression buffers, not all threads may be used. If compression is great enough, fewer threads will be used. I understand your intent. Do it once, not each on each call. But all other compression methods work this way and I believe this is easier to maintain.

kspalaiologos commented 2 years ago

Sure, sounds ok.