Closed pete4abw closed 1 year ago
The entire point of the state locking is to reduce the amount of allocations to the bare minimum, improving performance. If you have noticed that the performance gain is not significant enough to argue for the complexity, then I think that your patches are a good idea.
The entire point of the state locking is to reduce the amount of allocations to the bare minimum, improving performance. If you have noticed that the performance gain is not significant enough to argue for the complexity, then I think that your patches are a good idea.
I just noticed that threads were locked already and locking the bzip3 wrapper variables separately was unnecessary. While I did not measure performance, I cannot imagine it to be any different. An argument could be made that by not calling the wrapper functions and calling bz3_encode_block
and bz3_decode_block
directly , there is a potential performance improvement.
Pending your review, I'm going to freeze this now and leave it for review and testing.
You seem to have misunderstood me. The bz3_new
call allocates memory, quite a bit of it too. After your patch, the memory will be allocated each and every time a block is compressed. With my old version, memory allocation was (ideally?) performed once per the entire compression process. Provided that malloc
called in bz3_new
tends to be pretty slow, taking up to 10'000 cycles per call, it's preferable to avoid calling it without a good reason.
An argument could be made that by not calling the wrapper functions and calling bz3_encode_block and bz3_decode_block directly, there is a potential performance improvement.
Any decent compiler would've inlined these nonetheless.
I will test the changes soon and report back.
snip... Provided that
malloc
called inbz3_new
tends to be pretty slow, taking up to 10'000 cycles per call, it's preferable to avoid calling it without a good reason.
Because of the improved logic implemented in allocating compression buffers, not all threads may be used. If compression is great enough, fewer threads will be used. I understand your intent. Do it once, not each on each call. But all other compression methods work this way and I believe this is easier to maintain.
Sure, sounds ok.
Testing the removal of all the BZIP3 wrapper functions and variables in
stream.c
. Threads are themselves locked when calling compression/decompression functions. Locking all threads and bzip3 variables a second time is unnecessary and a waste of memory, Callingbz3_new
andbz3_free
and callingbz3_encode_block
andbz3_decode_block
from within the bz3 compression/decompression functions is enough. @kspalaiologos , interested in your review.Changes pushed to
bzip3_poc
tree.