Open orisano opened 1 year ago
Many thanks for the PR. I will review it this week.
benchstat:
goos: linux
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
             │  base.txt   │             xz#55.txt              │
             │   sec/op    │   sec/op     vs base               │
LargeLZMA2-8   10.775 ± 0%    8.704 ± 0%  -19.22% (p=0.000 n=10)
@ulikunitz How can I help?
Just want to add, we just benchmarked this against our workload, and we even see a 25% improvement. Awesome work @orisano!
Hi, please adapt the change for the rewrite branch as I suggested for the other changes.
Description:
With immense respect for the work done so far, I am writing this pull request to propose an optimization that would enhance the performance of the github.com/bodgit/sevenzip project, which relies on the lzma module from github.com/ulikunitz/xz.
Currently, I am working on the decompression of large-scale 7z archives (using lzma2) in the GB range. As part of this effort, I have noticed that the repetitive calls to the DecodeBit function within the rangeDecoder module lead to frequent memory writes and potential performance bottlenecks.
To address this issue, I propose inlining the DecodeBit function, which would reduce the overhead of function calls and make efficient use of registers for intermediate states. By doing so, we can significantly improve the decompression performance of large-scale 7z archives.
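To illustrate the idea, here is a minimal sketch and not the actual patch. The names rangeDecoder, decodeBit, decodeTree and decodeTreeInlined are hypothetical and simplified compared to the real ulikunitz/xz code, and error handling is omitted (an exhausted input simply yields zero bytes). It assumes the standard LZMA range-coder constants. The first variant calls a per-bit method that updates the decoder struct on every bit. The second keeps the range and code in local variables for the whole loop and writes them back once.

```go
package main

import "fmt"

// Standard LZMA range-coder constants.
const (
	probBits = 11      // probabilities are 11-bit fixed point
	moveBits = 5       // adaptation speed of a probability
	topValue = 1 << 24 // renormalize when the range drops below this
)

type prob uint16

// rangeDecoder is a simplified, hypothetical decoder state.
type rangeDecoder struct {
	src    []byte
	pos    int
	nrange uint32
	code   uint32
}

// next returns the next input byte, or 0 once the input is exhausted.
func (d *rangeDecoder) next() byte {
	if d.pos < len(d.src) {
		d.pos++
		return d.src[d.pos-1]
	}
	return 0
}

// newRangeDecoder skips the leading byte and loads the 4-byte initial code.
func newRangeDecoder(src []byte) *rangeDecoder {
	d := &rangeDecoder{src: src, pos: 1, nrange: 0xFFFFFFFF}
	for i := 0; i < 4; i++ {
		d.code = d.code<<8 | uint32(d.next())
	}
	return d
}

// decodeBit decodes one bit; every call reads and writes the struct fields.
func (d *rangeDecoder) decodeBit(p *prob) uint32 {
	var bit uint32
	bound := (d.nrange >> probBits) * uint32(*p)
	if d.code < bound {
		d.nrange = bound
		*p += (1<<probBits - *p) >> moveBits
	} else {
		d.code -= bound
		d.nrange -= bound
		*p -= *p >> moveBits
		bit = 1
	}
	if d.nrange < topValue {
		d.nrange <<= 8
		d.code = d.code<<8 | uint32(d.next())
	}
	return bit
}

// decodeTree decodes a bit-tree symbol with one decodeBit call per bit.
func (d *rangeDecoder) decodeTree(probs []prob, bits int) uint32 {
	m := uint32(1)
	for i := 0; i < bits; i++ {
		b := d.decodeBit(&probs[m])
		m = m<<1 | b
	}
	return m - (uint32(1) << bits)
}

// decodeTreeInlined does the same work with the bit decoding inlined by hand;
// the decoder state lives in locals and is stored back once at the end.
func (d *rangeDecoder) decodeTreeInlined(probs []prob, bits int) uint32 {
	nrange, code, pos := d.nrange, d.code, d.pos
	m := uint32(1)
	for i := 0; i < bits; i++ {
		p := probs[m]
		bound := (nrange >> probBits) * uint32(p)
		if code < bound {
			nrange = bound
			probs[m] = p + (1<<probBits-p)>>moveBits
			m <<= 1
		} else {
			code -= bound
			nrange -= bound
			probs[m] = p - p>>moveBits
			m = m<<1 | 1
		}
		if nrange < topValue {
			var b byte
			if pos < len(d.src) {
				b = d.src[pos]
				pos++
			}
			nrange <<= 8
			code = code<<8 | uint32(b)
		}
	}
	d.nrange, d.code, d.pos = nrange, code, pos
	return m - (uint32(1) << bits)
}

// newProbs returns n probabilities initialized to one half.
func newProbs(n int) []prob {
	ps := make([]prob, n)
	for i := range ps {
		ps[i] = 1 << (probBits - 1)
	}
	return ps
}

func main() {
	src := []byte{0x00, 0x5a, 0x13, 0xc7, 0x88, 0x21, 0xf0, 0x3e, 0x9d, 0x44, 0x07, 0xb2}
	a, b := newRangeDecoder(src), newRangeDecoder(src)
	pa, pb := newProbs(256), newProbs(256)
	match := true
	for i := 0; i < 6; i++ {
		if a.decodeTree(pa, 8) != b.decodeTreeInlined(pb, 8) {
			match = false
		}
	}
	fmt.Println("match:", match && a.code == b.code && a.nrange == b.nrange && a.pos == b.pos)
}
```

Both variants must produce identical symbols and leave the decoder in the same state. Whether the inlined form is faster depends on the compiler and the workload, so it should be confirmed with a benchmark.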
The key advantages of this optimization include:
During my local testing, I observed a performance improvement of approximately 19% in scenarios involving repetitive calls to DecodeBit. This suggests the potential benefits it can bring to the github.com/bodgit/sevenzip project, particularly when decompressing GB-sized 7z archives.
before:
after:
I kindly request your review and feedback on this pull request. Thank you.