ulikunitz / xz

Pure golang package for reading and writing xz-compressed files
Other
485 stars 45 forks source link

feat: inline DecodeBit function for improved performance #55

Open orisano opened 1 year ago

orisano commented 1 year ago

Description:

With immense respect for the work done so far, I am writing this pull request to propose an optimization that would enhance the performance of the github.com/bodgit/sevenzip project, which relies on the lzma module from github.com/ulikunitz/xz.

Currently, I am working on the decompression of large-scale 7z archives (using lzma2) in the GB range. As part of this effort, I have noticed that the repetitive calls to the DecodeBit function within the rangeDecoder module lead to frequent memory writes and potential performance bottlenecks.

To address this issue, I propose inlining the DecodeBit function, which would reduce the overhead of function calls and make efficient use of registers for intermediate states. By doing so, we can significantly improve the decompression performance of large-scale 7z archives.

The key advantages of this optimization include:

During my local testing, I observed a notable performance improvement of approximately 19% in scenarios involving repetitive calls to DecodeBit. This suggests the potential benefits it can bring to the github.com/bodgit/sevenzip project, particularly when decompressing GB-sized 7z archives.

before:

goos: linux
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkLargeLZMA2-8              1    400362537495 ns/op

after:

goos: linux
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
BenchmarkLargeLZMA2-8              1    323613023997 ns/op

I kindly request your review and feedback on this pull request. Thank you.

ulikunitz commented 1 year ago

Many thanks for the PR. I will review it this week.

orisano commented 1 year ago

benchstat:

goos: linux
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
             │  base.txt   │             xz#55.txt              │
             │   sec/op    │   sec/op    vs base                │
LargeLZMA2-8   10.775 ± 0%   8.704 ± 0%  -19.22% (p=0.000 n=10)
orisano commented 1 year ago

@ulikunitz How can I help?

brancz commented 1 year ago

Just want to add, we just benchmarked this against our workload, and we even see a 25% improvement. Awesome work @orisano!

ulikunitz commented 1 year ago

Hi, please adapt the change for the rewrite branch as I suggested for the other changes.