pion / srtp

A Go implementation of SRTP
MIT License
119 stars 49 forks source link

Optimise generateCounter (AES-CM-HMAC-SHA1) #201

Closed adriancable closed 2 years ago

adriancable commented 2 years ago

Before

goos: darwin
goarch: amd64
pkg: github.com/pion/srtp/v2
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkGenerateCounter
BenchmarkGenerateCounter-12         97850202            12.26 ns/op

After

BenchmarkGenerateCounter
BenchmarkGenerateCounter-12         175902676            6.383 ns/op

This updates a proposed code change from @kixelated back in 2019.

codecov[bot] commented 2 years ago

Codecov Report

Merging #201 (41b63af) into master (b74938d) will increase coverage by 0.08%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #201      +/-   ##
==========================================
+ Coverage   75.16%   75.24%   +0.08%     
==========================================
  Files          17       17              
  Lines        1224     1228       +4     
==========================================
+ Hits          920      924       +4     
  Misses        208      208              
  Partials       96       96              
Flag Coverage Δ
go 75.24% <100.00%> (+0.08%) :arrow_up:
wasm 74.75% <100.00%> (+0.08%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
key_derivation.go 92.68% <100.00%> (+0.79%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b74938d...41b63af. Read the comment docs.

kixelated commented 2 years ago

ez

kixelated commented 2 years ago

I don't remember why this is faster, and it could probably be optimized further, but thanks for pushing this through.

adriancable commented 2 years ago

On big endian systems you can do a couple of unsafe pointer casts to uint32 in the middle of the counter array instead. This would reduce op count by a further 2.5X. But at this point you are nibbling at the edges because the CPU footprint of this function is already very small now.