pion / stun

A Go implementation of STUN
https://pion.ly/
MIT License
601 stars 91 forks source link

Speed up XORMappedAddress by inlining xorBytes #117

Closed greatroar closed 1 year ago

greatroar commented 2 years ago

Description

Since xorBytes is only ever called for very small slices, the expensive part of it is the function call. Benchmark results on linux/amd64:

name                        old time/op    new time/op    delta
XORMappedAddress_AddTo-8      43.5ns ±12%    33.1ns ± 1%  -23.91%  (p=0.000 n=10+9)
XORMappedAddress_GetFrom-8    26.2ns ± 7%    19.6ns ±20%  -25.36%  (p=0.000 n=10+9)

Reference issue

Closes #116.

jech commented 2 years ago

See also #116, which does the exact opposite ;-) The logic being that this is not going to be visible in macro benchmarks, so we might as well centralise all of our XOR implementations in a single place.

greatroar commented 1 year ago

I saw that, but then the question is whether you want all users to pull in a dependency with five asm implementations that are actually slower than a simple loop.

stv0g commented 1 year ago

This looks like a prime example of premature optimization to me.

I support @jech proposal to centralize the implementation in the transport package.

jech commented 1 year ago

I really don't care either way ­— either this PR or #116 are perfectly fine with me. On the other hand, I'd like to see one or the other applied, since I'm uncomfortable with the code duplication that's happening in the current code.

stv0g commented 1 year ago

I've reviewed and merged #116. Hence this PR is obsolete.