dcolascione closed this issue 3 years ago
@dcolascione Thanks for your suggestion.
I'll have to look into it and think about it a bit.
See @dcolascione's code comment on a template parameter to control the implementation of normalize_().
Changing class ring_span and ring as follows:

    template
    <
        class T
        , class Popper = default_popper<T>
        , bool CapacityIsPowerOf2 = false // added
    >
    class ring_span

    template
    <
        typename Container
        , bool CapacityIsPowerOf2 = false // added
    >
    class ring
Available in release 0.5.0.
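For readers wondering how a boolean template parameter like this can select the wrapping strategy, here is a minimal sketch. It is not the library's actual source: `index_wrapper` and its `normalize` member are hypothetical names used only for illustration, and the real normalize_() may well be structured differently.

```cpp
#include <cstddef>

// Hypothetical stand-in for a ring buffer's index wrapping logic;
// NOT the actual ring_span implementation.
template< bool CapacityIsPowerOf2 = false >
struct index_wrapper
{
    std::size_t capacity;

    // Map an arbitrary index into [0, capacity).
    std::size_t normalize( std::size_t idx ) const
    {
        // The condition is a compile-time constant, so an optimizing
        // compiler can drop the branch that is not selected.
        if ( CapacityIsPowerOf2 )
            return idx & ( capacity - 1 );  // mask; assumes capacity is a power of two
        else
            return idx % capacity;          // general case; uses the modulus operator
    }
};
```

With this shape, `index_wrapper<true>{ 8 }.normalize( 13 )` and `index_wrapper<>{ 8 }.normalize( 13 )` should give the same result, while only the default variant stays correct for a capacity such as 6. Defaulting the parameter to `false` keeps existing code working unchanged.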
ring_span's normalize_() function uses the modulus operator, which emits a division instruction on most architectures. Division is slow: for example, on a Ryzen, division might take 8-13 cycles, while add and subtract might take only one or two. Can ring_span gain a template parameter that makes it assume the capacity is a power of two and compute normalize_() with masking instead of modulus?
(Using __builtin_unreachable() to hint that the capacity is a power of two doesn't seem sufficient to make recent GCC or Clang optimize the modulus into masking.)
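To make the arithmetic behind the request concrete: for a capacity that is a power of two, `idx % capacity` and `idx & (capacity - 1)` agree for every unsigned index, because `capacity - 1` has exactly the low bits set. The sketch below illustrates that identity. The helper names (`is_power_of_2`, `wrap_masked`, `wrap_modulo`) are made up for this example and are not part of the library.

```cpp
#include <cstddef>

// Hypothetical helper: true when n is a nonzero power of two.
constexpr bool is_power_of_2( std::size_t n )
{
    return n != 0 && ( n & ( n - 1 ) ) == 0;
}

// Wrap idx into [0, capacity) with a bit mask.
// Only correct when capacity is a power of two.
constexpr std::size_t wrap_masked( std::size_t idx, std::size_t capacity )
{
    return idx & ( capacity - 1 );
}

// Reference version using the modulus operator; correct for any nonzero capacity.
constexpr std::size_t wrap_modulo( std::size_t idx, std::size_t capacity )
{
    return idx % capacity;
}

// Compile-time checks of the identity for a power-of-two capacity.
static_assert( is_power_of_2( 8 ) && !is_power_of_2( 12 ), "power-of-two check" );
static_assert( wrap_masked( 13, 8 ) == wrap_modulo( 13, 8 ), "mask equals modulus for capacity 8" );
```

The identity does not hold for other capacities (for example, capacity 12), which is presumably why the masking path has to be opt-in rather than the default.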