[libc++] Code generation for bounds checks seem unoptimal

ldionne commented 5 months ago

In std::vector, we currently do

template <class _Tp, class _Allocator>
reference vector<_Tp, _Allocator>::operator[](size_type __n) _NOEXCEPT {
  _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__n < size(), "vector[] index out of bounds");
  return this->__begin_[__n];
}

However, this requires computing end - begin and comparing it to __n. Since we already need to compute __begin + __n (since we do __begin[__n]), we could probably express the condition differently to improve code generation:

template <class _Tp, class _Allocator>
reference vector<_Tp, _Allocator>::operator[](size_type __n) _NOEXCEPT {
  auto* __p = vec.__begin_ + __n;
  _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__p < vec.__end_, "vector[] index out of bounds");
  return *__p;
}

This seems to generate better code indeed, see https://godbolt.org/z/dzPvMcdfE.

However, I wonder if there's some kind of tradeoff where this would be less easily optimized by the compiler. I just don't have a good intuition for which one is going to be better in the general case.

CC @fhahn @var-const

ldionne commented 5 months ago

The comparison of __begin + __n < __end is technically UB if OOB, but we could probably work around that.

mordante commented 5 months ago

The comparison of __begin + __n < __end is technically UB if OOB, but we could probably work around that.

I think this is a real issue, even when the compiler doesn't try to do "smart" things regarding UB. When __n is large the pointer may wrap and effectively the following condition becomes true __begin + __n < __begin this means __begin + __n < __end is also true and effectively memory at __begin - __x will be accessed.

ldionne commented 2 months ago

^ We can actually use std::less or the tricks we use in __is_pointer_in_range to make the comparison not be UB. For example, reinterpret_cast<uintptr_t> or something along those lines. So I don't think that comparison being UB is a blocker here.

llvm / llvm-project

[libc++] Code generation for bounds checks seem unoptimal #90785