mcy / best

The Best Library: a C++ STL replacement
Apache License 2.0
158 stars 1 forks source link

`best::tap::operator[]` index mismatch #16

Open fennewald opened 2 months ago

fennewald commented 2 months ago

There are two issues with best::tap::operator[] -- a typo that prevents use, and a lifetime issue.

Here is the current implementation, for reference.

https://github.com/mcy/best/blob/347525532d6cb3468ad3ac39866e6ebbe6c91114/best/func/tap.h#L178-L201

First

There is a name collision for arg -- the lambda argument shadows the function argument. This prevents any actual use. The solution is straightforward: change one of the names, add a few tests to prevent any future issues.

Second

We run into some weird lifetime issues here. In general, operator[] returns reference types. The decltype(auto) signature preserves the reference. This means that code like this:

best::test Tap = [](auto& t) {
  // Generates a vector of multiples of n, to the requested length
  best::tap multiples = [](int&& n){
    return [&](int len){
      best::vec<int> res;
      for (int i=1; i<len; ++i) {
        res.push(n * i);
      }
      return res;
    };
  };

  // generates the first 10 multiples, and then selects the second
  t.expect_eq((2->*multiples(10))[1], 4); // succeeds
  t.expect_eq(2->*multiples(10)[1], 4); // fails
};

Will return a dangling reference to the objects on the stack of the lambda we just left:

 TEST: _ZN4best8tap_test3TapE ]
failed expect_eq() at best/func/tap_test.cc:45
expected these values to be equal:
  1696621669
  4

There are scenarios where this behavior is valid. If the tap produces, say, a reference to a static object, than references into that container are presumably also static, and valid. However, I believe the average case is a stack-scoped return from.

The documentation alludes to oddities with this behavior, but stops short of fully describing the behavior:

->* also binds stronger than operator() and operator[]. However, the vanilla tap, best::tap, overloads operator() and operator[] to give the illusion otherwise: foo->*my_tap(bar) will be parsed as operator->*(foo, my_tap.operator()(bar)): using () or [] on a tap will "bind" those arguments, returning a new tap that "does what you expect", such that foo->*my_tap(bar) is almost (foo->*my_tap)(bar).

As far as resolution, I am happy to submit a PR, and have two proposed resolutions:

Option 1: Leave it alone

In short, just fix the argument naming, and add a sentence in the documentation for best::tap explaining the behavior outlined above. This maintains maximal flexibility for the user, even if it is a bit of a foot-gun.

Option 2: Prefer return-by-value

We can also just update the return types of the function to return a best::unref<decltype(/* the entire function body */)>. Returning-by-value, I believe, will present a more ergonomic behavior for library users. It does, however, introduce 'hidden' costs(an implicit copy constructor call), and prevent more advanced users for intentionally returning references.

I will submit a PR for whichever option you prefer, @mcy.

fennewald commented 2 months ago

As an aside, replacing best::vec with a std::vector in the example above will make clang produce a warning about exactly this issue:

In file included from best/func/tap_test.cc:20:
./best/func/tap.h:199:12: warning: returning reference to local temporary object [-Wreturn-stack-address]
  199 |     return ((BEST_FWD(arg))->*BEST_MOVE(*this))[BEST_FWD(idx)];
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I don't understand why this lint is unable to notice the bug when best::vec is used. Maybe something about SSO?

mcy commented 2 months ago

Interesting. That is indeed a problem. I think this is enough of a footgun that we should either remove operator[] or require that it only be usable when (BEST_FWD(arg)->BEST_MOVE(*this)) is a reference, or indexing returns a trivially copyable value. If the former, returning a reference is probably fine. If the former does not hold but the latter does, we can copy.

The reason best::vec doesn't have this problem is... honestly, I have no idea. My bet is that because it goes through span(), it doesn't inline hard enough in the frontend to see what's going on. It may be worth marking best::span with [[clang::lifetimebound]] in the right places.