C++ tests for promise+scalar type fail with the -O1, O2, O3 options

ahayashi commented 6 years ago

Hello,

I've noticed that C++ test programs for promise+scalar type fail when optimization options are specified.

The problem

Consider the following program (test/cpp/future1.cpp):

  constexpr int SIGNAL_VALUE = 42
  hclib::promise_t<int> *event = new hclib::promise_t<int>();
  hclib::async([=]() {
   // T1
    int signal = event->get_future()->wait();
    assert(signal == SIGNAL_VALUE);
    printf("signal = %d\n", signal);
  });
  hclib::async([=]() {
    // T2
    sleep(1);
    event->put(SIGNAL_VALUE);
  });

Task T1 waits on a promise with scalar type (promise_t<int>) which is put by Task T2 (event->put(SIGNAL_VALUE)) and we expect that Task T1 receives 42. However, if this program is compiled with optimization options (-O1, -O2, -O3), the assertion fails (or a wrong value is printed).

Steps to reproduce the problem

I confirmed I can reproduce the problem on my laptop (Apple LLVM 8.0.0) and davinci (g++ 6.2.0). Also, It's worth noting that HClib itself is built with the -O3 option (by install.sh). The problem is more about a compiler option you give when compiling test programs.

$ git clone git@github.com:habanero-rice/hclib.git
$ ./install.sh
$ source ./hclib-install/bin/hclib_setup_env.sh
$ cd test/cpp
$ emacs Makefile # add the -O3 option
$ ./test_all.sh # or make -j
$ ./future1

The Cause

The root cause is the current implementation of future_t::wait() (get() has the same issue). Here is the implementation (hclib-future.hpp):

 T &&wait() {
  _ValUnion tmp;
  tmp.vp = hclib_future_wait(this);
  return std::move(tmp.val);
}

The problem is , since the variable tmp is allocated on stack, the value can be easily lost at the end of wait even though the returned value is a rvalue reference.

Possbile Solutions

Here are possible solutions. I'd vote for 1 because 2 would prevent compilation optimizations. However, If returning a rvalue reference is important and/or there are any other solutions, please let me know. Also, if the performance is the case (e.g., A scenario where T is not a primitive type and the overhead of copy constructor would not be small), I'd be happy to try to come up with a better solution and/or create synthetic benchmarks to discuss the performance.

1. Return by value

 T wait() {
  _ValUnion tmp;
  tmp.vp = hclib_future_wait(this);
  return tmp.val;
}

2. The use of `volatile`

volatile T&& wait() {
  volatile _ValUnion tmp;
   tmp.vp = hclib_future_wait(this);
   return std::move(tmp.val);
}

Thanks,

Akihiro

agrippa commented 6 years ago

@ahayashi Thanks for all the debugging! I'm fairly confident this was a pain for you to figure out, but really appreciate it. I think we're just running into my own lack of expertise on some of the subtler features on C++ - I think this code was just copied over from some changes that Nick made in the past. I agree that option #1 is the simplest and best approach. Do you want to make a pull request on my branch, or would you like me to?

ahayashi commented 6 years ago

@agrippa Thanks! Yea, that was hard, but I'm very glad that I finally figured it out :) Yes, I'd be happy to make a pull request. I'd like to make sure which branch I should make it against. Your branch means resource_workers branch, right?

ahayashi commented 6 years ago

@agrippa, just created a pull request for this. Please let me know if you there is any problem.

habanero-rice / hclib