habanero-rice / hclib

A C/C++ task-based programming model for shared memory and distributed parallel computing.
http://habanero-rice.github.io/hclib/
BSD 3-Clause "New" or "Revised" License
71 stars 35 forks source link

Fix issue #73: Deadlock when using finish and async_await #77

Closed sbak5 closed 5 years ago

sbak5 commented 5 years ago

This fixes the issue #73 considering that root finish can run in core_work_loop so all the workers cannot get out of the loop by 'hclib_signal_join'

agrippa commented 5 years ago

@sbak5 thanks for taking a look at this -- I'm sure this has been a tough one to think through.

Before opening pull requests, it's always a good idea to run all of the c and cpp tests we have to verify no regressions have been introduced. I just tried that, and it looks like there's now a failing assertion in cpp/finish0. There may be other failing tests, I only took a quick look. I believe the c/memory/allocate test is now hanging as well (see the Travis builds, which hit that error). We'll need those issues resolved before merging (as well as any other failing tests), as both of those tests pass on the master branch.

You should be able to simply use the test_all.sh scripts under the test/c/ and test/cpp/ directories to run all of the tests.

Thanks!

Max

sbak5 commented 5 years ago

Hello, Max, Yeah, my patch doesn't consider the example where future is used. I'll push a patch to resolve the case to resolve the issue.

Thanks, Seonmyeong Bak

sbak5 commented 5 years ago

I've pushed a fix to resolve the build errors.

The main issue was that the termination of runtime was not synchronized.

I added a task which is dependent on the root task so that the task signals other workers the root task is completed (now the counter value of the root finish is final). After this signal, we can use the counter value of the root finish as our termination condition.

agrippa commented 5 years ago

@sbak5 looks good, all the tests are passing on my laptop as well.

one final step: could you add Sriraj's original deadlock example as a test in the appropriate directory (I can't remember if it was c or cpp)? That way, we prevent against future regressions.

Thanks for working on this!

sbak5 commented 5 years ago

I just added. The name of the test is nested_finish_async_await.cpp in test/cpp

agrippa commented 5 years ago

@sbak5 looks like the build failed/hung? could you try rerunning the Travis tests?

https://travis-ci.org/habanero-rice/hclib/jobs/545539336

sbak5 commented 5 years ago

How can I rerun the travis test? Do I need to recommit?

sbak5 commented 5 years ago

@agrippa The build of this commit is OK this time.