Previously, because Tasks in a TaskList are executed on a worker thread, Parthenon would silently drop any exception they threw. The exception would crash the thread, but execution would continue as if the task had finished normally.
This PR checks the future associated with each task to make sure no exception was thrown. As a bonus, tasks can now return TaskStatus::fail, a potentially non-fatal failure status that is propagated up to the Driver.
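For reviewers less familiar with the mechanism, here is a minimal standalone sketch of how checking a future surfaces an exception that would otherwise be lost on a worker thread. It is not the code in this PR: the TaskStatus enum below is a local stand-in, RunAndCheck is a hypothetical helper, and std::async stands in for the real thread pool.

```cpp
#include <exception>
#include <functional>
#include <future>
#include <stdexcept>
#include <vector>

// Local stand-in for illustration only; not Parthenon's actual enum.
enum class TaskStatus { complete, incomplete, fail };

// Hypothetical helper: run every task on its own thread, then inspect each
// future. future::get() rethrows any exception the task raised, so nothing
// is silently dropped.
inline TaskStatus RunAndCheck(
    const std::vector<std::function<TaskStatus()>> &tasks) {
  std::vector<std::future<TaskStatus>> futures;
  futures.reserve(tasks.size());
  for (const auto &task : tasks) {
    // std::async copies the callable, so the worker owns its own copy.
    futures.push_back(std::async(std::launch::async, task));
  }

  TaskStatus overall = TaskStatus::complete;
  std::exception_ptr first_error;
  for (auto &f : futures) {
    try {
      // A fail status is recorded but does not abort the remaining checks.
      if (f.get() == TaskStatus::fail) overall = TaskStatus::fail;
    } catch (...) {
      // Remember the first exception; keep draining the other futures.
      if (!first_error) first_error = std::current_exception();
    }
  }
  // Surface the exception to the caller (the "Driver" in this analogy).
  if (first_error) std::rethrow_exception(first_error);
  return overall;
}
```

In this sketch a thrown exception reaches the caller of RunAndCheck, while a TaskStatus::fail return is reported as an ordinary value, leaving the caller to decide whether the failure is fatal.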
This change has failed the sparse_advection tests in the past, but that may have been due to incidental changes made over the course of debugging and testing. It now appears to pass all tests reliably, 3-4 times in a row (except for a couple of gmg tests, which also fail on my machine even with threading disabled). In case a rare race condition remains, I've extended ThreadVector so that vectors of Tasks or TaskLists can be swapped over to a thread-safe container transparently.
PR Checklist
[x] Code passes cpplint
[x] New features are documented.
[ ] Adds a test for any bugs fixed. Adds tests for new features.
[x] Code is formatted
[x] Changes are summarized in CHANGELOG.md
[ ] Change is breaking (API, behavior, ...)
[ ] Change is additionally added to CHANGELOG.md in the breaking section
[ ] PR is marked as breaking
[ ] Short summary API changes at the top of the PR (plus optionally with an automated update/fix script)
[ ] CI has been triggered on Darwin for performance regression tests.
[x] Docs build
[x] (@lanl.gov employees) Update copyright on changed files