Open ye-luo opened 1 year ago
@llvm/issue-subscribers-openmp
For tasks that are not target nowait, why would we not execute them immediately in a sequential environment?
A target nowait task can be viewed as a special case of a detached task. If such tasks are executed immediately and the thread waits for their completion, the only thread gets blocked and no further tasks can be created. If all the tasks can be created before any task gets scheduled, then when a detached task has been executed but not yet fulfilled, other tasks can still run.
What I'm trying to say is that we are probably fine with the behavior of the reproducer, but we need to change how we model target tasks.
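To make the blocking argument concrete, here is a toy cost model, not libomp code. It assumes a single host thread and a single device queue that runs operations one at a time but concurrently with the host. The type `toy_task` and both functions are hypothetical names invented for this sketch, and the costs are arbitrary time units.

```c
#include <stddef.h>

/* Hypothetical toy model: a task either runs on the host thread or is an
   asynchronous device operation (think "target nowait"). */
typedef enum { HOST_TASK, DEVICE_TASK } task_kind;

typedef struct {
    task_kind kind;
    int cost; /* duration in arbitrary time units */
} toy_task;

/* Immediate policy: every task runs where it is created and the only host
   thread waits for it, so all costs simply add up. */
int makespan_immediate(const toy_task *tasks, size_t n) {
    int t = 0;
    for (size_t i = 0; i < n; ++i)
        t += tasks[i].cost;
    return t;
}

/* Asynchronous policy: the host only launches device tasks and moves on.
   Assumption: one device queue, operations run in launch order. */
int makespan_deferred(const toy_task *tasks, size_t n) {
    int host_t = 0; /* time at which the host thread becomes free */
    int dev_t = 0;  /* time at which the device queue becomes free */
    for (size_t i = 0; i < n; ++i) {
        if (tasks[i].kind == HOST_TASK) {
            host_t += tasks[i].cost;
        } else {
            /* A device op starts once it is launched and the queue is idle. */
            int start = dev_t > host_t ? dev_t : host_t;
            dev_t = start + tasks[i].cost;
        }
    }
    return host_t > dev_t ? host_t : dev_t;
}
```

For the sequence device(5), host(3), device(5), host(3), the immediate policy takes 16 units while the asynchronous one takes 10, because host work overlaps with the device operations. Real runtimes have launch and scheduling overheads that this model ignores.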
It made sense when OpenMP was only used for CPUs. However, I would say keeping the 1-thread case special is unhealthy and error-prone. Based on my experience writing all kinds of parallel codes, it is not worth keeping a special code path if it can be folded cleanly into the generic case. I don't see a good reason why the OpenMP runtime should be an exception. Why make target tasks special? To me it is a worse option than making tasks async regardless of the number of threads.
> To me it is a worse option than making task async regardless of the number of threads.
Arguably, that is inherently costly. That said, I am unsure whether it matters in practice; honestly, I'd be surprised if it did. Long story short, I agree with your point.
@TerryLWilmarth, wdyt?
> This is a libomp implementation issue not an OpenMP spec one.
First of all, I'd not call it an issue. It is an implementation choice. All current behaviors conform to the OpenMP spec.
> target nowait task can be viewed as a special case of a detached task. If such tasks are executed immediately and wait for its completion, the only thread gets blocked and no further tasks can be created.
This is not right. The completion of a detachable task only affects its dependent tasks. The encountering thread proceeds once the body of the detachable task has finished. If you were referring to an explicit task that depends on a detachable task, then that is true.
As for explicit tasks created in either an implicit or an inactive parallel region, whether to execute them immediately or to defer them until the body of the parallel region has finished is debatable. Either side can easily come up with cases that perform better than the other.
For target tasks (which is what you are concerned about), we do indeed need to remodel them. That would be a request to the OpenMP language committee.
> For target tasks (and what you concern), we indeed need to remodel it. That would be the request to OpenMP language committee.
The spec doesn't dictate one way or the other. It is more of an implementation choice, so I don't see any spec change needed.
This is a libomp implementation issue, not an OpenMP spec one. The async tasking behavior should be respected even when there is no parallel region and there is only a single implicit thread. I wrote this reproducer as a proxy for the issue of target tasks, which should be asynchronous but are not, due to a limitation of the libomp behavior.
code
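The original reproducer is not shown here. As a stand-in, the following is a minimal hypothetical sketch (not the author's code) for observing when explicit tasks created outside any parallel region actually run. The helper names `record` and `run_sequential_tasks` are made up for illustration; build with an OpenMP flag such as `-fopenmp` for the pragmas to take effect.

```c
#define MAX_EVENTS 8

/* Execution log: the ids are appended in the order the code actually runs. */
static int events[MAX_EVENTS];
static int n_events = 0;

static void record(int id) {
    if (n_events < MAX_EVENTS)
        events[n_events++] = id;
}

/* Creates two explicit tasks from the initial thread, outside any parallel
   region, then records id 0 for the encountering thread itself.
   Returns the position of id 0 in the log, or -1 if it is missing:
     2 means both tasks ran before the encountering thread continued
       (immediate execution);
     0 means the tasks were deferred until the taskwait. */
int run_sequential_tasks(void) {
    n_events = 0;

    #pragma omp task
    record(1); /* body of the first explicit task */

    #pragma omp task
    record(2); /* body of the second explicit task */

    record(0); /* encountering thread continues after task creation */

    #pragma omp taskwait /* all tasks are complete past this point */

    for (int i = 0; i < n_events; ++i)
        if (events[i] == 0)
            return i;
    return -1;
}
```

Which position comes back is implementation-defined; a runtime that executes such tasks immediately would be expected to return 2. This sketch uses plain host tasks only, so it illustrates the scheduling order being debated rather than the target nowait case itself.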