Closed xytintel closed 6 days ago
With this PR, the issue reported in https://github.com/pytorch/pytorch/issues/140781 is gone. The HF tests for the hubert model pass for me: 156 passed, 257 skipped, 3 warnings in 52.34s.
I trust your decision that the barrier really is not needed. Other than that, the change works to fix the issue I noticed. Consider extending test coverage to cover the missed case.
We still need barriers. The hang occurs because some threads exit prematurely, which prevents the counter from resetting to zero. We are now planning to use a named barrier to solve this problem.
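To illustrate the failure mode described above, here is a minimal sketch in plain Python, not the actual kernel code. It assumes a counter-based barrier in which every thread increments a shared counter on arrival and the counter resets to zero only once all threads have arrived. The function name `counter_barrier_releases` and its parameters are hypothetical.

```python
def counter_barrier_releases(num_threads, early_exit_ids=frozenset()):
    """Model one round of a counter-based barrier (illustrative only).

    Returns (released, counter): whether the barrier released, and the
    value left in the shared counter afterwards.
    """
    counter = 0
    for tid in range(num_threads):
        if tid in early_exit_ids:
            # This thread returned before reaching the barrier,
            # so it never contributes its arrival to the counter.
            continue
        counter += 1

    # The barrier releases only when every expected thread has arrived.
    released = counter == num_threads
    if released:
        counter = 0  # reset so the next round starts from zero
    return released, counter


# All four threads arrive: the barrier releases and the counter resets.
print(counter_barrier_releases(4))
# Thread 3 exits early: the counter is stuck at 3 and never resets,
# so the remaining threads would wait forever.
print(counter_barrier_releases(4, early_exit_ids={3}))
```

In this model, the waiting threads can never be released once a single thread skips the barrier, which matches the hang described in this comment.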
> We are now planning to use a named barrier to solve this problem.
I was told by SYCL folks that named barriers might have performance drawbacks on current hardware generations. Please be careful to verify performance.
I verified the updated version. It addresses the reported issue.
Resolve https://github.com/pytorch/pytorch/issues/140781