Closed lukeiwanski closed 7 years ago
This only happens very occasionally, around 5 out of 50 times the test is run. Add --runs_per_test=50 to bazel test to reproduce. The test is super quick to run, so running it this many times is not a problem.
Looks to be a seg fault inside the GSYCLInterface destructor, when destroying the Eigen::QueueInterface objects.
I can't get the crash to reproduce when the tests are run serially, only when there are at least 3 tests running at once.
Seg fault comes from the tensorflow::Buffer destructor. A tensorflow::Buffer is a wrapper around an array and a pointer to the allocator, essentially:
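template <typename T, typename Alloc>
struct Buffer {
  T* data;       // the wrapped array
  Alloc* alloc;  // raw, non-owning pointer to the allocator
};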
And the destructor uses the allocator to call alloc->deallocate(data). The problem here is that there are Tensors which are left on the SYCL device, and the underlying buffer is only deleted at program exit. However, there is then a race condition between deleting the buffer and deleting the SYCL allocators stored in GSYCLInterface. When the allocator is deleted first, the buffer will cause a segfault when it tries to deallocate its array.
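To illustrate the ordering problem, here is a minimal standalone sketch of one way to make the teardown order irrelevant: the buffer co-owns its allocator through a std::shared_ptr, so the allocator cannot be destroyed while a buffer still needs it. This is not necessarily what #136 does. LoggingAllocator, OwningBuffer and simulate_shutdown are hypothetical names made up for this example and are not TensorFlow, Eigen or SYCL types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SYCL allocator: it records events in a log
// instead of touching a real device.
struct LoggingAllocator {
  explicit LoggingAllocator(std::vector<std::string>* log) : log_(log) {}
  ~LoggingAllocator() { log_->push_back("allocator destroyed"); }

  int* allocate(std::size_t n) { return new int[n]; }
  void deallocate(int* p) {
    delete[] p;
    log_->push_back("buffer deallocated");
  }

  std::vector<std::string>* log_;
};

// Hypothetical buffer that shares ownership of its allocator, so the
// allocator stays alive until the last buffer using it has been destroyed.
struct OwningBuffer {
  OwningBuffer(std::shared_ptr<LoggingAllocator> a, std::size_t n)
      : alloc(std::move(a)), data(alloc->allocate(n)) {}
  ~OwningBuffer() {
    assert(alloc != nullptr);  // holds because the buffer co-owns the allocator
    alloc->deallocate(data);
  }
  OwningBuffer(const OwningBuffer&) = delete;
  OwningBuffer& operator=(const OwningBuffer&) = delete;

  // Declared before `data` so it is initialised first in the constructor.
  std::shared_ptr<LoggingAllocator> alloc;
  int* data;
};

// Simulates the problematic order: the "interface" gives up its allocator
// before the buffer is destroyed. Returns the recorded sequence of events.
std::vector<std::string> simulate_shutdown() {
  std::vector<std::string> log;
  {
    auto interface_alloc = std::make_shared<LoggingAllocator>(&log);
    auto buffer = std::make_unique<OwningBuffer>(interface_alloc, 16);

    interface_alloc.reset();  // the interface drops its reference first
    log.push_back("interface released allocator");

    buffer.reset();  // the buffer still deallocates through a live allocator
  }
  return log;
}
```

Calling simulate_shutdown() from a main function and printing the returned log shows the buffer being deallocated before the allocator is destroyed, even though the interface released its reference first. With a raw Alloc* as in the struct above, the same sequence would call deallocate on an allocator that has already been destroyed.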
Fixed by #136
Closing
System Info
ComputeCpp 0.2.0
To reproduce
Error