mpicbg-scicomp / gearshifft

Benchmark Suite for Heterogeneous FFT Implementations
Apache License 2.0

Graceful exit on memory error when using OpenCL #123

Open bkmgit opened 6 years ago

bkmgit commented 6 years ago

When using Beignet and clFFT, including a large transform size can cause execution to terminate abruptly:

/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/33554432/Inplace_Real": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/33554432/Outplace_Real": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/67108864/Inplace_Real": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/67108864/Outplace_Real": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/134217728/Inplace_Real": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/134217728/Outplace_Real": Unsupported lengths.
drm_intel_gem_bo_context_exec() failed: Cannot allocate memory
Beignet: "Exec event 0x5794ce20 error, type is 4592, error status is -5"
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/134217728/Outplace_Complex": mismatches=0 deviation=0.522913 errorbound=1e-05
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/268435456/Inplace_Real": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/268435456/Inplace_Complex": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/268435456/Outplace_Real": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/268435456/Outplace_Complex": Unsupported lengths.
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/536870912/Inplace_Real": Unsupported lengths.
unknown location(0): fatal error: in "ClFFT/float/536870912/Inplace_Complex": boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >: std::bad_alloc
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/536870912/Outplace_Real": Unsupported lengths.
unknown location(0): fatal error: in "ClFFT/float/536870912/Outplace_Complex": boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >: std::bad_alloc
unknown location(0): fatal error: in "ClFFT/float/1073741824/Inplace_Real": boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >: std::bad_alloc
unknown location(0): fatal error: in "ClFFT/float/1073741824/Inplace_Complex": boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >: std::bad_alloc
unknown location(0): fatal error: in "ClFFT/float/1073741824/Outplace_Real": boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >: std::bad_alloc
unknown location(0): fatal error: in "ClFFT/float/1073741824/Outplace_Complex": boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector >: std::bad_alloc
/home/benson/projects/gearshifft/gearshifft/src/../inc/core/benchmark_executor.hpp(90): fatal error: in "ClFFT/float/43046721/Inplace_Real": Unsupported lengths.
Killed

psteinb commented 6 years ago

We can try to catch an exception, but it looks like Beignet doesn't emit one. If that is correct, all we can do is limit the transform signal size by means of the config file.
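To illustrate the "catch an exception" option, here is a minimal sketch of a guard around a single benchmark case. The names `RunStatus` and `run_guarded` are hypothetical and not part of gearshifft. The sketch only covers failures that surface as C++ exceptions, such as the `std::bad_alloc` entries in the log; it cannot help when the driver reports nothing or when the operating system kills the process, as in the final `Killed` line.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <new>
#include <string>

// Hypothetical outcome of one benchmark case (not a gearshifft type).
enum class RunStatus { Ok, OutOfMemory, Failed };

// Runs one benchmark body and converts C++ exceptions into a status,
// so that one failing case does not abort the remaining ones.
RunStatus run_guarded(const std::string& name,
                      const std::function<void()>& body) {
  try {
    body();
    return RunStatus::Ok;
  } catch (const std::bad_alloc&) {
    // Host allocation failed; report it and let the caller continue.
    std::cerr << name << ": skipped, host allocation failed\n";
    return RunStatus::OutOfMemory;
  } catch (const std::exception& e) {
    std::cerr << name << ": failed, " << e.what() << '\n';
    return RunStatus::Failed;
  }
}

// Example: the second case simulates a failed allocation,
// and the third case still runs afterwards.
void example_usage() {
  assert(run_guarded("small", [] {}) == RunStatus::Ok);
  assert(run_guarded("huge", [] { throw std::bad_alloc(); }) ==
         RunStatus::OutOfMemory);
  assert(run_guarded("medium", [] {}) == RunStatus::Ok);
}
```

A guard like this runs inside the process, so it never sees a kill signal from the kernel. Limiting the sizes up front remains necessary for that case.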

tdd11235813 commented 6 years ago

I have also seen such errors caused by out-of-memory kills. It may be a bug in gearshifft, although I checked for memory leaks (valgrind did show some possible leaks in OpenCL and clFFT, which I reported). Other back-ends do not fail, so it may instead be due to clFFT's implementation, given that we see this error with different OpenCL implementations. I could not catch it, as the operating system seems to just kill the process instead of notifying it.

The large sizes are not supported by clFFT and are therefore disabled in the clFFT back-end. However, the CPU buffers are allocated nonetheless. I am looking for a way to query this property before allocating the buffers. For now, the only way is to skip the large sizes. If out-of-memory kills occur after long benchmarks, please report them here as well.
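One way to avoid allocating host buffers for sizes that will be rejected anyway is to check the request against known limits first. The following is a rough sketch under stated assumptions: `Limits`, `Request` and `should_run` are hypothetical names, and the limit values are inputs that would have to come from the library documentation or from device queries (for OpenCL, for example, `clGetDeviceInfo` with `CL_DEVICE_MAX_MEM_ALLOC_SIZE`). For scale, the 536870912-element single-precision complex case from the log needs 536870912 × 8 bytes = 4 GiB per buffer.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical limits for one back-end and device. Nothing is hard-coded;
// the caller supplies values from documentation or device queries.
struct Limits {
  std::uint64_t max_elements;       // largest supported transform length
  std::uint64_t max_alloc_bytes;    // largest single device buffer
  std::uint64_t host_budget_bytes;  // host memory the benchmark may use
};

// Hypothetical description of one benchmark case.
struct Request {
  std::uint64_t elements;           // number of signal elements
  std::uint64_t bytes_per_element;  // e.g. 8 for single-precision complex
  std::uint64_t host_buffers;       // number of host-side copies kept
};

// Returns true only if the case passes every limit, so the check can run
// before any buffer is allocated.
bool should_run(const Request& r, const Limits& lim) {
  if (r.elements == 0 || r.bytes_per_element == 0) return false;
  if (r.elements > lim.max_elements) return false;
  // Guard against overflow in the byte count.
  if (r.elements >
      std::numeric_limits<std::uint64_t>::max() / r.bytes_per_element) {
    return false;
  }
  const std::uint64_t buffer_bytes = r.elements * r.bytes_per_element;
  if (buffer_bytes > lim.max_alloc_bytes) return false;
  if (r.host_buffers != 0 &&
      buffer_bytes > lim.host_budget_bytes / r.host_buffers) {
    return false;
  }
  return true;
}
```

With a check like this, an unsupported case can be skipped before the CPU buffers exist, which should reduce peak memory during long runs. It does not explain or prevent kills caused by leaks inside the OpenCL runtime or clFFT.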