lukeiwanski / tensorflow

OpenCL support for TensorFlow via SYCL
Apache License 2.0

Segfault when running debug mnist #251

Closed mirh closed 5 years ago

mirh commented 6 years ago

(i.e. python -m tensorflow.python.debug.examples.debug_mnist)

#0  0x00007fffb311912d in ?? () from /usr/lib/libamdocl12cl64.so
#1  0x00007fffb2739eb9 in ?? () from /usr/lib/libamdocl12cl64.so
#2  0x00007fffb28b8e1d in ?? () from /usr/lib/libamdocl12cl64.so
#3  0x00007fffb27355be in ?? () from /usr/lib/libamdocl12cl64.so
#4  0x00007fffb310b78f in ?? () from /usr/lib/libamdocl12cl64.so
#5  0x00007fffb310bb83 in ?? () from /usr/lib/libamdocl12cl64.so
#6  0x00007fffb310be3f in ?? () from /usr/lib/libamdocl12cl64.so
#7  0x00007fffb310bf7c in ?? () from /usr/lib/libamdocl12cl64.so
#8  0x00007fffb2221b0a in ?? () from /usr/lib/libamdocl12cl64.so
#9  0x00007fffb2221db0 in ?? () from /usr/lib/libamdocl12cl64.so
#10 0x00007fffb222e60a in ?? () from /usr/lib/libamdocl12cl64.so
#11 0x00007fffb223071c in ?? () from /usr/lib/libamdocl12cl64.so
#12 0x00007fffcab728d9 in aclCompile () from /usr/lib/libamdocl64.so
#13 0x00007fffca2750c5 in ?? () from /usr/lib/libamdocl64.so
#14 0x00007fffca2993dc in ?? () from /usr/lib/libamdocl64.so
#15 0x00007fffca24302f in ?? () from /usr/lib/libamdocl64.so
#16 0x00007fffca253120 in ?? () from /usr/lib/libamdocl64.so
#17 0x00007fffca2340e0 in clBuildProgram () from /usr/lib/libamdocl64.so
#18 0x00007ffff038882f in clBuildProgram () from /usr/lib/libOpenCL.so.1
#19 0x00007fffe347c0d8 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-0.9.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#20 0x00007fffe347c422 in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
   from /opt/ComputeCpp-CE-0.9.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#21 0x00007fffe3468e66 in cl::sycl::detail::context::create_program_for_binary(std::shared_ptr<cl::sycl::detail::context> const&, unsigned char const*, int, bool) () from /opt/ComputeCpp-CE-0.9.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#22 0x00007fffe349319f in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.9.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#23 0x00007fffe8057e7b in cl::sycl::program cl::sycl::program::create_program_for_kernel<Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, Eigen::internal::scalar_constant_op<int>, Eigen::internal::nullary_wrapper<int, Eigen::internal::scalar_constant_op<int>, true, false, false> > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> >(cl::sycl::context) ()
#24 0x00007fffe80575d0 in void cl::sycl::handler::parallel_for_impl<Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, Eigen::internal::scalar_constant_op<int>, Eigen::internal::nullary_wrapper<int, Eigen::internal::scalar_constant_op<int>, true, false, false> > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true>, Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, Eigen::internal::scalar_constant_op<int>, Eigen::internal::nullary_wrapper<int, Eigen::internal::scalar_constant_op<int>, true, false, false> > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> >(cl::sycl::detail::nd_range_base const&, Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, 
Eigen::internal::scalar_constant_op<int>, Eigen::internal::nullary_wrapper<int, Eigen::internal::scalar_constant_op<int>, true, false, false> > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#25 0x00007fffe80563cc in Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#26 0x00007fffe805618f in cl::sycl::event cl::sycl::detail::command_group::submit_handler<Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}>(Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}, std::shared_ptr<cl::sycl::detail::queue> const&, cl::sycl::detail::standard_handler_tag) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#27 0x00007fffe8055f93 in cl::sycl::event cl::sycl::queue::submit<Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}>(Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#28 0x00007fffe8055df6 in Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<int>, Eigen::TensorMap<Eigen::Tensor<int, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#29 0x00007fffe996c56e in tensorflow::ReductionOp<Eigen::SyclDevice, int, int, Eigen::internal::ProdReducer<int> >::Compute(tensorflow::OpKernelContext*) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#30 0x00007fffe41acbee in tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long) () from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#31 0x00007fffe41ad988 in std::_Function_handler<void (), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(tensorflow::gtl::InlinedVector<tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, 8> const&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::$_1>::_M_invoke(std::_Any_data const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#32 0x00007fffe42060a2 in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#33 0x00007fffe420599d in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#34 0x00007fffe3001d4f in execute_native_thread_routine () at /build/gcc/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
Rbiessy commented 6 years ago

Weird, I cannot reproduce the issue. The kernel it is trying to compile is really simple, so I see no reason why it would fail. Having the full backtrace might help a bit more. I don't think it would solve this, but as a rule of thumb you should always execute the Python scripts from outside the TF repository when you want to use the installed whl. On a side note, ComputeCpp 0.9.0 was released today. We haven't tested it much with TF yet, but it could solve your issue.

mirh commented 6 years ago

Ehrm... nothing new under the sun. (The beginning and end of the bt are kinda always the same; do you want the extended Python information, perhaps?)

mirh commented 6 years ago

Well, well, well, we have updates here:

#18 0x00007fffe733982f in clBuildProgram () from /usr/lib/libOpenCL.so.1
#19 0x00007fffe786ac0f in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-1.0.1-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#20 0x00007fffe786aeb2 in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-1.0.1-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#21 0x00007fffe787e1c2 in cl::sycl::detail::context::create_program_for_binary(std::shared_ptr<cl::sycl::detail::context> const&, unsigned char const*, int, bool) () from /opt/ComputeCpp-CE-1.0.1-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#22 0x00007fffe788504f in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-1.0.1-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#23 0x00007fffee915e2b in cl::sycl::program cl::sycl::program::create_program_for_kernel<tensorflow::functor::FillRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> > >(cl::sycl::context) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#24 0x00007fffee915700 in void cl::sycl::handler::parallel_for_impl<tensorflow::functor::FillRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >, tensorflow::functor::FillPhiloxRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>, true> >(cl::sycl::detail::nd_range_base const&, tensorflow::functor::FillPhiloxRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>, true> const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#25 0x00007fffee9154db in tensorflow::functor::FillPhiloxRandom<Eigen::SyclDevice, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >::operator()(tensorflow::OpKernelContext*, Eigen::SyclDevice const&, tensorflow::random::PhiloxRandom, float*, long long, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#26 0x00007fffee9152cf in cl::sycl::event cl::sycl::detail::command_group::submit_handler<tensorflow::functor::FillPhiloxRandom<Eigen::SyclDevice, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >::operator()(tensorflow::OpKernelContext*, Eigen::SyclDevice const&, tensorflow::random::PhiloxRandom, float*, long long, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>)::{lambda(cl::sycl::handler&)#1}>(tensorflow::functor::FillPhiloxRandom<Eigen::SyclDevice, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >::operator()(tensorflow::OpKernelContext*, Eigen::SyclDevice const&, tensorflow::random::PhiloxRandom, float*, long long, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>)::{lambda(cl::sycl::handler&)#1}, std::shared_ptr<cl::sycl::detail::queue> const&, cl::sycl::detail::standard_handler_tag) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#27 0x00007fffee905513 in cl::sycl::event cl::sycl::queue::submit<tensorflow::functor::FillPhiloxRandom<Eigen::SyclDevice, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >::operator()(tensorflow::OpKernelContext*, Eigen::SyclDevice const&, tensorflow::random::PhiloxRandom, float*, long long, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>)::{lambda(cl::sycl::handler&)#1}>(tensorflow::functor::FillPhiloxRandom<Eigen::SyclDevice, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >::operator()(tensorflow::OpKernelContext*, Eigen::SyclDevice const&, tensorflow::random::PhiloxRandom, float*, long long, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>)::{lambda(cl::sycl::handler&)#1}) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#28 0x00007fffee905352 in tensorflow::functor::FillPhiloxRandom<Eigen::SyclDevice, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >::operator()(tensorflow::OpKernelContext*, Eigen::SyclDevice const&, tensorflow::random::PhiloxRandom, float*, long long, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#29 0x00007fffee90c1db in tensorflow::(anonymous namespace)::PhiloxRandomOp<Eigen::SyclDevice, tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >::Compute(tensorflow::OpKernelContext*)
 () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#30 0x00007fffe8646b91 in tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long) () from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#31 0x00007fffe8647938 in std::_Function_handler<void (), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(tensorflow::gtl::InlinedVector<tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, 8> const&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::$_1>::_M_invoke(std::_Any_data const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#32 0x00007fffe86a2c7c in Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#33 0x00007fffe86a25cd in std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#34 0x00007fffe760d063 in std::execute_native_thread_routine (__p=0x555556730d30)
    at /build/gcc/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
#35 0x00007ffff7c00a9d in start_thread () from /usr/lib/libpthread.so.0
#36 0x00007ffff7b30a43 in clone () from /usr/lib/libc.so.6
Rbiessy commented 6 years ago

Thanks for the report. It is weird that you are once again hitting an issue with building a kernel, while I haven't seen that for a long time on the different devices we test. This kind of issue typically happens when some invalid code is compiled for the device, but then the issue would appear on all devices. Do you know of a previous version that doesn't have this particular issue? We still don't have a good way in TensorFlow to print the log coming from clGetProgramBuildInfo, but I'll ping you when we have something!

mirh commented 6 years ago

Mhh nope, I only started trying this pretty late in my venturing... (and I'll have to see about downgrading, given how much already depends on my hacky downgrade of Python)

Anyway, it's not like I have such high expectations for this. It's more of an "if it can run here, then all the more so it should run for others".

Rbiessy commented 6 years ago

OK, don't bother too much with downgrading; the chances that it fixes the issue are really low. This old commit using TF 1.6 would probably work: https://github.com/Rbiessy/tensorflow/commit/2142bdbcbe5fa53ffe13411a654d3e854861b68a It uses Eigen for the random distributions, which is faster, but the properties of the distributions are really bad, so we had to revert the changes. If this is a blocking issue for you and a not-so-good random distribution would be acceptable, I could make a branch with these changes that uses TF 1.9 (until we have a way to fix the issue).

Edit: When compiling do you see something like this? warning: [Computecpp:CC0034]: Function ... is undefined but referenced on the device and the associated kernels may fail to build or execute at run time [-Wsycl-undef-func]

mirh commented 6 years ago

That would be the same as #249 then, I guess... Anyway, no, I don't recall seeing that warning.

Rbiessy commented 6 years ago

Yes, I think the two issues are the same. OK, interesting.

Rbiessy commented 6 years ago

Hi @mirh, I have some updates on your issues with clBuildProgram failing. There is a way to get more information on the failure with the latest version of eigen_sycl and ComputeCpp 1.0.2, but it also requires a small change in the code. The call to submit here should be wrapped inside a try-catch, like so:

try {
  device.sycl_queue().submit( ... );
} catch (const cl::sycl::exception& e) {
  std::cerr << e.what() << std::endl;
}

Would you be able to try that and see what your output is?

mirh commented 6 years ago

Unfortunately my humble laptop is pretty much fried atm... and I'm not sure when (if ever) I'll get it working again.

Rbiessy commented 5 years ago

We found the issue: it was caused by a double being used in a function assumed to be for floats. The fix is on my dev branch for now: https://github.com/Rbiessy/tensorflow/commit/419a6947e1aa3b30e54567c9419f5ca1d6a5df27
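
The linked commit is not reproduced here. As a generic illustration of how a double can slip into code meant for floats in C++, the sketch below uses two hypothetical helpers, `half_promoted` and `half_single`, which are not taken from TensorFlow. An unsuffixed literal such as `0.5` has type double, so multiplying a float by it promotes the whole expression to double.

```cpp
#include <type_traits>

// Hypothetical helper intended to stay in single precision.
// The literal 0.5 is a double, so x * 0.5 is evaluated as a double
// and only narrowed back to float when the value is returned.
inline float half_promoted(float x) { return x * 0.5; }

// Same computation with a float literal: no double appears anywhere.
inline float half_single(float x) { return x * 0.5f; }

// The expression types make the difference visible at compile time.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float times an unsuffixed literal yields double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float times an f-suffixed literal stays float");
```

Both helpers return the same value on a host CPU, so this kind of promotion is easy to miss. Double precision is an optional feature in OpenCL, so it can matter for device code.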

mirh commented 5 years ago

I had spotted the commit... I'll get back to testing once my replacement motherboard arrives from HK.

mirh commented 5 years ago

Getting the very same stack as in #249. Should I still attempt that try-catch?