hughperkins / tf-coriander

OpenCL 1.2 implementation for Tensorflow
Apache License 2.0

On Mac, training operation broken caused seg fault, using Sierra/Radeon #32

Closed: hughperkins closed this issue 7 years ago

hughperkins commented 7 years ago

On Mac, the training operation is broken and causes a seg fault, using Sierra/Radeon.

i.e., the forward direction of a linear regression works OK:

'''
A linear regression learning algorithm example using TensorFlow library.

Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

from __future__ import print_function

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 1000
training_epochs = 5  # overridden to keep this test short
display_step = 50

with tf.device('/gpu:0'):
    # Training Data
    train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                             7.042,10.791,5.313,7.997,5.654,9.27,3.1])
    train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                             2.827,3.465,1.65,2.904,2.42,2.94,1.3])
    n_samples = train_X.shape[0]

    # tf Graph Input
    X = tf.placeholder("float")
    Y = tf.placeholder("float")

    # Set model weights
    W = tf.Variable(rng.randn(), name="weight")
    b = tf.Variable(rng.randn(), name="bias")

    # Construct a linear model
    pred = tf.add(tf.mul(X, W), b)

    # Mean squared error
    cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
    # Gradient descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

    # Initializing the variables
    init = tf.initialize_all_variables()

    # Launch the graph
    with tf.Session() as sess:
        sess.run(init)

        # Fit all training data
        for epoch in range(training_epochs):
            batch_num = 0
            for (x, y) in zip(train_X, train_Y):
                x = sess.run(X, feed_dict={X: x})  # round-trip the fed value through the placeholder (debug check)
                if batch_num == 0:
                    print('epoch %s' % epoch)
                    X_val, Y_val, W_val, b_val = sess.run((X, Y, W, b), feed_dict={X: x, Y: y})
                    print(X_val, Y_val, W_val, b_val)
                    print('pred', sess.run(pred, feed_dict={X: x, Y: y}))
                    print('cost', sess.run(cost, feed_dict={X: x, Y: y}))
                batch_num += 1

... but adding the optimizer operation causes a segfault:

        # Fit all training data
        for epoch in range(training_epochs):
            batch_num = 0
            for (x, y) in zip(train_X, train_Y):
                x = sess.run(X, feed_dict={X: x})  # round-trip the fed value through the placeholder (debug check)
                if batch_num == 0:
                    print('epoch %s' % epoch)
                    X_val, Y_val, W_val, b_val = sess.run((X, Y, W, b), feed_dict={X: x, Y: y})
                    print(X_val, Y_val, W_val, b_val)
                    print('pred', sess.run(pred, feed_dict={X: x, Y: y}))
                    print('cost', sess.run(cost, feed_dict={X: x, Y: y}))
                sess.run(optimizer, feed_dict={X: x, Y: y})
                batch_num += 1
F name _ZN5Eigen8internal15EigenMetaKe
 running generation on _ZN5Eigen8internal15EigenMetaKe
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_20TensorCwiseNullaryOpINS0_15scalar_const_opIfEEKS8_EEEENS_9GpuDeviceEEEiEEvT_T0_
 ... built
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_19TensorCwiseBinaryOpINS0_20scalar_difference_opIffEEKS8_KNS9_INS0_17scalar_product_opIKfSE_EEKNS_20TensorBroadcastingOpIKNS_5arrayIiLm1EEEKNS_17TensorReshapingOpIKNS_5SizesIJLl1EEEEKNS4_INS_15TensorFixedSizeISE_NSL_IJEEELi1EiS7_EELi16ES7_EEEEEEKNS4_INS5_ISE_Li1ELi1EiEELi16ES7_EEEEEEEENS_9GpuDeviceEEEiEEvT_T0_
F name _ZN5Eigen8internal15EigenMetaKe
 running generation on _ZN5Eigen8internal15EigenMetaKe
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_19TensorCwiseBinaryOpINS0_20scalar_difference_opIffEEKS8_KNS9_INS0_17scalar_product_opIKfSE_EEKNS_20TensorBroadcastingOpIKNS_5arrayIiLm1EEEKNS_17TensorReshapingOpIKNS_5SizesIJLl1EEEEKNS4_INS_15TensorFixedSizeISE_NSL_IJEEELi1EiS7_EELi16ES7_EEEEEEKNS4_INS5_ISE_Li1ELi1EiEELi16ES7_EEEEEEEENS_9GpuDeviceEEEiEEvT_T0_
 ... built
Segmentation fault: 11
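
For what it's worth, a stripped-down script along the following lines should show whether a single optimizer step on /gpu:0 is enough to hit the crash, independent of the dataset and matplotlib. This is a sketch written for isolation, not code from the run above, and the constants are arbitrary:

import tensorflow as tf

with tf.device('/gpu:0'):
    # minimal graph: one weight, one bias, squared error on a single fed sample
    X = tf.placeholder("float")
    Y = tf.placeholder("float")
    W = tf.Variable(0.5, name="weight")
    b = tf.Variable(0.0, name="bias")
    pred = tf.add(tf.mul(X, W), b)
    cost = tf.pow(pred - Y, 2)
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        # forward pass only: this direction works, per the output above
        print('cost', sess.run(cost, feed_dict={X: 1.0, Y: 2.0}))
        # single training step: the suspect path
        sess.run(optimizer, feed_dict={X: 1.0, Y: 2.0})
        print('W, b after one step', sess.run((W, b)))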

I'm taking a look at this issue.

Edit: it seems to be something to do with event handling:

* thread #13, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff0)
  * frame #0: 0x00007fffe6908a59 libc++abi.dylib`__dynamic_cast + 38
    frame #1: 0x00007fffd6966baa OpenCL`___lldb_unnamed_symbol306$$OpenCL + 37
    frame #2: 0x00007fffd6978ef3 OpenCL`clReleaseEvent + 15
    frame #3: 0x000000011ac46667 libcocl.dylib`::cuEventRecord(event=0x000000011b7d7ad0, _queue=<unavailable>) at cocl_events.cpp:92 [opt]
    frame #4: 0x00000001091dac75 _pywrap_tensorflow.so`perftools::gputools::cl::CLDriver::RecordEvent(context=0x000000012120db20, event=0x000000011b7d7ad0, stream="0W\x86\x1b\x01") at cl_driver.cc:1121
    frame #5: 0x00000001091eae95 _pywrap_tensorflow.so`perftools::gputools::cl::CLExecutor::CreateStreamDependency(this=0x000000011e55baf0, dependent=0x000000011b7de0a0, other=0x000000011b7a4f00) at cl_gpu_executor.cc:730
    frame #6: 0x0000000109272b64 _pywrap_tensorflow.so`perftools::gputools::StreamExecutor::CreateStreamDependency(this=0x000000011e55b190, dependent=0x000000011b7de0a0, other=0x000000011b7a4f00) at stream_executor_pimpl.cc:635
    frame #7: 0x0000000109246bc8 _pywrap_tensorflow.so`perftools::gputools::Stream::ThenWaitFor(this=0x000000011b7de0a0, other=0x000000011b7a4f00) at stream.cc:1335
    frame #8: 0x00000001091a236f _pywrap_tensorflow.so`tensorflow::GPUUtil::CopyCPUTensorToGPU(cpu_tensor=0x000000011b946e78, device_context=0x000000011e58b130, gpu_device=0x000000011b7a5ea0, gpu_tensor=0x000000011d9cc850, done=0x000000010060cf70)>) at gpu_util.cc:326
    frame #9: 0x00000001091ac613 _pywrap_tensorflow.so`tensorflow::GPUDeviceContext::CopyCPUTensorToDevice(this=0x000000011e58b130, cpu_tensor=0x000000011b946e78, device=0x000000011b7a5ea0, device_tensor=0x000000011d9cc850, done=<unavailable>)>) const at gpu_util_platform_specific.cc:29
    frame #10: 0x00000001096f8fda _pywrap_tensorflow.so`tensorflow::CopyTensor::ViaDMA(edge_name=(data_ = "edge_185__recv_Placeholder_0;0:0", size_ = 28), send_dev_context=0x0000000000000000, recv_dev_context=0x000000011e58b130, src=0x000000011b7ac770, dst=0x000000011b7a5ea0, src_alloc_attr=(value = 4), dst_alloc_attr=(value = 0), input=0x000000011b946e78, output=0x000000011d9cc850, done=0x000000010060bc50)>) at copy_tensor.cc:99
    frame #11: 0x00000001097a5cc9 _pywrap_tensorflow.so`tensorflow::IntraProcessRendezvous::SameWorkerRecvDone(this=0x000000011b946140, parsed=0x0000000101b19968, send_args=0x00007000023bba50, recv_args=0x00007000023baf30, in=0x000000011b946e78, out=0x000000011d9cc850, done=0x000000013d006800)>) at rendezvous_mgr.cc:106
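
The crash is reached while copying the fed placeholder value from host to device (CopyCPUTensorToGPU -> CreateStreamDependency -> cuEventRecord -> clReleaseEvent), so one thing worth checking is whether a bare GPU-side variable update fed from a placeholder dies the same way, with no gradient kernels involved at all. Sketch only, with tf.assign standing in for the optimizer's apply step:

import tensorflow as tf

with tf.device('/gpu:0'):
    X = tf.placeholder("float")
    W = tf.Variable(0.0, name="weight")
    # feed -> host-to-device copy -> GPU variable write, no gradient kernels
    assign_op = tf.assign(W, X)

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        # if this also ends up in clReleaseEvent, the problem is in the
        # event/stream-dependency handling on the copy path rather than in
        # the gradient descent kernels themselves
        sess.run(assign_op, feed_dict={X: 1.0})
        print('W', sess.run(W))
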
hughperkins commented 7 years ago

fixed on branch (no wheel yet)