hughperkins / tf-coriander

OpenCL 1.2 implementation for Tensorflow
Apache License 2.0

Simple benchmark in CPU mode runs order of magnitude longer than in normal TF on Mac #81

Open MyOwnClone opened 6 years ago

MyOwnClone commented 6 years ago

Hi, first I want to thank you for this great project; awesome work. Now, to my problem. I use a simple script to compare the performance of my machines in TF (originally from here: http://bailiwick.io/2017/11/05/tensorflow-gpu-windows-and-jupyter/). It may not be the best fit, but it gives consistent results. The script is as follows:

import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

### argv[1] = type of device and which one
### argv[2] = size of the matrix to operate on

device_name = sys.argv[1]
shape = (int(sys.argv[2]), int(sys.argv[2]))
if device_name == "gpu":
    device_name = "/gpu:0"
else:
    device_name = "/cpu:0"

with tf.device(device_name):
    random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)

### Print the shape, device name and timing
print("\n" * 3)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", datetime.now() - startTime)
print("\n" * 3)
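Note that the script above starts its timer before the session is created, so the reported time covers session start-up as well as the single session.run call. A minimal sketch of one way to separate a warm-up run from the timed runs is shown below. The helper name time_runs and the placeholder workload are hypothetical and not part of the original script; in practice the workload would wrap session.run(sum_operation) inside an already open session.

```python
import statistics
import time


def time_runs(workload, warmup=1, repeats=3):
    """Call workload() `warmup` times untimed, then `repeats` times timed.

    Returns the list of per-run wall-clock durations in seconds.
    """
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    # Placeholder workload (assumption): stands in for session.run(sum_operation).
    samples = time_runs(lambda: sum(i * i for i in range(100_000)))
    print("median seconds:", statistics.median(samples))
```

Reporting the median of several runs makes a comparison between two builds less sensitive to one-off start-up costs.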

When I run it on my Mac (late 2016 15" MB Pro with touchpad) in plain CPU mode with my custom-built TensorFlow (version 1.5.0 + CPU optimisations), passing these parameters: python3 tensorflow_test.py cpu 10000

I get this time: Time taken: 0:00:07.430271, so around 7 seconds on average.
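To put that number in context, the reported time can be converted into an approximate throughput. The sketch below assumes the run is dominated by the 10000 x 10000 matrix product, which costs about 2*n**3 floating-point operations. The helper names parse_elapsed and matmul_gflops are made up for illustration.

```python
def parse_elapsed(text):
    """Convert an 'H:MM:SS.ffffff' string (as printed for a timedelta) to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def matmul_gflops(n, elapsed_s):
    """Approximate GFLOP/s for an n x n by n x n matmul (about 2*n**3 operations)."""
    return 2 * n ** 3 / elapsed_s / 1e9


elapsed = parse_elapsed("0:00:07.430271")
print(round(matmul_gflops(10000, elapsed)), "GFLOP/s (approximate)")
```

For the figures above this comes to roughly 269 GFLOP/s. Because the measured time also includes session start-up, random number generation, and the final reduction, it is best read as a lower bound on the matmul throughput.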

When I run the tf-coriander TF build supplied at this URL: https://github.com/hughperkins/tf-coriander/releases/download/v0.18.3/tensorflow-cl-v0.18.3-macsierra-python3.zip

with the same parameters, the script runs for tens of seconds without yielding any results, keeping all 8 logical cores at almost 100%. I then stopped the script by pressing ^C, so I do not know the exact runtime.
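Since the run was interrupted by hand, no runtime was recorded. One possible way to get at least a lower bound is to launch the benchmark as a child process with a time limit. The sketch below uses only the Python standard library. The helper name run_with_timeout is hypothetical, and the demo command is a stand-in for the real benchmark invocation.

```python
import subprocess
import sys
import time


def run_with_timeout(argv, timeout_s):
    """Run argv as a child process.

    Returns (finished, elapsed_seconds). If the time limit is hit, the child
    is killed and finished is False, so elapsed_seconds is only a lower bound.
    """
    start = time.perf_counter()
    try:
        subprocess.run(argv, timeout=timeout_s, check=False)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command (assumption); for the benchmark, argv would be e.g.
    # [sys.executable, "tensorflow_test.py", "cpu", "10000"].
    finished, elapsed = run_with_timeout([sys.executable, "-c", "print('hello')"], timeout_s=10)
    print("finished:", finished, "elapsed seconds:", round(elapsed, 2))
```

Running the same script with a smaller matrix size first (for example 1000 or 2000) may also show whether the slowdown grows with problem size or is a fixed start-up cost.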

Is this an internal bug, or is it because tf-coriander is based on a different vanilla TF version than my custom build, which may include newer optimizations?

Thanks for your response, Tom