Simple benchmark in CPU mode runs an order of magnitude longer than in normal TF on Mac

Hi, first I want to thank you for this great project; awesome work. Now, to my problem. I use a simple script to compare the performance of my machines in TF (originally from here: http://bailiwick.io/2017/11/05/tensorflow-gpu-windows-and-jupyter/). It may not be the best fit, but it gives consistent results. The script is as follows:

import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

### argv[1] = type of device and which one
### argv[2] = size of the matrix to operate on

device_name = sys.argv[1]
shape = (int(sys.argv[2]), int(sys.argv[2]))
if device_name == "gpu":
    device_name = "/gpu:0"
else:
    device_name = "/cpu:0"

with tf.device(device_name):
    random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)

### Print the shape, device name and timing
print("\n" * 3)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", datetime.now() - startTime)
print("\n" * 3)
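
A note on what the script above measures: the timer starts before the tf.Session is created, so the reported time includes session startup as well as the computation itself. Below is a minimal sketch of a helper that times only a callable, using just the standard library. The name `time_call` and its parameters are hypothetical and are not part of TensorFlow or tf-coriander.

```python
import time


def time_call(fn, repeats=3, warmup=1):
    """Return a list of wall-clock durations (seconds) for calling fn().

    Hypothetical helper for illustration; fn is any zero-argument callable.
    """
    # Warm-up calls are executed but not timed (e.g. one-off initialisation).
    for _ in range(warmup):
        fn()

    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload; in the benchmark this would be something like
    # lambda: session.run(sum_operation), called inside the Session block.
    timings = time_call(lambda: sum(i * i for i in range(100000)))
    print("min: %.6f s, max: %.6f s" % (min(timings), max(timings)))
```

Reporting the minimum over a few repeats tends to be less sensitive to background load than a single run, although for a 10000 x 10000 matrix even one timed run may be enough.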

When I run it on my Mac (late 2016 MB Pro with touchpad, 15") in classic CPU mode with my custom-built TensorFlow (version 1.5.0 + CPU optimisations), passing these parameters: python3 tensorflow_test.py cpu 10000

I get this time: Time taken: 0:00:07.430271, so around 7 seconds on average.
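
To put the 7-second figure in context, it can be converted into an approximate throughput. This is a sketch under the assumption that the run is dominated by the matrix multiplication, which for two n x n matrices costs roughly 2n^3 floating-point operations. The measured time also covers session startup, random number generation, the transpose and the sum, so the result underestimates the pure matmul throughput. `matmul_gflops` is a hypothetical helper name.

```python
def matmul_gflops(n, seconds):
    """Approximate GFLOP/s for an n x n by n x n matrix multiplication.

    Assumes about 2 * n**3 floating-point operations (one multiply and one
    add per inner-product term) and ignores all other work in the run.
    """
    flops = 2.0 * n ** 3
    return flops / seconds / 1e9


if __name__ == "__main__":
    # n = 10000 and 7.430271 s are the values reported above.
    print("%.0f GFLOP/s" % matmul_gflops(10000, 7.430271))
```

For the reported run this works out to roughly 269 GFLOP/s. The same function can be applied to a tf-coriander timing once one is available, which gives a direct ratio between the two builds.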

When I run the tf-coriander TF build supplied at this URL: https://github.com/hughperkins/tf-coriander/releases/download/v0.18.3/tensorflow-cl-v0.18.3-macsierra-python3.zip

with the same parameters, the script runs for tens of seconds without yielding any result, keeping all 8 logical cores at almost 100 %. I then stopped the script by pressing ^C, so I do not know the exact runtime.
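
Since the run was interrupted, no time was recorded. Below is a minimal sketch of a wrapper that still reports the elapsed time when the call is stopped with ^C. `run_timed` is a hypothetical helper, not part of TensorFlow or tf-coriander. It also assumes the interrupt reaches Python as a KeyboardInterrupt, which may be delayed while native code is executing, and the figure for an interrupted run is only a lower bound on the full runtime.

```python
import time


def run_timed(fn):
    """Call fn() and return (result, elapsed_seconds, interrupted).

    Hypothetical helper: if the call is interrupted with Ctrl-C, the
    elapsed time up to that point is still reported instead of being lost.
    """
    start = time.perf_counter()
    try:
        result = fn()
        interrupted = False
    except KeyboardInterrupt:
        result = None
        interrupted = True
    elapsed = time.perf_counter() - start
    return result, elapsed, interrupted


if __name__ == "__main__":
    # Stand-in workload; in the benchmark this would wrap the
    # session.run(sum_operation) call.
    result, elapsed, interrupted = run_timed(lambda: sum(range(1000000)))
    status = "interrupted after" if interrupted else "finished in"
    print("%s %.3f s" % (status, elapsed))
```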

Is this an internal bug, or is it caused by tf-coriander being based on a different vanilla TF version than my custom build, which may include newer optimizations?

Thanks for your response, Tom

hughperkins/tf-coriander, issue #81