tensorflow / benchmarks

A benchmark framework for Tensorflow

Default MaxPoolingOp only supports NHWC #82

Closed. pyotr777 closed this issue 6 years ago.

pyotr777 commented 6 years ago

In a Docker container, CPU-only. Docker image: tensorflow/tensorflow:latest (65e150502892). TensorFlow was updated with pip install -U tf-nightly to fix issue #80. Inside the container I cloned the benchmarks and started them with # python tf_cnn_benchmarks.py --batch_size=32 --model=resnet50. Output:

TensorFlow:  1.5
Model:       resnet50
Mode:        training
SingleSess:  False
Batch size:  32 global
             32 per device
Devices:     ['/gpu:0']
Data format: NCHW
Optimizer:   sgd
Variables:   parameter_server
==========
Generating model
2017-11-06 03:49:44.378230: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Running warm up
2017-11-06 03:49:46.244696: E tensorflow/core/common_runtime/executor.cc:651] Executor failed to create kernel. Invalid argument: Default MaxPoolingOp only supports NHWC.
     [[Node: v/tower_0/cg/mpool0/MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](v/tower_0/cg/conv0/Relu)]]
Traceback (most recent call last):
  File "tf_cnn_benchmarks.py", line 54, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "tf_cnn_benchmarks.py", line 50, in main
    bench.run()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 916, in run
    return self._benchmark_cnn()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1155, in _benchmark_cnn
    fetch_summary)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 530, in benchmark_one_step
    results = sess.run(fetches, options=run_options, run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC.
     [[Node: v/tower_0/cg/mpool0/MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](v/tower_0/cg/conv0/Relu)]]

Caused by op u'v/tower_0/cg/mpool0/MaxPool', defined at:
  File "tf_cnn_benchmarks.py", line 54, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "tf_cnn_benchmarks.py", line 50, in main
    bench.run()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 916, in run
    return self._benchmark_cnn()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1010, in _benchmark_cnn
    (image_producer_ops, enqueue_ops, fetches) = self._build_model()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1260, in _build_model
    gpu_compute_stage_ops, gpu_grad_stage_ops)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1566, in add_forward_pass_and_gradients
    self.model.add_inference(network)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/models/resnet_model.py", line 210, in add_inference
    cnn.mpool(3, 3, 2, 2)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py", line 273, in mpool
    d_height, d_width, mode, input_layer, num_channels_in)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py", line 250, in _pool
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/pooling.py", line 429, in max_pooling2d
    return layer.apply(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 728, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 618, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/pooling.py", line 273, in call
    data_format=utils.convert_data_format(self.data_format, 4))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1958, in max_pool
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 2806, in _max_pool
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3073, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1524, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Default MaxPoolingOp only supports NHWC.
     [[Node: v/tower_0/cg/mpool0/MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"](v/tower_0/cg/conv0/Relu)]]

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 417, in run
    global_step_val, = self.sess.run([self.global_step_op])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1047, in _run
    raise RuntimeError('Attempted to use a closed Session.')
RuntimeError: Attempted to use a closed Session.
tfboyd commented 6 years ago

Unrelated to whether this has been fixed; this is kind of a drive-by comment. I am sharing facts I believe I know for sure, without researching what I do not know.

If you are running CPU-only, I suggest building from source and using MKL; the link below has some compelling stats. I realize building from source is a lot more work than downloading the binary, and we are working on optimized builds. I know MKL supports NCHW because I did the testing. Alternatively, you can just flip the data_format to NHWC, which works fine and is what you want on CPU unless you are using MKL.

You will also want to pass --device=cpu if you are running CPU-only, although due to soft placement it might just work anyway. There are also some MKL flags you can use if you build with MKL; those are in the link below as well.

NHWC was the data format of choice for CPU, so many CPU ops are not supported with the NCHW data_format. MKL (added by Intel, and it works fine on AMD; we tested) prefers NCHW, so most operations are supported there, especially those related to CNNs, since that is where Intel focused their time.

https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn
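
For reference, the original invocation with both suggested flags applied would look something like the following (a sketch only; the flags are the ones discussed above, the other arguments are unchanged from the report at the top of this issue):

# python tf_cnn_benchmarks.py --device=cpu --data_format=NHWC --batch_size=32 --model=resnet50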

reedwm commented 6 years ago

As @tfboyd stated, you must use --data_format=NHWC when running on the CPU, and you should also pass --device=cpu.

@zheng-xq, why do we use soft placement? It makes the error message very unclear. We should give a better error message if tf_cnn_benchmarks is run on a machine without a GPU and --device=cpu is not specified.
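
For readers unfamiliar with the term: soft placement is the TF 1.x session option that silently falls back to another device when the requested one is unavailable, which is why the benchmark reached the CPU MaxPool kernel at all instead of failing on the /gpu:0 device request. A minimal, hypothetical illustration (not the benchmark's actual session setup):

import tensorflow as tf  # TF 1.x API

with tf.device('/gpu:0'):
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x)

# With allow_soft_placement=True this still runs on a GPU-less machine:
# the op is quietly placed on the CPU instead of raising a placement error.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(y))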

pyotr777 commented 6 years ago

@tfboyd @reedwm Thank you! --device=cpu --data_format=NHWC works for me.

I would suggest making --device=cpu --data_format=NHWC the default when the --num_gpus option is 0 or is not given.

tfboyd commented 6 years ago

@reedwm I doubt XQ will answer here. :-) We should ask him, though, as having that on by default might be more trouble than it is worth (and I do not know its value for our script).

Al-Badri179 commented 4 years ago

None of the aforementioned methods worked for me. I get this error when trying to run a general CNN model with 4 classes:

Training

hist = model.fit(X_train, y_train, batch_size=16, epochs=num_epoch, verbose=1, validation_data=(X_test, y_test))

This is the displayed error: InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU [[{{node max_pooling2d_2/MaxPool}}]]. I would appreciate any assistance in resolving this issue.

sruthisentil commented 3 years ago

(Quoting @Al-Badri179's comment above.)

Did you ever end up solving this issue?

ShaneYS commented 3 years ago

(Quoting @tfboyd's comment above.)

Excuse me, I am hitting the same error when running inference with the TensorFlow 1.10 C++ API. How should I solve this? How can I apply the equivalent of --device=cpu --data_format=NHWC in C++?

kenil22 commented 1 year ago

(Quoting @Al-Badri179's comment above.)

You can change your data format setting from

from keras import backend as K
K.set_image_data_format('channels_first')

to

from keras import backend as K
K.set_image_data_format('channels_last')
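
A minimal sketch of that fix in context, assuming a standard Keras Sequential CNN (the layer sizes, input shape, and variable names here are illustrative placeholders, not taken from the original post):

from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Build NHWC (channels_last) graphs so the CPU MaxPooling kernel is supported.
K.set_image_data_format('channels_last')

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),  # height, width, channels
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(4, activation='softmax'),  # 4 classes, as in the question above
])
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
# hist = model.fit(X_train, y_train, batch_size=16, epochs=num_epoch, verbose=1,
#                  validation_data=(X_test, y_test))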