Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU

I get an error when I run a CNN model that contains a max-pooling layer in my TensorFlow Federated (tff) experiment.

My machine has 2 Nvidia GPUs, and tff uses them without problems when I train a DNN model that has no max-pooling layer.

As the log below shows, my script loads both GPUs, but the error message refers to device type CPU.

2020-04-15 00:12:16.118122: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-15 00:12:16.118155: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-15 00:12:16.118206: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-15 00:12:16.118225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-15 00:12:16.118254: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-15 00:12:16.118284: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-15 00:12:16.118303: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-15 00:12:16.119743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1
2020-04-15 00:12:16.119789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-15 00:12:16.119811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 1 
2020-04-15 00:12:16.119817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N N 
2020-04-15 00:12:16.119823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1:   N N 
2020-04-15 00:12:16.120866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5707 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:04:00.0, compute capability: 3.5)
2020-04-15 00:12:16.121361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 5707 MB memory) -> physical GPU (device: 1, name: GeForce GTX TITAN Black, pci bus id: 0000:83:00.0, compute capability: 3.5)
2020-04-15 00:12:18.113550: E tensorflow/core/common_runtime/executor.cc:654] Executor failed to create kernel. Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/sequential/max_pooling1d/MaxPool}}]]
2020-04-15 00:12:18.120562: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at iterator_ops.cc:611 : Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/sequential/max_pooling1d/MaxPool}}]]
Traceback (most recent call last):
  File "/home/getalp/eks/env/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1367, in _do_call
    return fn(*args)
  File "/home/getalp/eks/env/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1352, in _run_fn
    target_list, run_metadata)
  File "/home/getalp/eks/env/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1445, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/sequential/max_pooling1d/MaxPool}}]]
     [[subcomputation/StatefulPartitionedCall_1/ReduceDataset]]
     [[subcomputation/StatefulPartitionedCall_1/ReduceDataset/_42]]
  (1) Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/sequential/max_pooling1d/MaxPool}}]]
     [[subcomputation/StatefulPartitionedCall_1/ReduceDataset]]
0 successful operations.
0 derived errors ignored.

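The model I used is:

import tensorflow as tf

# segment_size, num_input_channels, dropout_rate and activityCount are
# defined elsewhere in the script (not shown).
def create_keras_model():
    return tf.keras.models.Sequential([
        # Input is (time steps, channels), i.e. channels-last.
        tf.keras.layers.Input(shape=(segment_size, num_input_channels)),
        tf.keras.layers.Conv1D(196, 16, activation='relu'),
        # This is the layer named in the error (max_pooling1d/MaxPool).
        tf.keras.layers.MaxPool1D(4),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=1024, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(activityCount, activation='softmax'),
    ])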
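
For reference, here is a rough sketch of the shapes this model produces, to show that the tensor reaching the pooling layer is channels-last (batch, steps, channels). It assumes the Keras defaults (padding='valid', stride 1 for Conv1D, pooling stride equal to the pool size). The values of segment_size and num_input_channels below are hypothetical placeholders, not the ones from my script.

```python
def conv1d_out_steps(steps, kernel_size, stride=1):
    # Output length of a 1-D convolution with 'valid' padding.
    return (steps - kernel_size) // stride + 1


def maxpool1d_out_steps(steps, pool_size, stride=None):
    # Output length of 1-D max pooling with 'valid' padding.
    # Keras uses the pool size as the stride when none is given.
    stride = pool_size if stride is None else stride
    return (steps - pool_size) // stride + 1


def model_shapes(segment_size, num_input_channels, filters=196,
                 kernel_size=16, pool_size=4):
    # Shapes are (steps, channels); the batch dimension is omitted.
    conv_steps = conv1d_out_steps(segment_size, kernel_size)
    pool_steps = maxpool1d_out_steps(conv_steps, pool_size)
    return {
        "input": (segment_size, num_input_channels),
        "conv1d": (conv_steps, filters),
        "max_pooling1d": (pool_steps, filters),
        "flatten": (pool_steps * filters,),
    }


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    for name, shape in model_shapes(segment_size=128,
                                    num_input_channels=6).items():
        print(name, shape)
```

The channel dimension stays last through every layer, so the model itself does not ask for a channels-first layout at any point.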

Related issue: tensorflow/benchmarks #465, "Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU"