microsoft / tensorflow-directml-plugin

DirectML PluggableDevice plugin for TensorFlow 2
Apache License 2.0
179 stars · 23 forks

No float16 support for _FusedConv2D #347

Open leandro-gracia-gil opened 1 year ago

leandro-gracia-gil commented 1 year ago

When trying to use the tensorflow-directml-plugin with a model that works fine with tensorflow-gpu, I get the following error:

2 root error(s) found.
  (0) NOT_FOUND:  No registered '_FusedConv2D' OpKernel for 'GPU' devices compatible with node {{node model/conv2d/Relu}}
         (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_HALF, _XlaHasReferenceVars=false, data_format="NCHW", dilations=[1, 1, 1, 1], epsilon=0, explicit_paddings=[], fused_ops=["BiasAdd", "Relu"], leakyrelu_alpha=0.2, num_args=1, padding="SAME", strides=[1, 1, 1, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"
        .  Registered:  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_BFLOAT16]

         [[StatefulPartitionedCall/StatefulPartitionedCall_1/model/conv2d/Relu]]

It looks like fused 2D convolutions support float32, float64, and bfloat16, but not float16. Unfortunately, bfloat16 is not an option in my case, because the computers where inference will run have GPUs that support float16 but not yet bfloat16.

Would it be possible to add support for float16 too?

I'm using tensorflow-directml-plugin version 0.3.0.dev221212, and tensorflow-cpu version 2.10.0 on a Windows 10 machine, no WSL.
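For context, the error above typically surfaces when TensorFlow's graph optimizer fuses a Conv2D + BiasAdd + Relu sequence into a single `_FusedConv2D` node while the compute dtype is float16. A minimal sketch of a setup that produces such float16 convolutions (a hypothetical toy model, not the reporter's actual one, using the public Keras mixed-precision API):

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Hypothetical minimal setup: under the mixed_float16 policy, Keras layers
# compute in float16, and in graph mode TensorFlow's remapper can fuse
# Conv2D + BiasAdd + Relu into one _FusedConv2D node with T=DT_HALF --
# the kernel the plugin was missing on GPU.
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, padding="same", activation="relu"),
])

x = tf.random.normal([1, 32, 32, 3])
y = model(x)  # compute dtype is float16 under this policy
print(y.dtype)

# Reset the global policy so the rest of the session is unaffected.
mixed_precision.set_global_policy("float32")
```

On versions of the plugin without a `DT_HALF` kernel, a common workaround is to keep the global policy at float32 (or cast inputs to float32 around the convolution), at the cost of float16's memory and speed benefits.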

leandro-gracia-gil commented 1 year ago

Actually, looking in more detail, it seems that only float32 is supported on GPU, while float32, float64, and bfloat16 are supported on CPU.

In any case, the original request stands. Would it be possible to add GPU support for float16 to this operation?
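The device/dtype combinations in the error message can be double-checked directly. A sketch using TensorFlow's kernel-registry module (note this lives under `tensorflow.python`, i.e. it is an internal API that may change between releases):

```python
import tensorflow as tf
# Internal module -- not part of the public TF API, but handy for
# inspecting which device/dtype combinations an op is registered for.
from tensorflow.python.framework import kernels

for kernel in kernels.get_registered_kernels_for_op("_FusedConv2D").kernel:
    # Each KernelDef lists a device type plus dtype constraints on attr "T".
    constraints = {
        c.name: [tf.dtypes.as_dtype(t).name for t in c.allowed_values.list.type]
        for c in kernel.constraint
    }
    print(kernel.device_type, constraints)
```

With the DirectML plugin loaded, this should list the GPU registrations alongside the stock CPU ones, making it easy to confirm whether `DT_HALF` is present.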

maggie1059 commented 1 year ago

Hi @leandro-gracia-gil, we just published a new version of the plugin here with float16 support added for _FusedConv2D: https://pypi.org/project/tensorflow-directml-plugin/

Would you mind trying it out and letting us know if this fixes the error?

leandro-gracia-gil commented 1 year ago

Hi @maggie1059, thanks for the heads up.

It does look like this particular error is indeed fixed. Thanks!

Now I'm seeing a warning repeated tens of times, and my model appears to hang: I get no results no matter how long I wait. That said, this could well be a separate, unrelated problem I'm hitting afterwards. I'm pasting the warning here in case it rings a bell.

2023-02-03 13:08:30.098584: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-02-03 13:08:30.098904: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 82902 MB memory) -> physical PluggableDevice (device: 0, name: DML, pci bus id: <undefined>)

The amount of device memory in the message is also wrong: it reports 82902 MB, but the card only has 24 GB.

In any case, if this looks unrelated I think we can probably close this issue. Thanks again.

lllck commented 2 months ago

Hello @leandro-gracia-gil , have you resolved this problem? I am also facing the same problem now.

leandro-gracia-gil commented 2 months ago

> Hello @leandro-gracia-gil , have you resolved this problem? I am also facing the same problem now.

I'm afraid I didn't. I eventually stopped trying to use the DirectML plugin.