microsoft / tensorflow-directml-plugin

DirectML PluggableDevice plugin for TensorFlow 2
Apache License 2.0
185 stars 25 forks

Unable to use directml on NVIDIA GPU. `UnimplementedError: Graph execution error.` #362

Open github-user-en opened 1 year ago

github-user-en commented 1 year ago

Hello,

I'm currently trying to run TensorFlow on a Windows computer using the tensorflow-directml-plugin as discussed in this guide.

My computer is equipped with an NVIDIA Quadro K1200 GPU, which supports DirectX 12. You can check its capabilities here.

I'm using the NVIDIA Graphics Driver version 528.89.

The code I'm working with is located in this notebook. When I run the fit() method, I get the error UnimplementedError: Graph execution error. The full traceback is visible in the output of cell 20 in the notebook.

I am accessing this computer via Remote Desktop Protocol (RDP), and dxdiag doesn't list my NVIDIA GPU among the Display Adapters. Instead, it shows "Microsoft Remote Display Adapter." However, Device Manager correctly lists the NVIDIA GPU as an active device.

Here are my questions:

  1. Could the remote connection be causing this error?
  2. The machine also has an Intel i7-6700 CPU with built-in graphics. In this context, when tf.config.list_physical_devices('GPU') outputs GPU:0, is it referring to the built-in Intel graphics or the NVIDIA GPU? If it's the built-in Intel graphics, could this be the cause of the error?
  3. If none of the above factors are causing the error, do you have any ideas about what might be causing it?
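Regarding question 2, one way to check which adapter backs GPU:0 is to ask TensorFlow for device details. A minimal sketch, assuming tensorflow and tensorflow-directml-plugin are installed (the contents of the details dict vary by backend and plugin version, and may be empty):

```python
# Sketch: enumerate the GPUs TensorFlow sees and print any details the
# backend reports, to see whether GPU:0 is the Intel iGPU or the Quadro.
try:
    import tensorflow as tf
except ImportError:  # guard so the snippet degrades gracefully
    tf = None

if tf is not None:
    for i, gpu in enumerate(tf.config.list_physical_devices("GPU")):
        # get_device_details returns a dict; on some backends it includes
        # a human-readable device name, on others it may be empty.
        details = tf.config.experimental.get_device_details(gpu)
        print(f"GPU:{i} -> {gpu.name} {details}")
else:
    print("tensorflow is not installed")
```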

I appreciate your assistance and guidance in resolving this issue.

Thank you.

PatriceVignola commented 11 months ago

The plugin should be able to ignore the Remote Desktop Adapter, so it's unexpected that it is still being enumerated in your case. One thing you can do is set the DML_VISIBLE_DEVICES environment variable to only list the device that you're interested in (e.g. DML_VISIBLE_DEVICES="1").
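In a notebook, that environment variable has to be set before TensorFlow is first imported, since the plugin enumerates adapters at import time. A minimal sketch (the index "1" is an assumption; verify which index corresponds to your NVIDIA GPU):

```python
import os

# Must be set BEFORE importing TensorFlow, or the DirectML plugin will
# have already enumerated all adapters, including the remote display one.
os.environ["DML_VISIBLE_DEVICES"] = "1"  # assumed index; check your setup

# import tensorflow as tf  # import only after the variable is set
# print(tf.config.list_physical_devices("GPU"))
```

If TensorFlow was already imported in an earlier cell, restart the kernel before applying this.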

Unfortunately, we had to pause development of this plugin until further notice, so it's not something we can fix at the moment. For the time being, all the latest DirectML features and performance improvements are going into onnxruntime for inference scenarios. We'll update this issue if/when things change.