microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] DirectML expose enumeration API #22108

Open jacob-vincent-mink opened 2 weeks ago

jacob-vincent-mink commented 2 weeks ago

Describe the feature request

When using DirectML, we have access to any DirectX 12 device and can run a model on it, provided the device supports the required operators. However, there is no documented way to identify the available devices - we can tell the DirectML EP to run on device ID 0, 1, ... but without some mapping, how do we know a priori which physical device a given ID refers to?

Describe scenario use case

This becomes especially important when I want to use DirectML for both GPU devices and NPU devices (when supported). Suppose I am on a system with two GPUs (say an Intel primary and an NVIDIA gaming GPU) and an NPU that is supported by DirectML. Technically, all three of these devices should be visible to DirectML. Suppose I'm writing an application that can hot-switch the model to another device based on the user's choice (i.e., I need to list all the DirectX devices). I cannot do that right now, and even if I took the time to write a C# wrapper for DXGI (or found one that is properly maintained as of 2024), I would still need to connect those adapters to the device IDs in DirectML's options input.
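For reference, a minimal C++ sketch of the enumeration in question: listing DXGI adapters by index, where (as the EP code quoted below in this thread suggests) the loop index corresponds to the `device_id` the DirectML EP accepts. This is an illustrative assumption about the mapping, not a documented guarantee, and it is Windows-only.

```cpp
// Sketch (assumption): DXGI adapter index i corresponds to the DML EP's
// device_id i, since the EP calls EnumAdapters1(device_id, ...).
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <cstdio>

using Microsoft::WRL::ComPtr;

int main() {
  ComPtr<IDXGIFactory4> factory;
  if (FAILED(CreateDXGIFactory2(0, IID_PPV_ARGS(&factory)))) return 1;

  ComPtr<IDXGIAdapter1> adapter;
  for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);
    // Print the human-readable name next to the index usable as device_id.
    wprintf(L"device_id %u: %s\n", i, desc.Description);
  }
  return 0;
}
```

Note that, as discussed later in this thread, DXGI only surfaces GPUs, so this listing will not include NPUs.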

fdwr commented 2 weeks ago

If you want to use a specific IDMLDevice created from a specific D3D device for that DXCore/DXGI adapter, how about using the other overload?

include\dml_provider_factory.h

struct OrtDmlApi {
  ...
  /**
   * Creates a DirectML Execution Provider using the given DirectML device, and which executes work on the supplied D3D12
   * command queue. The DirectML device and D3D12 command queue must have the same parent ID3D12Device, or an error will
   * be returned. The D3D12 command queue must be of type DIRECT or COMPUTE (see D3D12_COMMAND_LIST_TYPE). If this 
   * function succeeds, the inference session maintains a strong reference on both the dml_device and the command_queue 
   * objects.
   * See also: DMLCreateDevice
   * See also: ID3D12Device::CreateCommandQueue
   */
  ORT_API2_STATUS(SessionOptionsAppendExecutionProvider_DML1, _In_ OrtSessionOptions* options,
                _In_ IDMLDevice* dml_device, _In_ ID3D12CommandQueue* cmd_queue);

Usage:

    // Already created DML device and command queue above.

    Ort::SessionOptions sessionOptions;
    sessionOptions.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL); // For DML EP
    sessionOptions.DisableMemPattern(); // For DML EP
    ortApi.AddFreeDimensionOverrideByName(sessionOptions, "batch_size", batchSize);
    ortDmlApi->SessionOptionsAppendExecutionProvider_DML1(sessionOptions, dmlDevice, commandQueue);
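The usage above assumes the DML device and command queue already exist. A hedged sketch of that setup for a chosen adapter, using the public `D3D12CreateDevice`, `DMLCreateDevice`, and `ID3D12Device::CreateCommandQueue` APIs (the helper name `CreateDmlDeviceAndQueue` is hypothetical; the feature level and queue type shown are one reasonable choice, per the DIRECT-or-COMPUTE requirement in the header comment above):

```cpp
// Hypothetical helper: build the IDMLDevice and command queue needed by
// SessionOptionsAppendExecutionProvider_DML1 from a specific DXGI adapter.
#include <d3d12.h>
#include <dxgi1_4.h>
#include <DirectML.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT CreateDmlDeviceAndQueue(IDXGIAdapter1* adapter,
                                ComPtr<IDMLDevice>& dml_device,
                                ComPtr<ID3D12CommandQueue>& queue) {
  // Create the D3D12 device on the chosen adapter.
  ComPtr<ID3D12Device> d3d12_device;
  HRESULT hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&d3d12_device));
  if (FAILED(hr)) return hr;

  // Create the DirectML device on top of the same D3D12 device, satisfying
  // the same-parent-device requirement documented in the header comment.
  hr = DMLCreateDevice(d3d12_device.Get(), DML_CREATE_DEVICE_FLAG_NONE,
                       IID_PPV_ARGS(&dml_device));
  if (FAILED(hr)) return hr;

  // The queue must be of type DIRECT or COMPUTE; COMPUTE is used here.
  D3D12_COMMAND_QUEUE_DESC desc = {};
  desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
  return d3d12_device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
}
```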

Note the DML EP code just does the following, turning the device ID directly into a DXGI adapter:

onnxruntime\core\providers\dml\dml_provider_factory.cc

    ComPtr<IDXGIFactory4> dxgi_factory;
    ORT_THROW_IF_FAILED(CreateDXGIFactory2(0, IID_GRAPHICS_PPV_ARGS(dxgi_factory.ReleaseAndGetAddressOf())));

    ComPtr<IDXGIAdapter1> adapter;
    ORT_THROW_IF_FAILED(dxgi_factory->EnumAdapters1(device_id, &adapter));

Alas, DXGI does not enumerate NPUs, meaning the device ID approach only works for GPUs, and so DXCore is the only route for NPUs. Though, @smk2007 may know another way to access the NPU.
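For completeness, a sketch of the DXCore route mentioned above: filtering on the `D3D12_CORE_COMPUTE` attribute surfaces compute-capable adapters, including NPU/MCDM devices that DXGI omits. This only enumerates the adapters; wiring a chosen one into the DML EP would still go through the `SessionOptionsAppendExecutionProvider_DML1` overload quoted earlier.

```cpp
// Sketch: enumerate core-compute adapters (GPUs and NPUs) via DXCore,
// which DXGI cannot do for NPUs. Windows-only.
#include <dxcore.h>
#include <wrl/client.h>
#include <cstdio>
#include <cstdint>

using Microsoft::WRL::ComPtr;

int main() {
  ComPtr<IDXCoreAdapterFactory> factory;
  if (FAILED(DXCoreCreateAdapterFactory(IID_PPV_ARGS(&factory)))) return 1;

  // Core-compute adapters include NPUs/MCDM devices absent from DXGI.
  const GUID attrs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE };
  ComPtr<IDXCoreAdapterList> list;
  if (FAILED(factory->CreateAdapterList(1, attrs, IID_PPV_ARGS(&list)))) return 1;

  for (uint32_t i = 0; i < list->GetAdapterCount(); ++i) {
    ComPtr<IDXCoreAdapter> adapter;
    if (FAILED(list->GetAdapter(i, IID_PPV_ARGS(&adapter)))) continue;

    // Retrieve the driver description string for display.
    char description[256] = {};
    adapter->GetProperty(DXCoreAdapterProperty::DriverDescription,
                         sizeof(description), description);
    printf("adapter %u: %s\n", i, description);
  }
  return 0;
}
```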

jacob-vincent-mink commented 2 weeks ago

That makes sense for C++, and I didn't know that DXGI won't enumerate NPUs - but I'm specifically looking for a way to do this in C#, or at least confirmation that the only option is wrapping the C++ APIs myself.