microsoft / DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.

NPU dmlDevice not loading #589

Open alex2060 opened 4 months ago

alex2060 commented 4 months ago

dmlDevice is not referenced after creation.

From https://github.com/microsoft/DirectML/blob/master/Samples/DirectMLNpuInference/main.cpp

// Create the DML Device and D3D12 Command Queue
ComPtr<IDMLDevice> dmlDevice;
ComPtr<ID3D12CommandQueue> commandQueue;
if (d3dDevice)
{
    D3D12_COMMAND_QUEUE_DESC queueDesc = {};
    queueDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    THROW_IF_FAILED(d3dDevice->CreateCommandQueue(
        &queueDesc,
        IID_PPV_ARGS(commandQueue.ReleaseAndGetAddressOf())));

    // Load DirectML.dll dynamically and resolve DMLCreateDevice1.
    HMODULE dmlModule = LoadLibraryW(L"DirectML.dll");
    if (dmlModule)
    {
        auto dmlCreateDevice = reinterpret_cast<HRESULT(WINAPI*)(ID3D12Device*, DML_CREATE_DEVICE_FLAGS, DML_FEATURE_LEVEL, REFIID, void**)>(
            GetProcAddress(dmlModule, "DMLCreateDevice1"));
        if (dmlCreateDevice)
        {
            THROW_IF_FAILED(dmlCreateDevice(
                d3dDevice.Get(),
                DML_CREATE_DEVICE_FLAG_NONE,
                DML_FEATURE_LEVEL_5_0,
                IID_PPV_ARGS(dmlDevice.ReleaseAndGetAddressOf())));
        }
    }
}

d3dDevice.CopyTo(d3dDeviceOut);
commandQueue.CopyTo(commandQueueOut);
dmlDevice.CopyTo(dmlDeviceOut);
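
For what it's worth, once DMLCreateDevice1 succeeds, the returned IDMLDevice can be exercised immediately, for example to confirm which feature level was actually reached. A minimal sketch using the documented IDMLDevice::CheckFeatureSupport API (querying DML_FEATURE_LEVEL_6_0 assumes a recent DirectML.dll):

// Sketch: query the highest DML feature level the new device supports.
DML_FEATURE_LEVEL requested[] = { DML_FEATURE_LEVEL_5_0, DML_FEATURE_LEVEL_6_0 };
DML_FEATURE_QUERY_FEATURE_LEVELS query = {};
query.RequestedFeatureLevelCount = ARRAYSIZE(requested);
query.RequestedFeatureLevels = requested;

DML_FEATURE_DATA_FEATURE_LEVELS support = {};
THROW_IF_FAILED(dmlDevice->CheckFeatureSupport(
    DML_FEATURE_FEATURE_LEVELS,
    sizeof(query), &query,
    sizeof(support), &support));
// support.MaxSupportedFeatureLevel now holds the highest level the adapter reports.
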
Lucashien commented 2 months ago

I’m encountering issues when attempting to run DirectML inference on an Intel NPU. Specifically, the sample code I’m using defaults to my GPU instead of targeting the NPU. Here’s the relevant code snippet where I attempt to select the appropriate adapter:

ComPtr<IDXCoreAdapter> adapter;
if (factory)
{
    const GUID dxGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE };
    ComPtr<IDXCoreAdapterList> adapterList;
    THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(dxGUIDs), dxGUIDs, IID_PPV_ARGS(&adapterList)));
    for (uint32_t i = 0, adapterCount = adapterList->GetAdapterCount(); i < adapterCount; i++)
    {
        ComPtr<IDXCoreAdapter> currentGpuAdapter;
        THROW_IF_FAILED(adapterList->GetAdapter(static_cast<uint32_t>(i), IID_PPV_ARGS(&currentGpuAdapter)));

        if (!forceComputeOnlyDevice && !forceGenericMLDevice)
        {
            // No device restrictions
            adapter = std::move(currentGpuAdapter);
            break;
        }
        else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE))
        {
            adapter = std::move(currentGpuAdapter);
            break;
        }
        else if (forceGenericMLDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML))
        {
            adapter = std::move(currentGpuAdapter);
            break;
        }
    }
}

When I set the GUID to DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU, the application fails to find and list the NPU device, printing "No NPU device found."
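
A variant I'm experimenting with (a sketch only, assuming the DXCore headers from a recent Windows SDK) sidesteps the hardware-type attribute entirely: keep only adapters that expose D3D12_CORE_COMPUTE but not D3D12_GRAPHICS, which should exclude the RTX 4060 and leave compute-only devices such as the NPU. My guess is that the attribute lookup fails because the installed DXCore predates DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU, but I haven't been able to confirm that.

ComPtr<IDXCoreAdapter> npuAdapter;
const GUID computeGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE };
ComPtr<IDXCoreAdapterList> computeList;
THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(computeGUIDs), computeGUIDs, IID_PPV_ARGS(&computeList)));
for (uint32_t i = 0; i < computeList->GetAdapterCount(); i++)
{
    ComPtr<IDXCoreAdapter> candidate;
    THROW_IF_FAILED(computeList->GetAdapter(i, IID_PPV_ARGS(&candidate)));

    // Every adapter in this list supports core compute; one that does NOT
    // also support graphics is a compute-only device, which on this machine
    // should be the NPU rather than the GPU.
    if (!candidate->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS))
    {
        npuAdapter = std::move(candidate);
        break;
    }
}
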

Here are the specifics of my hardware and software setup:

CPU: Intel(R) Core(TM) Ultra 9 185H
GPU: RTX 4060 Laptop
NPU: Intel(R) AI Boost
Driver Version: 32.0.100.2688
DirectX Version: 12

NuGet information: (screenshot of installed packages)

Has anyone successfully run DirectML inference on an Intel NPU? If so, what steps were taken to properly configure the adapter and ensure the NPU was used?

Thank you for your assistance!