microsoft / DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
MIT License

DirectMLNpuInference fails to run on the Intel NPU #625

Open Lucashien opened 3 months ago

Lucashien commented 3 months ago

I’m encountering issues when attempting to run DirectML inference on an Intel NPU. Specifically, the sample code uses my GPU instead of the NPU. The relevant code is below. When I set the GUID to DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU, the application fails to find the NPU device and prints "No NPU device found."

ComPtr<IDXCoreAdapter> adapter;
if (factory)
{
    const GUID dxGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE };
    ComPtr<IDXCoreAdapterList> adapterList;
    THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(dxGUIDs), dxGUIDs, IID_PPV_ARGS(&adapterList)));
    for (uint32_t i = 0, adapterCount = adapterList->GetAdapterCount(); i < adapterCount; i++)
    {
        ComPtr<IDXCoreAdapter> currentGpuAdapter;
        THROW_IF_FAILED(adapterList->GetAdapter(static_cast<uint32_t>(i), IID_PPV_ARGS(&currentGpuAdapter)));

        if (!forceComputeOnlyDevice && !forceGenericMLDevice)
        {
            // No device restrictions
            adapter = std::move(currentGpuAdapter);
            break;
        }
        else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE))
        {
            adapter = std::move(currentGpuAdapter);
            break;
        }
        else if (forceGenericMLDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML))
        {
            adapter = std::move(currentGpuAdapter);
            break;
        }
    }
}
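The reason the unmodified loop lands on the GPU can be seen by modeling it without DXCore. In the sketch below, `MockAdapter`, its flags, and the adapter order are hypothetical stand-ins for what `IDXCoreAdapterList` might return (the real enumeration order depends on the drivers). With both `forceComputeOnlyDevice` and `forceGenericMLDevice` false, the first branch takes adapter 0 and breaks, so the NPU is never reached. Because the list above is created with the `DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE` filter, every entry also passes the `forceComputeOnlyDevice` branch, which likewise returns adapter 0.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an IDXCoreAdapter: only the attributes the loop checks.
struct MockAdapter {
    std::string name;
    bool coreCompute;  // models DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE
    bool genericMl;    // models DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML
};

// Mirrors the branch structure of the selection loop above and returns the
// index of the chosen adapter, or std::nullopt if nothing matches.
std::optional<std::size_t> SelectAdapter(const std::vector<MockAdapter>& adapters,
                                         bool forceComputeOnlyDevice,
                                         bool forceGenericMLDevice) {
    for (std::size_t i = 0; i < adapters.size(); ++i) {
        const MockAdapter& a = adapters[i];
        if (!forceComputeOnlyDevice && !forceGenericMLDevice) {
            return i;  // no restrictions: the first adapter always wins
        } else if (forceComputeOnlyDevice && a.coreCompute) {
            return i;  // any compute-capable adapter matches
        } else if (forceGenericMLDevice && a.genericMl) {
            return i;
        }
    }
    return std::nullopt;
}
```

With a hypothetical list ordered as discrete GPU, integrated GPU, NPU (all with `coreCompute = true`), only the `forceGenericMLDevice` path reaches the NPU, and only if the GPUs do not report the generic ML attribute.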

Here are the specifics of my hardware and software setup:

- CPU: Intel(R) Core(TM) Ultra 9 185H
- GPU: RTX 4060 Laptop
- NPU: Intel(R) AI Boost
- Driver Version: 32.0.100.2688
- DirectX Version: 12

NuGet package information was attached as a screenshot.

Has anyone successfully run DirectML inference on an Intel NPU? If so, what steps were taken to properly configure the adapter and ensure the NPU was used?

Thank you for your assistance!

WTian-Yu commented 3 months ago

Hi, I can run this code on an Intel Core Ultra 7 155U. I've already updated the OS to 24H2 (Dev channel) and installed the Windows 11 SDK (10.0.26100.0) in Visual Studio.

void InitializeDirectML(ID3D12Device1** d3dDeviceOut, ID3D12CommandQueue** commandQueueOut, IDMLDevice** dmlDeviceOut) {
    // Whether to skip adapters which support Graphics in order to target NPU for testing
    //bool forceComputeOnlyDevice = true;
    ComPtr<IDXCoreAdapterFactory> factory;
    HMODULE dxCoreModule = LoadLibraryW(L"DXCore.dll");
    if (dxCoreModule)
    {
        auto dxcoreCreateAdapterFactory = reinterpret_cast<HRESULT(WINAPI*)(REFIID, void**)>(
            GetProcAddress(dxCoreModule, "DXCoreCreateAdapterFactory")
            );
        if (dxcoreCreateAdapterFactory)
        {
            dxcoreCreateAdapterFactory(IID_PPV_ARGS(&factory));
        }
    }
    // Create the DXCore Adapter
    ComPtr<IDXCoreAdapter> adapter;
    if (factory)
    {
        const GUID dxGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML };
        ComPtr<IDXCoreAdapterList> adapterList;
        THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(dxGUIDs), dxGUIDs, IID_PPV_ARGS(&adapterList)));
        for (uint32_t i = 0, adapterCount = adapterList->GetAdapterCount(); i < adapterCount; i++)
        {
            ComPtr<IDXCoreAdapter> nextGpuAdapter;
            THROW_IF_FAILED(adapterList->GetAdapter(static_cast<uint32_t>(i), IID_PPV_ARGS(&nextGpuAdapter)));
            if (nextGpuAdapter->IsAttributeSupported(DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU))
            {
                adapter = std::move(nextGpuAdapter);
                break;
            }
        }
    }
    // Create the D3D12 Device
    ComPtr<ID3D12Device1> d3dDevice;
    if (adapter)
    {
        HMODULE d3d12Module = LoadLibraryW(L"d3d12.dll");
        if (d3d12Module)
        {
            auto d3d12CreateDevice = reinterpret_cast<HRESULT(WINAPI*)(IUnknown*, D3D_FEATURE_LEVEL, REFIID, void**)>(
                GetProcAddress(d3d12Module, "D3D12CreateDevice")
                );
            if (d3d12CreateDevice)
            {
                THROW_IF_FAILED(d3d12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_GENERIC, IID_PPV_ARGS(&d3dDevice)));
            }
        }
    }
    // Create the DML Device and D3D12 Command Queue
    ComPtr<IDMLDevice> dmlDevice;
    ComPtr<ID3D12CommandQueue> commandQueue;
    if (d3dDevice)
    {
        D3D12_COMMAND_QUEUE_DESC queueDesc = {};
        queueDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
        THROW_IF_FAILED(d3dDevice->CreateCommandQueue(
            &queueDesc,
            IID_PPV_ARGS(commandQueue.ReleaseAndGetAddressOf())));
        HMODULE dmlModule = LoadLibraryW(L"DirectML.dll");
        if (dmlModule)
        {
            auto dmlCreateDevice = reinterpret_cast<HRESULT(WINAPI*)(ID3D12Device*, DML_CREATE_DEVICE_FLAGS, DML_FEATURE_LEVEL, REFIID, void**)>(
                GetProcAddress(dmlModule, "DMLCreateDevice1")
                );
            if (dmlCreateDevice)
            {
                THROW_IF_FAILED(dmlCreateDevice(d3dDevice.Get(), DML_CREATE_DEVICE_FLAG_NONE, DML_FEATURE_LEVEL_5_0, IID_PPV_ARGS(dmlDevice.ReleaseAndGetAddressOf())));
            }
        }
    }

    d3dDevice.CopyTo(d3dDeviceOut);
    commandQueue.CopyTo(commandQueueOut);
    dmlDevice.CopyTo(dmlDeviceOut);
}
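The adapter selection in this version works in two stages: `CreateAdapterList` is asked only for adapters with `DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML`, and the loop then keeps the first one that also reports `DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU`. The portable sketch below restates that logic; `MockAdapter` and its boolean fields are hypothetical stand-ins for DXCore attribute queries, used so the filtering can be read and tested without Windows headers.

```cpp
#include <algorithm>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for IDXCoreAdapter attribute queries.
struct MockAdapter {
    std::string name;
    bool genericMl;  // models DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML
    bool isNpu;      // models DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU
};

std::optional<MockAdapter> SelectNpu(const std::vector<MockAdapter>& all) {
    // Stage 1: keep only GENERIC_ML adapters (the GUID filter passed to CreateAdapterList).
    std::vector<MockAdapter> genericMlList;
    std::copy_if(all.begin(), all.end(), std::back_inserter(genericMlList),
                 [](const MockAdapter& a) { return a.genericMl; });

    // Stage 2: the first remaining adapter that reports the NPU hardware type wins.
    for (const MockAdapter& a : genericMlList) {
        if (a.isNpu) {
            return a;
        }
    }
    return std::nullopt;  // in the real code, `adapter` stays empty
}
```

When nothing matches, the real function skips every later `if`, and the `CopyTo` calls write null pointers to all three out-parameters, so callers should check for that before using the device.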
Lucashien commented 3 months ago

Thanks for sharing your experience. I will try updating my OS to the Dev channel. Thank you!

xiaoweiChen commented 2 months ago

Updating to the Windows 11 SDK (10.0.26100.0) fixed the "DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE not found" error on my side.

kmaki565 commented 2 months ago

@Lucashien I was able to get the NPU to run the model with the following changes (shared as a screenshot). In my case, the third adapter appears to be the NPU device (Intel AI Boost). Upgrading to Windows Insider was not necessary.

- HW: ThinkPad X1 Carbon Gen 12, Intel(R) Core(TM) Ultra 7 155U
- OS: Windows 11 23H2 (Build 22631.4037)

idg10 commented 1 month ago

I'm on the older Intel NPU that is present in the Surface Laptop Studio 2. I believe it's a Movidius 3700VC. (Its PCI hardware id is ven_8086&dev_6240.)

I was able to force this example to use that device simply by making the for loop start at a higher index, skipping past the other devices the example would otherwise choose. However, I hit a problem when execution reaches this line:

THROW_IF_FAILED(d3d12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_CORE, IID_PPV_ARGS(&d3dDevice)));

I've added code to enable the D3D debug layer, and with that in place, I see this:

Exception thrown at 0x00007FF91BE76D9A in DirectMLNpuInference.exe: Microsoft C++ exception: _com_error at memory location 0x000000824F0FC310.
Exception thrown at 0x00007FF91BE76D9A in DirectMLNpuInference.exe: Microsoft C++ exception: SHASTA::Exception<D3D12::KMB::AdapterTraits,long> at memory location 0x000000824F0FC470.
D3D12: Removing Device.
D3D12 WARNING: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR: There is strong evidence that the driver has performed an undefined operation; but it may be because the application performed an illegal or undefined operation to begin with.). [ EXECUTION WARNING #233: DEVICE_REMOVAL_PROCESS_POSSIBLY_AT_FAULT]

Initially I was on v31.0.100.2016 of the NPU driver, which is what Windows Update installs. The Intel NPU driver page lists newer versions, but the latest (32.0.100.2820) doesn't actually support this device. However, 32.0.100.2408 does support it, and I've been able to install that. (Apparently there is also a package on Windows Update that includes this version, but I couldn't work out how to get Windows to offer it to me.)

But I still get the same error.

So I think there are two issues here:

  1. the logic in the device selection loop isn't quite right
  2. this example just doesn't work for the Intel NPU that's in a Surface Laptop Studio 2

I think issue 1 comes down to this line:

else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE))

That won't select a compute-only device; it will select any device that offers compute. On my laptop, every device supports that attribute (Intel(R) Iris(R) Xe Graphics, NVIDIA GeForce RTX 4060 Laptop GPU, Intel(R) NPU, and even the Microsoft Basic Render Driver software device).

I think that should probably be this:

else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE)
    && !currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS))

So this will match only if the device supports compute and it does not support graphics. That's what I'd expect "compute only device" to mean, and this does indeed reject all devices except for the Intel NPU.
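The difference between the two conditions can be checked without any hardware by encoding the four adapters listed above as plain flags. The `MockAdapter` type and the graphics flags are assumptions for illustration: the comment only states that all four offer compute and that the revised check leaves just the NPU. Real code would call `IDXCoreAdapter::IsAttributeSupported` instead.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the two DXCore attribute queries used in the condition.
struct MockAdapter {
    std::string name;
    bool coreCompute;  // models DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE
    bool graphics;     // models DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS
};

// Original check: true for any adapter that offers compute.
bool OriginalComputeOnlyCheck(const MockAdapter& a) {
    return a.coreCompute;
}

// Proposed check: compute support AND no graphics support.
bool ProposedComputeOnlyCheck(const MockAdapter& a) {
    return a.coreCompute && !a.graphics;
}

// Names of the adapters a check accepts, in enumeration order.
template <typename Pred>
std::vector<std::string> Matching(const std::vector<MockAdapter>& adapters, Pred pred) {
    std::vector<std::string> names;
    for (const MockAdapter& a : adapters) {
        if (pred(a)) {
            names.push_back(a.name);
        }
    }
    return names;
}
```

With the four adapters flagged as compute-capable and only the NPU lacking graphics, the original check accepts all four while the proposed one accepts only "Intel(R) NPU", matching the behavior described above.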

But having fixed that, the code just doesn't seem to work. I know the Intel driver still reports DirectML support as "preview". Are there any examples anywhere that show successful DirectML use on the Intel NPU that's in the Surface Laptop Studio 2?