Open Lucashien opened 3 months ago
Hi, I can run this code on an Intel Ultra 7 155U. I've already updated the OS to the 24H2 Dev channel and installed the Windows 11 SDK (10.0.26100.0) in Visual Studio.
void InitializeDirectML(ID3D12Device1** d3dDeviceOut, ID3D12CommandQueue** commandQueueOut, IDMLDevice** dmlDeviceOut) {
    // Whether to skip adapters which support Graphics in order to target NPU for testing
    //bool forceComputeOnlyDevice = true;
    ComPtr<IDXCoreAdapterFactory> factory;
    HMODULE dxCoreModule = LoadLibraryW(L"DXCore.dll");
    if (dxCoreModule)
    {
        auto dxcoreCreateAdapterFactory = reinterpret_cast<HRESULT(WINAPI*)(REFIID, void**)>(
            GetProcAddress(dxCoreModule, "DXCoreCreateAdapterFactory")
            );
        if (dxcoreCreateAdapterFactory)
        {
            dxcoreCreateAdapterFactory(IID_PPV_ARGS(&factory));
        }
    }
    // Create the DXCore Adapter
    ComPtr<IDXCoreAdapter> adapter;
    if (factory)
    {
        const GUID dxGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML };
        ComPtr<IDXCoreAdapterList> adapterList;
        THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(dxGUIDs), dxGUIDs, IID_PPV_ARGS(&adapterList)));
        for (uint32_t i = 0, adapterCount = adapterList->GetAdapterCount(); i < adapterCount; i++)
        {
            ComPtr<IDXCoreAdapter> nextGpuAdapter;
            THROW_IF_FAILED(adapterList->GetAdapter(static_cast<uint32_t>(i), IID_PPV_ARGS(&nextGpuAdapter)));
            // Take the first adapter that reports itself as an NPU
            if (nextGpuAdapter->IsAttributeSupported(DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU))
            {
                adapter = std::move(nextGpuAdapter);
                break;
            }
        }
    }
    // Create the D3D12 Device
    ComPtr<ID3D12Device1> d3dDevice;
    if (adapter)
    {
        HMODULE d3d12Module = LoadLibraryW(L"d3d12.dll");
        if (d3d12Module)
        {
            auto d3d12CreateDevice = reinterpret_cast<HRESULT(WINAPI*)(IUnknown*, D3D_FEATURE_LEVEL, REFIID, void*)>(
                GetProcAddress(d3d12Module, "D3D12CreateDevice")
                );
            if (d3d12CreateDevice)
            {
                THROW_IF_FAILED(d3d12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_GENERIC, IID_PPV_ARGS(&d3dDevice)));
            }
        }
    }
    // Create the DML Device and D3D12 Command Queue
    ComPtr<IDMLDevice> dmlDevice;
    ComPtr<ID3D12CommandQueue> commandQueue;
    if (d3dDevice)
    {
        D3D12_COMMAND_QUEUE_DESC queueDesc = {};
        queueDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
        THROW_IF_FAILED(d3dDevice->CreateCommandQueue(
            &queueDesc,
            IID_PPV_ARGS(commandQueue.ReleaseAndGetAddressOf())));
        HMODULE dmlModule = LoadLibraryW(L"DirectML.dll");
        if (dmlModule)
        {
            auto dmlCreateDevice = reinterpret_cast<HRESULT(WINAPI*)(ID3D12Device*, DML_CREATE_DEVICE_FLAGS, DML_FEATURE_LEVEL, REFIID, void*)>(
                GetProcAddress(dmlModule, "DMLCreateDevice1")
                );
            if (dmlCreateDevice)
            {
                THROW_IF_FAILED(dmlCreateDevice(d3dDevice.Get(), DML_CREATE_DEVICE_FLAG_NONE, DML_FEATURE_LEVEL_5_0, IID_PPV_ARGS(dmlDevice.ReleaseAndGetAddressOf())));
            }
        }
    }
    // Outputs stay null if any step above was skipped (missing DLL, export, or NPU adapter)
    d3dDevice.CopyTo(d3dDeviceOut);
    commandQueue.CopyTo(commandQueueOut);
    dmlDevice.CopyTo(dmlDeviceOut);
}
Thanks for sharing your experience. I will try updating my OS to the Dev channel. Thank you.
Updating to the Windows 11 SDK (10.0.26100.0) fixed the "DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE not found" error on my side.
@Lucashien I was able to make the NPU run the model with the following changes: In my case, the third adapter seems to be the NPU device (Intel AI Boost). Upgrading to Windows Insider was not necessary.
HW: ThinkPad X1 Carbon Gen 12, Intel(R) Core(TM) Ultra 7 155U
OS: Windows 11 23H2 (Build 22631.4037)
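Since the position of the NPU in the adapter list can differ between machines, matching on the adapter description may be less brittle than hard-coding "the third adapter". The following is only a portable sketch of that idea: `FindAdapterByDescription` is a hypothetical helper, and it assumes the description strings have already been collected (in real code they could be read from each adapter's `DXCoreAdapterProperty::DriverDescription` property).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first adapter whose
// description contains `needle`, or -1 if no description matches.
// `descriptions` is assumed to hold one string per enumerated adapter,
// in enumeration order.
int FindAdapterByDescription(const std::vector<std::string>& descriptions,
                             const std::string& needle)
{
    for (std::size_t i = 0; i < descriptions.size(); ++i)
    {
        if (descriptions[i].find(needle) != std::string::npos)
        {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Called with a needle such as "AI Boost", this returns the index to pass to `GetAdapter`, and -1 signals that the NPU was not enumerated at all. The match is case-sensitive and depends on the driver's naming, so treat it as a diagnostic aid rather than a robust selection rule.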
I'm on the older Intel NPU that is present in the Surface Laptop Studio 2. I believe it's a Movidius 3700VC. (Its PCI hardware id is ven_8086&dev_6240.)
Although I was able to force this example to use that device simply by adjusting the for loop so it starts at a higher offset, thus skipping past the various other devices the example would otherwise choose, I get a problem when I reach this line:
THROW_IF_FAILED(d3d12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_CORE, IID_PPV_ARGS(&d3dDevice)));
I've added code to enable the D3D debug layer, and with that in place, I see this:
Exception thrown at 0x00007FF91BE76D9A in DirectMLNpuInference.exe: Microsoft C++ exception: _com_error at memory location 0x000000824F0FC310.
Exception thrown at 0x00007FF91BE76D9A in DirectMLNpuInference.exe: Microsoft C++ exception: SHASTA::Exception<D3D12::KMB::AdapterTraits,long> at memory location 0x000000824F0FC470.
D3D12: Removing Device.
D3D12 WARNING: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR: There is strong evidence that the driver has performed an undefined operation; but it may be because the application performed an illegal or undefined operation to begin with.). [ EXECUTION WARNING #233: DEVICE_REMOVAL_PROCESS_POSSIBLY_AT_FAULT]
Initially I was on v31.0.100.2016 of the NPU driver, which is what Windows Update installs. I found that the Intel NPU driver page lists newer versions, but the latest (32.0.100.2820) doesn't actually support this device. But 32.0.100.2408 does support the device, and I've been able to install that. (And apparently there is a package on Windows Update that includes this version but I couldn't work out how to get Windows to offer me that.)
But I still get the same error.
So I think there are two issues here:
I think issue 1 is down to this line here:
else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE))
That won't select a compute-only device. It will select any device that offers compute. On my laptop, that means every device: Intel(R) Iris(R) Xe Graphics, NVIDIA GeForce RTX 4060 Laptop GPU, Intel(R) NPU, and even the Microsoft Basic Render Driver software device.
I think that should probably be this:
else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE)
&& !currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS))
So this will match only if the device supports compute and it does not support graphics. That's what I'd expect "compute only device" to mean, and this does indeed reject all devices except for the Intel NPU.
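To make the difference between the two conditions concrete, here is a small portable model of the predicate. It does not call DXCore: `AdapterCaps`, `IsComputeOnly` and `FindComputeOnlyAdapter` are hypothetical names, and the two booleans stand in for what `IsAttributeSupported` would report for `DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE` and `DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the attribute answers of one adapter.
struct AdapterCaps
{
    std::string name;
    bool coreCompute; // models DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE
    bool graphics;    // models DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS
};

// "Compute only" means: supports compute AND does not support graphics.
bool IsComputeOnly(const AdapterCaps& adapter)
{
    return adapter.coreCompute && !adapter.graphics;
}

// Returns the index of the first compute-only adapter, or -1 if none exists.
int FindComputeOnlyAdapter(const std::vector<AdapterCaps>& adapters)
{
    for (std::size_t i = 0; i < adapters.size(); ++i)
    {
        if (IsComputeOnly(adapters[i]))
        {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With the four devices listed above modelled as compute-capable, and only the Intel NPU modelled as lacking graphics support, the original compute-only check accepts the first entry in the list, while this predicate skips ahead to the NPU.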
But having fixed that, the code just doesn't seem to work. I know the Intel driver still reports DirectML support as "preview". Are there any examples anywhere that show successful DirectML use on the Intel NPU that's in the Surface Laptop Studio 2?
I’m encountering issues when attempting to run DirectML inference on an Intel NPU. Specifically, the sample code uses my GPU instead of targeting the NPU. Here’s the relevant code. When I set the GUID to DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU, the application fails to find the NPU device and prints "No NPU device found."
Here are the specifics of my hardware and software setup:
CPU: Intel(R) Core(TM) Ultra 9 185H
GPU: RTX 4060 Laptop
NPU: Intel(R) AI Boost
Driver Version: 32.0.100.2688
DirectX Version: 12
Nuget information:
Has anyone successfully run DirectML inference on an Intel NPU? If so, what steps were taken to properly configure the adapter and ensure the NPU was used?
Thank you for your assistance!