jp7677 / dxvk-nvapi

Alternative NVAPI implementation on top of DXVK.
MIT License
353 stars 32 forks source link

d3d12: Improve NvAPI_D3D12_EnumerateMetaCommands #183

Closed jp7677 closed 3 weeks ago

jp7677 commented 3 weeks ago

Return politely that we have no meta commands. Need to find a few titles though to verify that no one calls create-metacommand after this.

cc: @SveSop

jp7677 commented 3 weeks ago

Mmh, the msvc segfault looks scary and might hint at something else that is wrong.

SveSop commented 3 weeks ago

This is not supposed to return NVAPI_END_ENUMERATION me thinks.. Testing this on windows it returns NVAPI_OK when called with *pNumMetaCommands = 0 && *pDescs = nullptr, it crashes if *pNumMetaCommands >= 1 && pDescs = nullptr, and return the below if *pNumMetaCommands >= 3 && (pDescs) (no matter if i only allocate pDescs = new NVAPI_META_COMMAND_DESC[1];, but that may just be lucky in case no memory is overwritten i suppose).

        switch (*pNumMetaCommands) {
            case 3:
                pDescs[2].Id = { 0x8F9FF059, 0xFE72, 0x488E, {0xA0, 0x66, 0xB1, 0x4E, 0x79, 0x48, 0xEC, 0x08 }};
                pDescs[2].Name = L"GEMM (General matrix multiply)";
                pDescs[2].InitializationDirtyState = NV_D3D_GRAPHICS_STATE_NONE;
                pDescs[2].ExecutionDirtyState = NV_D3D_GRAPHICS_STATE_NONE;
                [[fallthrough]];
            case 2:
                pDescs[1].Id = { 0xE1B112EB, 0xDECD, 0x4FF6, {0x85, 0xBB, 0x1F, 0x0E, 0x3A, 0xB0, 0x04, 0x14 }};
                pDescs[1].Name = L"Convolution Fused with pooling/upsampling";
                pDescs[1].InitializationDirtyState = NV_D3D_GRAPHICS_STATE_NONE;
                pDescs[1].ExecutionDirtyState = NV_D3D_GRAPHICS_STATE_NONE;
                [[fallthrough]];
            case 1:
                pDescs[0].Id = { 0xA7666F1E, 0x9C55, 0x47EE, {0x9E, 0xB3, 0xE1, 0x62, 0x00, 0x92, 0xD1, 0xE9 }};
                pDescs[0].Name = L"Convolution Extended";
                pDescs[0].InitializationDirtyState = NV_D3D_GRAPHICS_STATE_NONE;
                pDescs[0].ExecutionDirtyState = NV_D3D_GRAPHICS_STATE_NONE;
                break;
            default:
                break;
        }

I think maybe the "right" think would possibly be:

if (pDevice == nullptr || pNumMetaCommands == nullptr)
            return InvalidArgument(n);
*pNumMetaCommands = 0;
return Ok(n);

Since pDescs

Otherwise should have enough space to hold *pNumMetaCommands descriptors

It should be allocated by the caller, and all fields would be blank no matter how much is allocated, unless the driver (we) fills it in, so if some game is dead set on using something from here without bothering with checking how many *pNumMetaCommands is supported, i am not sure how to fix that. The demo i mentioned earlier was pre-alpha, so it was probably not done right there, but i will look out for some other to test with.

If i ran the call with *pNumMetaCommands = 0 and pDescs = new NVAPI_META_COMMAND_DESC[3]; The driver returned nothing readable in the NVAPI_META_COMMAND_DESC fields, but if i ran it with:

*pNumMetaCommands = 4;
pDescs = new NVAPI_META_COMMAND_DESC[4];

It returned NVAPI_OK and set *pNumMetaCommands = 3, and i crashed if trying to read from pDescs[4], so i dunno if the driver re-allocates memory or something like that?

jp7677 commented 3 weeks ago

Mmh, the msvc segfault looks scary and might hint at something else that is wrong.

Looks like a fluke with the runners, https://github.com/actions/runner-images/issues mentions some msvc related issues with the latest runner version (20240603.1.0). Our last successful run was on version 20240514.3.0.

SveSop commented 3 weeks ago

Yeah, this looks more like the implementation I'd expect, the interface seems similar to vkEnumeratePhysicalDevices or ID3D12Device5::EnumerateMetaCommands.

I wonder if we shouldn't return here stuff like DirectStorage or DirectSR. But if native Nvidia Windows drivers don't do that then maybe we don't have to bother either.

Funny.. but when running this call with 32-bit, it actually returns this:

D3D12 Tests:
Create DXGI Factory interface success!
Created D3D12 Device (02B80350)
------------------------------------------------------
Number of supported MetaCommands: 1
Getting 1 MetaCommand(s)
MetaCommand Info: 0
   GUID                                 : {1BDDD090-C47E-459C-8F81-42C9F97A5308}
   MetaCommand name                     : Direct Storage
   InitializationDirtyState settings    : 4128
   ExecutionDirtyState settings         : 4128

------------------------------------------------------

Not entirely sure why it would not be listed in 64-bit..

Just for reference, this is the output in 64-bit on my test.

D3D12 Tests:
Create DXGI Factory interface success!
Created D3D12 Device (000002858CAD7A70)
------------------------------------------------------
Number of supported MetaCommands: 3
Getting 3 MetaCommand(s)
MetaCommand Info: 0
   GUID                                 : {A7666F1E-9C55-47EE-9EB3-E1620092D1E9}
   MetaCommand name                     : Convolution Extended
   InitializationDirtyState settings    : 0
   ExecutionDirtyState settings         : 0
MetaCommand Info: 1
   GUID                                 : {E1B112EB-DECD-4FF6-85BB-1F0E3AB00414}
   MetaCommand name                     : Convolution Fused with pooling/upsampling
   InitializationDirtyState settings    : 0
   ExecutionDirtyState settings         : 0
MetaCommand Info: 2
   GUID                                 : {8F9FF059-FE72-488E-A066-B14E7948EC08}
   MetaCommand name                     : GEMM (General matrix multiply)
   InitializationDirtyState settings    : 0
   ExecutionDirtyState settings         : 0

------------------------------------------------------

Not sure if any driver settings or similar could have impact on this?

jp7677 commented 3 weeks ago

Mmh, the msvc segfault looks scary and might hint at something else that is wrong.

Looks like a fluke with the runners, https://github.com/actions/runner-images/issues mentions some msvc related issues with the latest runner version (20240603.1.0). Our last successful run was on version 20240514.3.0.

We got a new runner version (20240610.1.0) and all is good again.