Daz Studio lacks some NvAPI methods for iRay rendering

PetitMote commented 2 years ago

Hello there,

It’s been a while, but some update (from Daz Studio) broke Iray rendering on NVIDIA GPUs through Wine. I’m only a recent user, so I don’t know how it used to work or whether dxvk_nvapi was needed, but I think it used to work before the first releases of dxvk, when wine-staging was needed. Rendering works fine on the CPU, but it’s much slower.

I’ve checked the logs and tried with and without dxvk_nvapi. Here are the logs from dxvk_nvapi:

---------- 2022-01-04 18:46:40 ----------
NvAPI_QueryInterface 0xad298d3f: Unknown function ID
DXVK-NVAPI v0.5-20-ge23d450 (DAZStudio.exe)
NVML loaded and initialized successfully
NvAPI Device: NVIDIA GeForce RTX 3060 (495.46.0)
NvAPI Output: \\.\DISPLAY1
NvAPI_Initialize: OK
NvAPI_QueryInterface 0x33c7358c: Unknown function ID
NvAPI_QueryInterface 0x593e8644: Unknown function ID
NvAPI_GetInterfaceVersionString: OK
NvAPI_EnumLogicalGPUs: OK
NvAPI_EnumPhysicalGPUs: OK
NvAPI_QueryInterface 0x1efc3957: Unknown function ID
NvAPI_EnumNvidiaDisplayHandle 0: OK
NvAPI_GetPhysicalGPUsFromDisplay: OK
NvAPI_QueryInterface NvAPI_GetAssociatedNvidiaDisplayName: Not implemented method
NvAPI_GetErrorMessage -3 (NVAPI_NO_IMPLEMENTATION): OK
NvAPI_EnumNvidiaDisplayHandle 1: End enumeration
NvAPI_EnumNvidiaUnAttachedDisplayHandle 0: End enumeration
NvAPI_QueryInterface NvAPI_GPU_GetBusType: Not implemented method
NvAPI_GetErrorMessage -3 (NVAPI_NO_IMPLEMENTATION): OK
NvAPI_GPU_GetFullName: OK
NvAPI_GPU_GetVbiosVersionString: OK
NvAPI_Initialize: OK
NvAPI_SYS_GetDriverAndBranchVersion: OK
NvAPI_EnumPhysicalGPUs: OK
NvAPI_EnumLogicalGPUs: OK
NvAPI_QueryInterface NvAPI_EnumTCCPhysicalGPUs: Not implemented method
NvAPI_GetErrorMessage -3 (NVAPI_NO_IMPLEMENTATION): OK
NvAPI_GetPhysicalGPUsFromLogicalGPU: OK

I’ve also gone through the Daz Studio logs, and there’s nothing more in them, but I can still provide the log file if needed. Also, when dxvk_nvapi is disabled, I get an error on NvAPI_Initialize instead (three times), and not on these three methods.

I believe the issue is that these methods are not implemented in dxvk_nvapi. I would sincerely love to help and try to add them, but I’m merely a Python beginner and C++ looks like sorcery to me (and, well, we’re talking about some low-level coding). But I’d be happy to help or try if someone can point me in the right direction.

Also, Daz Studio's physics simulation, dForce, used to work, but now it only says "no OpenCL 1.2 compatible device found". I have absolutely no idea whether it’s related, and there is no information about it in the logs, but maybe if we fix the NvAPI issue we’ll learn something.

Thank you guys for your work, it’s still pretty amazing.

jp7677 commented 2 years ago

Ok, sorry, I really shouldn’t comment on C code :). I think you did it correctly the first time, and now I’ve convinced you to print the pointer address instead.

So either the LUIDs are different or it still needs to be printed differently. What LUID do you see in vulkaninfo?

PetitMote commented 2 years ago

I don’t think I have vulkaninfo, and searching mostly tells me how to uninstall it. Where should I get it?

SveSop commented 2 years ago

After doing pretty much what @PetitMote did (I grabbed all the exports from the Windows 497.29 nvcuda.dll and ended up with 120-something more stubs, most of what you posted before plus a few), none of them were used. That leaves the most glaring possibility, the one you are talking about now: something goes wrong when picking up the LUID/UUID/device ID or whatever, making it try to initialize a non-existent GPU.

Now, for all we know, actually hitting the RIGHT LUID/whatever may end up triggering even more CUDA stubs, so I guess I'll just keep my list for now in case we get further :smile:

Just for info, though:

NvAPI_QueryInterface 0x33c7358c: Unknown function ID
NvAPI_QueryInterface 0x593e8644: Unknown function ID

Those two functions get called by almost all known users of NvAPI; they are the debug/control functions NvAPI_Diag_ReportCallStart and NvAPI_Diag_ReportCallReturn, respectively. I have no idea how they are used, but they are not part of the public API, so they may be some sort of internal, NDA-covered debug functions. They are not what breaks this.

This one, however, is NvAPI_Coproc_GetCoprocStatus: NvAPI_QueryInterface 0x1efc3957: Unknown function ID

Not really sure, but could it be something? It seems to be some sort of OpenCL/CUDA "co-processor GPU" usage thing. I'm not sure why it would make DAZ fail, but I can see it maybe being used for multi-GPU rendering or something like that.
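As a side note on where these log lines come from: an application asks `NvAPI_QueryInterface` for each function by a 32-bit ID, and an ID the implementation does not recognize presumably just yields no function pointer, which dxvk-nvapi logs as "Unknown function ID". A minimal sketch of an ID-to-name lookup for annotating such logs; the table holds only the three IDs identified above (0xad298d3f from the first log line is still unnamed), and the struct and function names are hypothetical, not dxvk-nvapi's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical lookup table: only the IDs identified in this thread. */
struct nvapi_id_name {
    uint32_t id;
    const char *name;
};

static const struct nvapi_id_name known_ids[] = {
    { 0x33c7358cu, "NvAPI_Diag_ReportCallStart" },
    { 0x593e8644u, "NvAPI_Diag_ReportCallReturn" },
    { 0x1efc3957u, "NvAPI_Coproc_GetCoprocStatus" },
};

/* Returns the name for a QueryInterface ID, or NULL when the ID is not in
 * the table (the analogue of "Unknown function ID" in the log). */
static const char *nvapi_id_to_name(uint32_t id)
{
    for (size_t i = 0; i < sizeof(known_ids) / sizeof(known_ids[0]); i++) {
        if (known_ids[i].id == id)
            return known_ids[i].name;
    }
    return NULL;
}

/* Prints a log line annotated with the function name when it is known. */
static void annotate_query(uint32_t id)
{
    const char *name = nvapi_id_to_name(id);
    printf("NvAPI_QueryInterface 0x%08x: %s\n", (unsigned)id,
           name ? name : "Unknown function ID");
}
```

A linear scan is plenty for a handful of entries; new IDs can be appended to the table as they are identified.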

SveSop commented 2 years ago

I don’t think I have vulkaninfo, and I mostly find how to uninstall it. Should I get somewhere?

AFAIK it's part of vulkan-tools, from the Vulkan SDK package.

jp7677 commented 2 years ago

I didn’t want to find out which was the right one :smile:

04c0:trace:nvcuda:wine_cuDeviceGetLuid LUID* s ((null))
04c0:trace:nvcuda:wine_cuDeviceGetLuid LUID s ()
04c0:trace:nvcuda:wine_cuDeviceGetLuid LUID* d (0)
04c0:trace:nvcuda:wine_cuDeviceGetLuid LUID d (349801784)

I just asked somewhere where people know C: it should indeed be TRACE("%s", luid), or even better TRACE("%s\n", debugstr_a(luid)). So I guess the returned LUID is empty (``), which might explain the crash.

jp7677 commented 2 years ago

I don’t think I have vulkaninfo, and I mostly find how to uninstall it. Should I get somewhere?

Afaik its part of vulkan-tools from vulkan-sdk package.

On Fedora it is dnf install vulkan-tools

PetitMote commented 2 years ago

I got it, but it’s weird:

VkPhysicalDeviceIDProperties
  deviceUUID = 24f0ee93-f829-6766-b3d1-40d988bd8186
  driverUUID = de55dcf3-ac5c-522a-8393-764e32056b9f
  deviceNodeMask = 1
  deviceLUIDValid = false

jp7677 commented 2 years ago

No, not weird. I just checked my machine, and native vulkaninfo shows the same here. This might also explain why the LUID is empty, since the wine-cuda call is just forwarded to CUDA in the Linux driver. I guess Wine or wine-vulkan (which is used by DXVK-NVAPI) ensures that there is a valid LUID.

Actually, you could try to set the LUID yourself in the wine-cuda code. Though I'm not exactly sure what it should be; maybe just:

// from https://github.com/jp7677/dxvk-nvapi/blob/master/tests/nvapi_sysinfo.cpp#L448
//  LUID high part:    0x00000000
//  LUID low part:    0x000003f0
luid = "f3000000";

Edit: I'm validating this, give me a minute.

PetitMote commented 2 years ago

I was thinking the same. I put this:

CUresult WINAPI wine_cuDeviceGetLuid(char *luid, unsigned int *deviceNodeMask, CUdevice dev)
{
    TRACE("(%p, %p, %d)\n", luid, deviceNodeMask, dev);
    CUresult error = pcuDeviceGetLuid(luid, deviceNodeMask, dev);
    TRACE("LUID* s (%s)\n", *luid);
    TRACE("LUID s (%s)\n", luid);
    TRACE("LUID* d (%d)\n", *luid);
    TRACE("LUID d (%d)\n", luid);
    luid = "f3000000"; /* only rebinds the local pointer; the caller's buffer is not written */
    return CUDA_SUCCESS; /* ignores the error returned by the native driver */
}

See you in a minute, I’ll compile and try just to see :smile:

PetitMote commented 2 years ago

Hey, I really think it worked! Well, the software crashed, but we still got further in the logs!

04b0:trace:nvcuda:wine_cuDeviceGetLuid (0x14d98d38, 0x14d98d40, 0)
04b0:trace:nvcuda:wine_cuDeviceGetLuid LUID* s ((null))
04b0:trace:nvcuda:wine_cuDeviceGetLuid LUID s ()
04b0:trace:nvcuda:wine_cuDeviceGetLuid LUID* d (0)
04b0:trace:nvcuda:wine_cuDeviceGetLuid LUID d (349801784)
# Usually where it stops #
# No mention of "incompatible GPU"! #
04b0:trace:nvcuda:Unknown2_func1_relay (0x14d903d0, 0x101eec0)
04b0:trace:nvcuda:Unknown2_func5_relay (0x14d903d8, 0x101eec8)
04b0:trace:nvcuda:wine_cuGetExportTable (0x14d903c8, 0x14d653b0)
04b0:trace:nvcuda:wine_cuGetExportTable (0x101ee98, 0x14d67640)
04b0:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98db0, 17, 0)
04b0:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98dbc, 20, 0)
04b0:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98d84, 13, 0)
04b0:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98e88, 36, 0)
04b0:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98ec4, 87, 0)
0530:trace:nvcuda:DllMain (0x7ff4bc1d0000, 2, (nil))
0530:trace:nvcuda:cuda_process_tls_callbacks (2)
# A bunch more of DllMain #
0530:trace:nvcuda:wine_cuInit (0)
0530:trace:nvcuda:wine_cuInit (0)
NvAPI_Initialize: OK
NvAPI_SYS_GetDriverAndBranchVersion: OK
0530:trace:nvcuda:wine_cuInit (0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98db0, 17, 0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98dbc, 20, 0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98d84, 13, 0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98e88, 36, 0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98ec4, 87, 0)
NvAPI_EnumPhysicalGPUs: OK
NvAPI_EnumLogicalGPUs: OK
NvAPI_EnumTCCPhysicalGPUs: OK
NvAPI_GPU_GetBusId: OK
NvAPI_EnumNvidiaDisplayHandle 0: OK
NvAPI_GetPhysicalGPUsFromDisplay: OK
NvAPI_EnumNvidiaDisplayHandle 1: End enumeration
NvAPI_GetPhysicalGPUsFromLogicalGPU: OK
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98db0, 17, 0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98dbc, 20, 0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98d84, 13, 0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98e88, 36, 0)
0530:trace:nvcuda:wine_cuDeviceGetAttribute (0x14d98ec4, 87, 0)
0530:trace:nvcuda:wine_cuCtxSetCurrent (0x7e26d530)
0530:trace:nvcuda:ContextStorage_Get (0x2f60f2c0, (nil), 0x14da57e0)
0530:trace:nvcuda:wine_cuCtxGetCurrent (0x2f60f250)
0530:trace:nvcuda:wine_cuDevicePrimaryCtxRetain (0x2f60f238, 0)
0530:trace:nvcuda:ContextStorage_Get (0x2f60f220, (nil), 0x14da57e0)
0530:trace:nvcuda:wine_cuCtxGetCurrent (0x2f60f230)
0530:trace:nvcuda:wine_cuCtxGetDevice (0x2f60f228)
0530:trace:nvcuda:ContextStorage_Set ((nil), 0x14da57e0, 0x14da9e20, 0x14d37eb0)
0530:trace:seh:dispatch_exception code=c0000005 flags=0 addr=0000000000000000 ip=0000000000000000 tid=0530
0530:warn:seh:dispatch_exception EXCEPTION_ACCESS_VIOLATION exception (code=c0000005) raised
0570:trace:seh:dispatch_exception  rax=0000000014d90380 rbx=0000000000000000 rcx=000000002f60f220 rdx=0000000021ae3198
0570:trace:seh:dispatch_exception  rsi=0000000000000000 rdi=0000000021ae3198 rbp=0000000000000000 rsp=000000002f60f188
0570:trace:seh:dispatch_exception   r8=0000000000000000  r9=0000000000000000 r10=00007ff072e483e0 r11=0000000014db0000
0570:trace:seh:dispatch_exception  r12=0000000014da6690 r13=000000002f60f250 r14=0000000014da9e20 r15=0000000000000000

jp7677 commented 2 years ago

Could you please try this?

int wine_luid[] = { 0x000003f0, 0x00000000 };
memcpy(luid, &wine_luid, sizeof(wine_luid));

This should return the same LUID that DXVK-NVAPI returned according to your earlier post. Well, assuming that I didn't screw up completely ;) Please also first validate that your LUID is still the same.

I don't think it will solve the crash that happens later, but worth a try.
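The difference between the two attempts is worth spelling out. In the earlier wine_cuDeviceGetLuid experiment, `luid = "f3000000";` only rebinds the function's local `luid` pointer, so the caller's buffer is never written; returning `CUDA_SUCCESS` instead of the driver's error is presumably what let execution continue. The `memcpy` above does write the caller's buffer. A minimal sketch contrasting the two, assuming the 8-byte Windows LUID layout (32-bit LowPart, then 32-bit HighPart) and a little-endian machine such as x86-64; `luid_t` and both helper names are illustrative stand-ins, not Wine or CUDA definitions.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the Windows LUID: a 32-bit unsigned LowPart followed by a
 * 32-bit signed HighPart, 8 bytes in total. */
typedef struct {
    uint32_t LowPart;
    int32_t HighPart;
} luid_t;

/* What `luid = "f3000000";` does: it rebinds the local parameter only,
 * so the buffer owned by the caller is never written. */
static void set_luid_by_assignment(char *luid)
{
    luid = "f3000000";
    (void)luid;
}

/* What the memcpy() approach does: the 8 bytes land in the caller's buffer.
 * { 0x3f0, 0 } corresponds to int wine_luid[] = { 0x000003f0, 0x00000000 }. */
static void set_luid_by_copy(char *luid, uint32_t low, int32_t high)
{
    luid_t value = { low, high };
    memcpy(luid, &value, sizeof(value));
}
```

On little-endian hardware, `{ 0x3f0, 0 }` ends up in memory as `f0 03 00 00 00 00 00 00`.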

PetitMote commented 2 years ago

I'll try that as soon as I can tomorrow; for now, I'm going to sleep ^^

SveSop commented 2 years ago

Will give these tests a whirl when I get home. Maybe it is possible to write CUDA calls in a simple program with the CUDA SDK and its compiler (NVCC)? I will take a look at that too, and see whether using cuDeviceGetLuid in a Linux-native CUDA app gives a different result. Edit: https://github.com/NVIDIA/cuda-samples/blob/master/Samples/simplePrintf/simplePrintf.cu It is probably easy to retrofit this to print the LUID from native libcuda for verification.

I am sure I remember reading somewhere that if no valid UUID/LUID (or something) is returned, it is set to 0xFFFFFFFF, and that seems to be the number shown in front of my GPU when starting DAZ. I did not pick up on it at first because it was displayed in decimal, so it just looked like a "big weird number" (when viewing it in the DAZ settings).

PetitMote commented 2 years ago

Hello!

It seems to work fine:

04bc:trace:nvcuda:wine_cuDeviceGetLuid (0x14518d38, 0x14518d40, 0)
04bc:trace:nvcuda:Unknown2_func1_relay (0x145103d0, 0x101eec0)
04bc:trace:nvcuda:Unknown2_func5_relay (0x145103d8, 0x101eec8)
04bc:trace:nvcuda:wine_cuGetExportTable (0x145103c8, 0x144e53b0)
04bc:trace:nvcuda:wine_cuGetExportTable (0x101ee98, 0x144e7640)
04bc:trace:nvcuda:wine_cuDeviceGetAttribute (0x14518db0, 17, 0)
04bc:trace:nvcuda:wine_cuDeviceGetAttribute (0x14518dbc, 20, 0)
04bc:trace:nvcuda:wine_cuDeviceGetAttribute (0x14518d84, 13, 0)
04bc:trace:nvcuda:wine_cuDeviceGetAttribute (0x14518e88, 36, 0)
04bc:trace:nvcuda:wine_cuDeviceGetAttribute (0x14518ec4, 87, 0)
053c:trace:nvcuda:DllMain (0x7efeb4a60000, 2, (nil))
# This is where nvcuda starts to fail to answer, there is a lot of these #
053c:trace:nvcuda:wine_cuCtxGetDevice (0x2ed8f228)
053c:trace:nvcuda:ContextStorage_Set ((nil), 0x145257e0, 0x14529e20, 0x144b7eb0)
053c:trace:seh:dispatch_exception code=c0000005 flags=0 addr=0000000000000000 ip=0000000000000000 tid=053c
053c:warn:seh:dispatch_exception EXCEPTION_ACCESS_VIOLATION exception (code=c0000005) raised
053c:trace:seh:dispatch_exception  rax=0000000014510380 rbx=0000000000000000 rcx=000000002ed8f220 rdx=00000000212631b0
053c:trace:seh:dispatch_exception  rsi=0000000000000000 rdi=00000000212631b0 rbp=0000000000000000 rsp=000000002ed8f188
053c:trace:seh:dispatch_exception   r8=0000000000000000  r9=0000000000000000 r10=00007efeb4aa33e0 r11=0000000014530000
053c:trace:seh:dispatch_exception  r12=0000000014526740 r13=000000002ed8f250 r14=0000000014529e20 r15=0000000000000000

And we're back to the error. The problem is that ContextStorage_Set doesn’t seem to be an NVIDIA function; I think it was written by the Wine wizards. It could be linked to the cuGetExportTable function, but that one isn’t documented.

EDIT: I think the DllMain line marks an exception in the last nvcuda call, so we should investigate this one:

04bc:trace:nvcuda:wine_cuDeviceGetAttribute (0x14518ec4, 87, 0)

It should return CU_DEVICE_ATTRIBUTE_SINGLE_TO_DOUBLE_PRECISION_PERF_RATIO https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html#group__CUDA__TYPES_1ge12b8a782bebe21b1ac0091bf9f4e2a3 I’ll try logging the value. First, I’ll post a patch with my modifications so we are testing the same code.

Here is the patch: nvcuda-DazStudio.patch.txt

jp7677 commented 2 years ago

You can find ContextStorage_Set here: https://github.com/SveSop/nvidia-libs/blob/master/dlls/nvcuda/internal.c#L335 Now it goes deep into the wine-cuda internals (with lots of unknown things, according to the source). Unfortunately, I'm really not familiar with that part (or with any part of Wine ;)).

PetitMote commented 2 years ago

You can find ContextStorage_Set here: https://github.com/SveSop/nvidia-libs/blob/master/dlls/nvcuda/internal.c#L335 Now it goes deep into the wine-cuda internals (with lots of unknown things according to the source), Unfortunately I'm really not familiar with that part (or with any part of wine ;)).

Yep, I’ve seen it, but it’s not a CUDA function, so there is absolutely no documentation, and I think whoever wrote this must have been extremely smart :sweat_smile:

jp7677 commented 2 years ago

Could you run a test with DXVK-NVAPI from master? I'm curious whether any of the methods added on the branch actually matter. I'm probably fine with adding NvAPI_EnumTCCPhysicalGPUs and NvAPI_GetAssociatedNvidiaDisplayName if that helps display the GPU correctly in the logs, but I would love to avoid the other three.

SveSop commented 2 years ago

I would think NvAPI_GPU_GetBusType, NvAPI_GPU_GetBusSlotId and NvAPI_GPU_GetCurrentPCIEDownstreamWidth can't really be necessary, but I will test this.

PetitMote commented 2 years ago

Could you run a test with DXVK-NVAPI from master? Im curious if any of the added methods on the branch actually matter? I'm probably fine with adding NvAPI_EnumTCCPhysicalGPUs and NvAPI_GetAssociatedNvidiaDisplayName if that helps to display the GPU correctly in the logs, but would love to avoid the other three?

It got as far as with your previous fixes, so I guess they weren’t necessary after all: dxvk-nvapi.log

EDIT: They might be necessary for other features, however. I’m thinking about the OpenCL physics simulation, which I might try to fix after this.

PetitMote commented 2 years ago

Also, I added some logging of the attribute value, but it doesn’t look like there is a problem:

04d4:trace:nvcuda:wine_cuDeviceGetAttribute (0x14518ec4, 87, 0)
04d4:trace:nvcuda:wine_cuDeviceGetAttribute Attribute value: 32
0558:trace:nvcuda:DllMain (0x7f0b282f0000, 2, (nil))
0558:trace:nvcuda:cuda_process_tls_callbacks (2)
055c:trace:nvcuda:DllMain (0x7f0b282f0000, 2, (nil))
055c:trace:nvcuda:cuda_process_tls_callbacks (2)

CUresult WINAPI wine_cuDeviceGetAttribute(int *pi, CUdevice_attribute attrib, CUdevice dev)
{
    TRACE("(%p, %d, %d)\n", pi, attrib, dev);
    CUresult error = pcuDeviceGetAttribute(pi, attrib, dev);
    if (error == CUDA_SUCCESS) /* *pi is only meaningful when the call succeeded */
        TRACE("Attribute value: %d\n", *pi);
    return error;
}
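To make traces like the ones above easier to read, the attribute numbers can be mapped back to names. A minimal sketch covering only the five attributes that appear in these logs, with numeric values transcribed from the `CUdevice_attribute` enum in the CUDA driver API's `cuda.h` (worth double-checking against the header of your CUDA version); `cu_attribute_name` is a hypothetical helper, not part of CUDA or wine-cuda.

```c
/* CUdevice_attribute values seen in the traces above, transcribed from the
 * CUDA driver API's cuda.h (verify against your own header). */
static const char *cu_attribute_name(int attrib)
{
    switch (attrib) {
    case 13: return "CU_DEVICE_ATTRIBUTE_CLOCK_RATE";
    case 17: return "CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT";
    case 20: return "CU_DEVICE_ATTRIBUTE_COMPUTE_MODE";
    case 36: return "CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE";
    case 87: return "CU_DEVICE_ATTRIBUTE_SINGLE_TO_DOUBLE_PRECISION_PERF_RATIO";
    default: return "(not in this table)";
    }
}
```

Inside wine_cuDeviceGetAttribute, the value trace could then print `cu_attribute_name(attrib)` alongside `*pi`, so attribute 87 with value 32 reads as the single-to-double precision ratio directly.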

PetitMote commented 2 years ago

Wait, I do get errors about missing NvAPI implementations, all concerning the Iray renderer. I’ll keep testing with your fixes, and test without them at the end.

SveSop commented 2 years ago

Hmpf... there might be a problem. When I tried to make a small CUDA program get the LUID from native libcuda, I got CUDA_ERROR_NOT_SUPPORTED:

  Adapter LUID Error: 801
  Adapter UUID: 0x00000049

Running the same sample compiled and run in windows 10:

  Adapter LUID: 14
  Adapter node mask: 1
  Adapter UUID: 0x0000006d

Running the .exe in my DAZ prefix:

0100:trace:nvcuda:wine_cuDeviceGetLuid (0x11fd18, 0x11fd3c, 0)
  Adapter LUID Error: 801
0100:trace:nvcuda:wine_cuDeviceGetUuid (0x11fd60, 0)
  Adapter UUID: 0x00000049

So... I'm not sure whether the LUID is just not supported for ME, or whether something else is up. I'm attaching two binaries, Linux and Windows, so you can try them. The source file is also included in case you want to compile it yourself.

PS. The printf of the LUID/UUID info may be wrong, because I always mess up when printf'ing from pointers and whatnot... cuda_sample.tar.gz
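On the printing question: `Adapter UUID: 0x00000049` most likely shows only the first byte of the 16-byte UUID (it matches the first byte of the `49e16335-...` UUID SveSop quotes further down). A minimal sketch that prints all 16 bytes in the usual 8-4-4-4-12 form; it assumes the bytes were copied out of CUDA's `CUuuid` (whose `bytes` field is a plain `char[16]`, hence the cast to `unsigned char`), and `format_uuid` is a hypothetical helper.

```c
#include <stdio.h>

/* Formats a 16-byte device UUID in the usual 8-4-4-4-12 form, byte by byte
 * in memory order. CUuuid stores its bytes as plain char, so cast the array
 * to unsigned char before passing it in. `out` needs 37 bytes. */
static void format_uuid(const unsigned char bytes[16], char out[37])
{
    int pos = 0;
    for (int i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10)
            out[pos++] = '-';
        /* each call writes two hex digits plus a terminating NUL */
        pos += snprintf(out + pos, (size_t)(37 - pos), "%02x", (unsigned)bytes[i]);
    }
}
```

The same approach works for the LUID, only with 8 bytes instead of 16.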

SveSop commented 2 years ago

Wait, I do have errors about NvAPI implementations missing, all concerning the Iray renderer. I’ll keep testing with your fixes, and tests without at the end.

It is the Iray renderer that uses the NvAPI calls, so naturally it will also be the one complaining if we do NOT implement e.g. NvAPI_GPU_GetBusType.

I cannot really see why DAZ would NEED to know whether you are running on PCI/AGP/PCIe, but for all I know (since we don't have the source code) it CAN be a check saying "If using a PCI card, this computer is too crappy -> FAIL" :smiling_imp:

PetitMote commented 2 years ago

@SveSop I got the same error on Linux. I am too tired of this to recompile nvcuda without the LUID fix :sweat_smile: It really seems this is a Windows-only capability.

Wait, I do have errors about NvAPI implementations missing, all concerning the Iray renderer. I’ll keep testing with your fixes, and tests without at the end.

It is the Iray renderer that uses the NvAPI commands - so naturally it will also be the one complaining if we do NOT use eg. NvAPI_GPU_GetBusType

I cannot really see why DAZ would NEED to know if you are running on pci/agp/pci-e, but for all i know (since we dont have the sourcecode) it CAN be a call saying "If using PCI card, this computer is too crappy -> FAIL" :smiling_imp:

Why are there 753 ways of doing the same thing with the same goddamn driver? NVIDIA, you’re killing me! Also, could you please complete your docs, by the way? Would appreciate that, thank you NVIDIA :smile:

jp7677 commented 2 years ago

@SveSop I got the same error on Linux. I am too tired of this to recompile the nvcuda without the LUID fix 😅 It really seems this a windows only capability.

Yes, correct. Running vulkaninfo natively on Linux also shows that a LUID is not available, so it is expected that native CUDA on Linux also returns no LUID. I don’t know whether this is really a Windows-only feature or just that no driver developer bothered to implement it. Anyway, Wine makes sure that D3D and Vulkan within Wine have a LUID available. The same needs to be done for CUDA: wine-cuda needs to get the LUID from Wine internally, instead of forwarding this call to the native CUDA library.

PS, I learned this yesterday evening :)

PetitMote commented 2 years ago

@SveSop I got the same error on Linux. I am too tired of this to recompile the nvcuda without the LUID fix :sweat_smile: It really seems this a windows only capability.

Yes correct. Running vulkaninfo natively on Linux also shows that a Luid is not available, so it is expected that cuda natively on Linux also returns no Luid. I don’t know if this is really just a windows only feature or just that no driver developer bothered to implement it. Anyway, wine takes care that D3D and vulkan within wine do have a Luid available. The same needs to be done for cuda. Wine-cuda needs to get the Luid from wine internally, instead of forwarding this call to the native cuda library.

PS, I learned this yesterday evening :)

Where do you find this knowledge? Tell me!

Is your fix the way to do it? Or should we call a Wine API? (I think they are in kernel32?)

jp7677 commented 2 years ago

No, my fix was just a band-aid for troubleshooting. The LUID needs to be obtained from Wine, something like this: https://github.com/wine-mirror/wine/commit/8007d19c2792b5b177bd7200dc3567df4677dc0c (though this code has since been moved elsewhere).
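jp7677's description boils down to this control flow: keep forwarding to native CUDA, but when the Linux driver cannot provide a LUID, answer with the LUID that Wine already assigns to the adapter. A minimal sketch under stated assumptions: the native call fails with `CUDA_ERROR_NOT_SUPPORTED` (801, as in SveSop's test), `native_cuDeviceGetLuid` and `lookup_wine_adapter_luid` are hypothetical stubs, and the real lookup would have to go through whatever Wine-internal code the linked commit's logic now lives in. The hard-coded bytes are only an example value.

```c
#include <string.h>

typedef int CUresult;
typedef int CUdevice;
#define CUDA_SUCCESS 0
#define CUDA_ERROR_NOT_SUPPORTED 801

/* Hypothetical stand-in for the native Linux libcuda call, which in this
 * thread reports CUDA_ERROR_NOT_SUPPORTED for cuDeviceGetLuid. */
static CUresult native_cuDeviceGetLuid(char *luid, unsigned int *mask, CUdevice dev)
{
    (void)luid; (void)mask; (void)dev;
    return CUDA_ERROR_NOT_SUPPORTED;
}

/* Hypothetical hook: a real fix would ask Wine for the LUID it already
 * assigns to this adapter for D3D/Vulkan. Hard-coded example here. */
static int lookup_wine_adapter_luid(CUdevice dev, unsigned char out[8])
{
    static const unsigned char example[8] = { 0xf0, 0x03, 0, 0, 0, 0, 0, 0 };
    (void)dev;
    memcpy(out, example, sizeof(example));
    return 1;
}

static CUresult wrapped_cuDeviceGetLuid(char *luid, unsigned int *mask, CUdevice dev)
{
    unsigned char wine_luid[8];
    CUresult ret = native_cuDeviceGetLuid(luid, mask, dev);

    if (ret != CUDA_ERROR_NOT_SUPPORTED)
        return ret;                 /* native answer (or another error) wins */
    if (!lookup_wine_adapter_luid(dev, wine_luid))
        return ret;                 /* no Wine LUID either: keep the error */

    memcpy(luid, wine_luid, sizeof(wine_luid));
    *mask = 1;                      /* single-GPU node mask, as vulkaninfo shows */
    return CUDA_SUCCESS;
}
```

Falling back only on `CUDA_ERROR_NOT_SUPPORTED` means a driver that does report a LUID someday would still take precedence, and any other error is passed through untouched.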

PetitMote commented 2 years ago

Will need to have a look at this.

I’ve been looking at the logs. We get an error right after a call to ContextStorage_Set, which is called with a null pointer where a CUDA context is expected.

static CUresult WINAPI ContextStorage_Set(CUcontext ctx, void *key, void *value, void *callback)
{
    struct context_storage *storage;
    CUresult ret;

    TRACE("(%p, %p, %p, %p)\n", ctx, key, value, callback);

//    if (!ctx)
//    {
//        CUcontext* pctx;
//        pctx = &ctx;
//        TRACE("No CUcontext, trying to get CurrentCtx :");
//        wine_cuCtxGetCurrent(pctx);
//    }
    TRACE("%d / %p\n", ctx, ctx);

    storage = HeapAlloc( GetProcessHeap(), 0, sizeof(*storage) );
    if (!storage)
        return CUDA_ERROR_OUT_OF_MEMORY;

    storage->callback = callback;
    storage->value = value;
    ret = ContextStorage_orig->Set(ctx, key, storage, storage_destructor_callback);
    if (ret) HeapFree( GetProcessHeap(), 0, storage );
    return ret;
}

This function is called through:

struct ContextStorage_table ContextStorage_Impl =
{
    ContextStorage_Set,
    ContextStorage_Remove,
    ContextStorage_Get,
};

Which in turn is called by cuda_get_table:

CUresult cuda_get_table(const void **table, const CUuuid *uuid, const void *orig_table, CUresult orig_result)
{
    char buffer[128];

    if (cuda_equal_uuid(uuid, &UUID_Unknown1))
    {
        if (orig_result)
            return orig_result;
        if (!cuda_check_table(orig_table, (void *)&Unknown1_Impl, "Unknown1"))
            return CUDA_ERROR_UNKNOWN;

        Unknown1_orig = orig_table;
        *table = (void *)&Unknown1_Impl;
        return CUDA_SUCCESS;
    }
    else if (cuda_equal_uuid(uuid, &UUID_Unknown2))
    {
        if (orig_result)
            return orig_result;
        if (!cuda_check_table(orig_table, (void *)&Unknown2_Impl, "Unknown2"))
            return CUDA_ERROR_UNKNOWN;

        Unknown2_orig = orig_table;
        *table = (void *)&Unknown2_Impl;
        return CUDA_SUCCESS;
    }
    else if (cuda_equal_uuid(uuid, &UUID_Unknown3))
    {
        if (orig_result)
            return orig_result;
        if (!cuda_check_table(orig_table, (void *)&Unknown3_Impl, "Unknown3"))
            return CUDA_ERROR_UNKNOWN;

        Unknown3_orig = orig_table;
        *table = (void *)&Unknown3_Impl;
        return CUDA_SUCCESS;
    }
    else if (cuda_equal_uuid(uuid, &UUID_ContextStorage))
    {
        if (orig_result)
            return orig_result;
        if (!orig_table)
            return CUDA_ERROR_UNKNOWN;

        ContextStorage_orig = orig_table;
        *table = (void *)&ContextStorage_Impl;
        return CUDA_SUCCESS;
    }
    else if (cuda_equal_uuid(uuid, &UUID_Unknown5))
    {
        if (orig_result)
            return orig_result;
        if (!cuda_check_table(orig_table, (void *)&Unknown5_Impl, "Unknown5"))
            return CUDA_ERROR_UNKNOWN;

        Unknown5_orig = orig_table;
        *table = (void *)&Unknown5_Impl;
        return CUDA_SUCCESS;
    }
    else if (cuda_equal_uuid(uuid, &UUID_TlsNotifyInterface))
    {
        /* the following interface is not implemented in the Linux
         * CUDA driver, we provide a replacement implementation */
        *table = (void *)&TlsNotifyInterface_Impl;
        return CUDA_SUCCESS;
    }

    FIXME("Unknown UUID: %s, error: %d\n", cuda_print_uuid(uuid, buffer, sizeof(buffer)), orig_result);
    return CUDA_ERROR_UNKNOWN;
}

And then we arrive at cuGetExportTable, which is an undocumented NVIDIA function introduced in CUDA 3.0: https://forums.developer.nvidia.com/t/cudagetexporttable-a-total-hack/20226

CUresult WINAPI wine_cuGetExportTable(const void **table, const CUuuid *id)
{
    const void* orig_table = NULL;
    CUresult ret;

    TRACE("(%p, %p)\n", table, id);

    ret = pcuGetExportTable(&orig_table, id);
    return cuda_get_table(table, id, orig_table, ret);
}

So could the error be in either cuGetExportTable or the cuda_get_table function?

EDIT: I tried passing cuGetExportTable straight through to native Linux CUDA; it didn’t work and put me back a few steps.
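What this code does is interpose on an undocumented export table: `cuGetExportTable` fetches the driver's table for a UUID, `cuda_get_table` saves it (e.g. in `ContextStorage_orig`) and hands the application Wine's own table (`ContextStorage_Impl`), whose entries forward to the saved original. A minimal sketch of that pattern with a hypothetical two-entry table and a fake driver so it runs standalone; the slot names and signatures are invented, and the real ContextStorage_Set additionally wraps the value in a heap-allocated struct with its own destructor callback, which is left out here. It also illustrates one possible reading of the crash log: `addr=0 ip=0` is what jumping through a NULL function pointer looks like, so an empty table slot is one candidate cause, not a confirmed one.

```c
#include <stddef.h>

typedef int CUresult;
typedef void *CUcontext;
#define CUDA_SUCCESS 0
#define CUDA_ERROR_UNKNOWN 999

/* Hypothetical two-entry table; the real ContextStorage layout is
 * undocumented and these signatures are invented for illustration. */
struct storage_table {
    CUresult (*Set)(CUcontext ctx, void *key, void *value);
    CUresult (*Get)(void **value, CUcontext ctx, void *key);
};

/* --- fake "driver" side, so the sketch runs without libcuda --- */
static void *stored_value;
static CUresult driver_Set(CUcontext ctx, void *key, void *value)
{
    (void)ctx; (void)key;
    stored_value = value;
    return CUDA_SUCCESS;
}
static CUresult driver_Get(void **value, CUcontext ctx, void *key)
{
    (void)ctx; (void)key;
    *value = stored_value;
    return CUDA_SUCCESS;
}
static const struct storage_table driver_table = { driver_Set, driver_Get };

/* --- Wine-style wrapper side --- */
static const struct storage_table *storage_orig; /* saved driver table */

static CUresult wrapped_Set(CUcontext ctx, void *key, void *value)
{
    /* Calling through a NULL slot would jump to address 0; refuse instead. */
    if (!storage_orig || !storage_orig->Set)
        return CUDA_ERROR_UNKNOWN;
    return storage_orig->Set(ctx, key, value);
}

static CUresult wrapped_Get(void **value, CUcontext ctx, void *key)
{
    if (!storage_orig || !storage_orig->Get)
        return CUDA_ERROR_UNKNOWN;
    return storage_orig->Get(value, ctx, key);
}

static const struct storage_table storage_impl = { wrapped_Set, wrapped_Get };

/* Mirrors one branch of cuda_get_table: keep the driver's table and hand
 * the application the wrapper table instead. */
static CUresult get_storage_table(const void **table, const void *orig_table,
                                  CUresult orig_result)
{
    if (orig_result)
        return orig_result;
    if (!orig_table)
        return CUDA_ERROR_UNKNOWN;
    storage_orig = orig_table;
    *table = &storage_impl;
    return CUDA_SUCCESS;
}
```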

SveSop commented 2 years ago

// from https://github.com/jp7677/dxvk-nvapi/blob/master/tests/nvapi_sysinfo.cpp#L448
//  LUID high part:    0x00000000
//  LUID low part:    0x000003f0
luid = "f3000000";

Edit: I'm validating this, give me a minute.

Is the adapter LUID just two DWORDs? I mean... the UUID seems a lot more involved: Adapter UUID: 49e16335-552c-8128-77c9-81cbe8aed6bf. Why is the LUID not a similar value?

If it is just supposed to be two DWORDs, how can I print it so it looks correct? It is declared as char *luid, and printing the hex value of luid seems to give just the low part... so I guess I need to "split" this value. Arf... if only I knew how to program :open_mouth:

Saancreed commented 2 years ago

@SveSop Fwiw this is how vulkaninfo.exe prints info about my GPUs when run in Wine:

$ WINEDEBUG=-all wine vulkaninfo.exe | grep --after=6 'VkPhysicalDeviceIDProperties:'
fsync: up and running.
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
VkPhysicalDeviceIDProperties:
-----------------------------
        deviceUUID      = 00000000-0500-0000-0000-000000000000
        driverUUID      = 414d442d-4c49-4e55-582d-445256000000
        deviceLUID      = f4030000-00000000
        deviceNodeMask  = 1
        deviceLUIDValid = true
--
VkPhysicalDeviceIDProperties:
-----------------------------
        deviceUUID      = 4e121e43-0e61-b00f-9289-bc34ad6b331e
        driverUUID      = 77a1b102-7c87-5864-beec-846253941470
        deviceLUID      = f6030000-00000000
        deviceNodeMask  = 1
        deviceLUIDValid = true
--
VkPhysicalDeviceIDProperties:
-----------------------------
        deviceUUID      = 00000000-0500-0000-0000-000000000000
        driverUUID      = 414d442d-4c49-4e55-582d-445256000000
        deviceLUID      = f4030000-00000000
        deviceNodeMask  = 1
        deviceLUIDValid = true
--
VkPhysicalDeviceIDProperties:
-----------------------------
        deviceUUID      = 00000000-0000-0000-0000-000000000000
        driverUUID      = 00000000-0000-0000-0000-000000000000
        deviceLUID      = f1030000-00000000
        deviceNodeMask  = 1
        deviceLUIDValid = true
--
VkPhysicalDeviceIDProperties:
-----------------------------
        deviceUUID      = 00000000-0500-0000-0000-000000000000
        driverUUID      = 414d442d-4d45-5341-2d44-525600000000
        deviceLUID      = f4030000-00000000
        deviceNodeMask  = 1
        deviceLUIDValid = true

That deviceNodeMask could be wrong, though, but LUIDs are formatted as pairs of 32-bit numbers, so ~~maybe what cuDeviceGetLuid returns is also a string formatted like that?~~ the char *luid pointer is actually not a string but a binary structure.

Given a Windows machine where vulkaninfo.exe reports LUID as c8b40000-00000000 the following code snippet:

#include <cuda.h>
#include <stdio.h>
#include <winnt.h>

…

unsigned int nodeMask;
LUID luid = {0, 0};

cuDeviceGetLuid((char*)&luid, &nodeMask, device);

printf("LUID: %08lx-%08lx\n", luid.LowPart, luid.HighPart);

prints LUID: 0000b4c8-00000000, which is kind of what we want but with the order of bytes reversed.

jp7677 commented 2 years ago

prints LUID: 0000b4c8-00000000, which is kind of what we want but with the order of bytes reversed.

@Saancreed Thanks for validating what Windows returns. The reversed order is correct here with your code. Looking at the source of vulkaninfo ( https://github.com/KhronosGroup/Vulkan-Tools/blob/b50a0f786efc0b70ae05a9b1e1cc0526e43ee7d1/vulkaninfo/outputprinter.h#L45 ), it prints the LUID byte by byte. Formatting LowPart/HighPart with %08lx (or std::hex in nvapi-tests64), i.e. the four bytes combined into an integer, gives you the hex representation of that integer, which on a little-endian machine shows the bytes in reverse order.

I should probably add the vulkan formatting to the DXVK-NVAPI tests to avoid confusing me and others ;)

Edit: Fun fact: the Windows LUID structure uses a DWORD for LowPart and a LONG for HighPart, and both are 32 bits (4 bytes). That is because 64-bit Windows keeps long at 32 bits (the LLP64 model), whereas on 64-bit Linux a long is 8 bytes (LP64). This got me confused while playing with sizeof(LUID): looking at its definition, I actually expected a size of 12 bytes instead of 8 ;)
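To see both conventions side by side: vulkaninfo prints the eight LUID bytes in memory order, while `%08lx` on LowPart/HighPart prints each 32-bit integer most significant byte first. A minimal sketch assuming a little-endian machine and the 8-byte LUID layout (`luid_t` is a stand-in so it compiles without Windows headers). With LowPart = 0xb4c8 it reproduces both forms from Saancreed's Windows test: `c8b40000-00000000` byte-wise and `0000b4c8-00000000` integer-wise.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the Windows LUID (DWORD LowPart; LONG HighPart): 8 bytes. */
typedef struct {
    uint32_t LowPart;
    int32_t HighPart;
} luid_t;

/* vulkaninfo style: the eight raw bytes in memory order, split 4-4. */
static void format_luid_bytes(const luid_t *luid, char out[18])
{
    unsigned char b[8];
    memcpy(b, luid, sizeof(b));
    snprintf(out, 18, "%02x%02x%02x%02x-%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]);
}

/* Integer style, like %08lx on LowPart/HighPart: each 32-bit value is
 * printed most significant byte first. */
static void format_luid_parts(const luid_t *luid, char out[18])
{
    snprintf(out, 18, "%08x-%08x",
             (unsigned)luid->LowPart, (unsigned)luid->HighPart);
}
```

Matching the byte-wise form is what makes a LUID printed in a test program directly comparable with vulkaninfo output, such as the `f4030000-00000000` entries above (LowPart = 0x3f4).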

SveSop commented 2 years ago

I have a small theory I thought of yesterday, about these "UnknownX" tables in nvcuda. They are all identified by UUIDs, and I think this may work similarly to the D3D interface UUIDs you can see exposed in e.g. dxvk (which Microsoft also documents in great detail).

You can see this when viewing

static const CUuuid UUID_Unknown1                   = {{0x6B, 0xD5, 0xFB, 0x6C, 0x5B, 0xF4, 0xE7, 0x4A,
                                                        0x89, 0x87, 0xD9, 0x39, 0x12, 0xFD, 0x9D, 0xF9}};

This could mean that 6BD5FB6C-5BF4-E74A-8987-D93912FD9DF9 is the UUID of a "CUDA interface" of sorts (unknown for the time being)... Of course, nothing comes up in searches.
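One caveat when searching for these IDs: the string above prints the 16 bytes in memory order. If NVIDIA lays these IDs out like Windows GUIDs (an assumption; nothing in the thread confirms it), the first three fields are stored little-endian and the conventional string form differs. A minimal sketch of GUID-style formatting; for `UUID_Unknown1` it gives `6cfbd56b-f45b-4ae7-8987-d93912fd9df9`, which may be worth searching for as well. `format_as_guid` is a hypothetical helper.

```c
#include <stdio.h>

/* Formats 16 bytes the way a Windows GUID is written: Data1 (4 bytes),
 * Data2 and Data3 (2 bytes each) are stored little-endian, so their bytes
 * are printed reversed; the last 8 bytes are printed in memory order. */
static void format_as_guid(const unsigned char b[16], char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             b[3], b[2], b[1], b[0], b[5], b[4], b[7], b[6],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```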

As a comparison D3D11 interfaces is shown in dxvk source : src/d3d11/d3d11_interfaces.h

Maybe CUDA (10/11) got a new interface that it tries to fetch, and it faults because that interface does not exist and nvcuda does not really provide any sort of fallback (as we can see when exposed stubs just cause it to crash)?

EDIT: Well, I spoke too soon, as it does provide a fallback for unknown UUIDs:

    FIXME("Unknown UUID: %s, error: %d\n", cuda_print_uuid(uuid, buffer, sizeof(buffer)), orig_result);
    return CUDA_ERROR_UNKNOWN;

PetitMote commented 2 years ago

Hello,

Do you think the LUID could still be the cause of an error? Since we get an access violation, that doesn’t seem unlikely to me. Or are you just still trying to format it right?

As for the unknown functions, I don’t understand them, but I believe they are named that way simply because the developers didn't know the real function names.

Also, it looks to me like it’s really the ContextStorage_Set function that causes the problem, as it’s called with a null pointer where a CUDA context is expected. But this function looks like sorcery to me.

EDIT: I tried it: changing the LUID value doesn’t stop the program, so maybe the problem is that we don’t return a correct value, and another graphics API then tries to write to the wrong memory?

SveSop commented 2 years ago

It seems we do not get any calls to "unknown" UUID's atleast for now, and i cannot really say what the LUID is used for, as it is retrieved once. It could ofc be referenced in other functions we do not know of yet..

It does not really seem to cause a crash, and it might be worth looking into this ContextStorage_Set function, although I have absolutely no clue what it is about. Fiddling with Unknown1_relay1 (dropping param1, which is (nil)) seems to bring us back to a regular "not initialized" state without a crash, though, and nvcuda does report that the Unknown1 table's implementation is older than the driver's. Comparing the sizes: driver implementation 80 (bytes?), Wine implementation 48 (bytes?).
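One possible reading of the 80 vs 48 figures, assuming (a guess, not confirmed in this thread) that each table begins with a size field padded to 8 bytes followed by 8-byte function pointers on a 64-bit build: the driver's table would hold 9 slots and Wine's only 5. The hypothetical helpers below just do that arithmetic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout (hypothetical): 8-byte size header, then 8-byte function pointers. */
#define TABLE_HEADER_BYTES 8u
#define TABLE_SLOT_BYTES   8u

/* Number of function-pointer slots a table of the given total size can hold. */
static size_t table_slot_count(size_t table_size)
{
    if (table_size < TABLE_HEADER_BYTES)
        return 0;
    return (table_size - TABLE_HEADER_BYTES) / TABLE_SLOT_BYTES;
}

/* True if slot `index` lies inside a table of `table_size` bytes; a careful
 * caller would check this before calling through the slot. */
static bool table_has_slot(size_t table_size, size_t index)
{
    return index < table_slot_count(table_size);
}
```

Under these assumptions, slots 5 to 8 exist only in the driver's version, so they would be the first candidates to look at if the caller expects the newer layout.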

This has not really been a problem in the few (rather old) CUDA/PhysX tests I've done, and there seem to be no "new" calls here. I will look more into this, but I looked at it a while back as well without getting any wiser about this wizardry, as you rightfully put it :smile:

SveSop commented 2 years ago

Oh... Adding entries to the Unknown1 table brought up some new output for sure!

2022-01-08 14:29:43.126 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 2070): WDDM driver used, consider switching to TCC driver model if no display needed (via 'nvidia-smi -dm 1'), to increase rendering performance (if no DX/VK needed)

2022-01-08 14:29:43.126 NVIDIA Iray GPUs:
2022-01-08 14:29:43.126     GPU: 1 - NVIDIA GeForce RTX 2070
2022-01-08 14:29:43.126     Memory Size: 8.2 GB
2022-01-08 14:29:43.126     Clock Rate: 1815000 kHz
2022-01-08 14:29:43.126     Multi Processor Count: 36
2022-01-08 14:29:43.126     CUDA Device ID: 0
2022-01-08 14:29:43.126     CUDA Compute Capability: 7.5
2022-01-08 14:29:43.126     PCI Bus ID: 1
2022-01-08 14:29:43.126     PCI Device ID: 0
2022-01-08 14:29:43.127     TCC Mode: disabled
2022-01-08 14:29:43.127     Display: attached
2022-01-08 14:29:43.127 NVIDIA Iray Scheduling Configuration:
2022-01-08 14:29:43.127     CPU Load Limit: 12
2022-01-08 14:29:43.127     CPU Thread Affinity: disabled
2022-01-08 14:29:43.127     GPU Load Limit: 1
2022-01-08 14:29:43.303 Total class factories: 2078

PetitMote commented 2 years ago

For the LUID, I found this code in the winevulkan DLL. I think the best approach would be to call it directly, although I have absolutely no idea how to do that in C:

static void fill_luid_property(VkPhysicalDeviceProperties2 *properties2)
{
    VkPhysicalDeviceIDProperties *id;
    SP_DEVINFO_DATA device_data;
    DWORD type, device_idx = 0;
    HDEVINFO devinfo;
    HANDLE mutex;
    GUID uuid;
    LUID luid;

    if (!(id = wine_vk_find_struct(properties2, PHYSICAL_DEVICE_ID_PROPERTIES)))
        return;

    wait_graphics_driver_ready();
    mutex = get_display_device_init_mutex();
    devinfo = SetupDiGetClassDevsW(&GUID_DEVCLASS_DISPLAY, L"PCI", NULL, 0);
    device_data.cbSize = sizeof(device_data);
    while (SetupDiEnumDeviceInfo(devinfo, device_idx++, &device_data))
    {
        if (!SetupDiGetDevicePropertyW(devinfo, &device_data, &WINE_DEVPROPKEY_GPU_VULKAN_UUID,
                &type, (BYTE *)&uuid, sizeof(uuid), NULL, 0))
            continue;

        if (!IsEqualGUID(&uuid, id->deviceUUID))
            continue;

        if (SetupDiGetDevicePropertyW(devinfo, &device_data, &DEVPROPKEY_GPU_LUID, &type,
                (BYTE *)&luid, sizeof(luid), NULL, 0))
        {
            memcpy(&id->deviceLUID, &luid, sizeof(id->deviceLUID));
            id->deviceLUIDValid = VK_TRUE;
            id->deviceNodeMask = 1;
            break;
        }
    }
    SetupDiDestroyDeviceInfoList(devinfo);
    release_display_device_init_mutex(mutex);

    TRACE("deviceName:%s deviceLUIDValid:%d LUID:%08x:%08x deviceNodeMask:%#x.\n",
            properties2->properties.deviceName, id->deviceLUIDValid, luid.HighPart, luid.LowPart,
            id->deviceNodeMask);
}
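For reference, a Vulkan `deviceLUID` (and the `char *luid` buffer that `cuDeviceGetLuid` fills) is 8 raw bytes, while the Windows `LUID` is a `DWORD LowPart` followed by a `LONG HighPart`. The `memcpy` above simply reinterprets the bytes. The sketch below does the same with a locally re-declared LUID-like struct (so it compiles without Windows headers; the helper names are hypothetical) and formats it like the TRACE line, high part first.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same field order as the Windows LUID: LowPart first, then HighPart. */
typedef struct {
    uint32_t LowPart;
    int32_t  HighPart;
} luid_sketch;

/* Copy 8 raw LUID bytes into the struct. Like the memcpy in fill_luid_property,
 * the result depends on the host's byte order. */
static luid_sketch luid_from_bytes(const uint8_t bytes[8])
{
    luid_sketch luid;
    memcpy(&luid, bytes, sizeof(luid));
    return luid;
}

/* Format as "HighPart:LowPart", matching the TRACE's %08x:%08x (17 chars + NUL). */
static void luid_format(const luid_sketch *luid, char out[18])
{
    snprintf(out, 18, "%08x:%08x", (unsigned)luid->HighPart, (unsigned)luid->LowPart);
}
```

On a little-endian host (every x86 machine Wine runs on), the bytes 01 02 03 04 05 06 07 08 come out as "08070605:04030201".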

@SveSop Hey, I’ve just seen that we didn’t get to this part since we corrected the LUID! What did you change? Can you share a patch?

SveSop commented 2 years ago

The LUID had nothing to do with it: I just inserted a random number but returned CUDA_SUCCESS (faking an OK like we did earlier), and it seems initialization fails if the LUID call fails. In other words, DAZ needs cuDeviceGetLuid to return CUDA_SUCCESS before it will consider initializing the engine. Whether the LUID is actually used for anything, I do not know.
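The workaround described above amounts to something like the stub below: write a placeholder LUID, set a node mask, and report success so DAZ carries on. This is only a sketch of the idea, not the actual patch. The types are re-declared locally, `fake_cuDeviceGetLuid` and the placeholder bytes are hypothetical, and the parameter order follows CUDA's `cuDeviceGetLuid(char *luid, unsigned int *deviceNodeMask, CUdevice dev)`.

```c
#include <string.h>

/* Local stand-ins for the CUDA types; values mirror CUDA_SUCCESS and CUDA_ERROR_INVALID_VALUE. */
typedef int CUdevice_sketch;
typedef enum { SKETCH_SUCCESS = 0, SKETCH_ERROR_INVALID_VALUE = 1 } sketch_result;

/* Hypothetical stub: the LUID is made up; only the success code matters to the caller. */
static sketch_result fake_cuDeviceGetLuid(char *luid, unsigned int *deviceNodeMask,
                                          CUdevice_sketch dev)
{
    static const char placeholder[8] = {0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

    (void)dev; /* the same fake LUID is returned for every device */
    if (!luid || !deviceNodeMask)
        return SKETCH_ERROR_INVALID_VALUE;

    memcpy(luid, placeholder, sizeof(placeholder));
    *deviceNodeMask = 1; /* single-node mask, the same value winevulkan sets */
    return SKETCH_SUCCESS;
}
```

Because every device gets the same LUID, this would be wrong on multi-GPU systems; querying winevulkan for the real value would be the proper fix.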

I will tidy up my patch and post it here shortly. Wine-staging-7.0-rc5 is out today with the added stubs, so that will be the "starting point", so to speak.

PS. It still does not actually render, though... just so you don't think it's all fixed.

SveSop commented 2 years ago

OK. Here are 2 patches on top of the latest staging-7.0-rc5 patch, posted at https://github.com/wine-staging/wine-staging/commit/3b01c6e2c5c26498b0f261fabbc4cd51ec917a2e

@jp7677 I also tested the patches from https://github.com/jp7677/dxvk-nvapi/pull/65, and it seems that if I drop any of them, DAZ shows 2 CUDA adapters: one with a WDDM display, and the other with TCC or whatever it was...

Not sure why that is, but feel free to test whether any of the patches from https://github.com/jp7677/dxvk-nvapi/pull/65 can be dropped. Attachment: cudapatches.tar.gz

Saancreed commented 2 years ago

I think we can alter Wine's nvcuda.dll to return LUIDs from winevulkan. A naïve implementation would look like this:

#include <cuda.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <vulkan/vulkan.h>

CUresult WINAPI wine_cuDeviceGetLuid(char *luid, unsigned int *deviceNodeMask, CUdevice dev)
{
    CUuuid uuid;
    CUresult result;

    if ((result = pcuDeviceGetUuid(&uuid, dev)) != CUDA_SUCCESS)
    {
        return result;
    }

    result = CUDA_ERROR_NOT_SUPPORTED;

    VkApplicationInfo vkApplicationInfo = {
        .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
        .pNext = NULL,
        .pApplicationName = NULL,
        .applicationVersion = 0,
        .pEngineName = NULL,
        .engineVersion = 0,
        .apiVersion = VK_API_VERSION_1_1,
    };

    VkInstanceCreateInfo vkInstanceCreateInfo = {
        .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
        .pNext = NULL,
        .flags = 0,
        .pApplicationInfo = &vkApplicationInfo,
        .enabledLayerCount = 0,
        .ppEnabledLayerNames = NULL,
        .enabledExtensionCount = 0,
        .ppEnabledExtensionNames = NULL,
    };

    VkInstance vkInstance;
    VkResult vkResult;

    if ((vkResult = vkCreateInstance(&vkInstanceCreateInfo, NULL, &vkInstance)) != VK_SUCCESS)
    {
        goto vkerror;
    }

    uint32_t vkPhysicalDeviceCount;

    if ((vkResult = vkEnumeratePhysicalDevices(vkInstance, &vkPhysicalDeviceCount, NULL)) != VK_SUCCESS)
    {
        goto vkerror_instance;
    }

    VkPhysicalDevice *vkPhysicalDevices = calloc(vkPhysicalDeviceCount, sizeof(VkPhysicalDevice));

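    /* calloc(0, ...) may legitimately return NULL, so only treat NULL as OOM when devices exist */
    if (!vkPhysicalDevices && vkPhysicalDeviceCount)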
    {
        result = CUDA_ERROR_OUT_OF_MEMORY;
        goto vkerror_instance;
    }

    if ((vkResult = vkEnumeratePhysicalDevices(vkInstance, &vkPhysicalDeviceCount, vkPhysicalDevices)) != VK_SUCCESS)
    {
        goto vkerror_devices;
    }

    for (uint32_t i = 0; i < vkPhysicalDeviceCount; ++i)
    {
        VkPhysicalDeviceIDProperties vkPhysicalDeviceIDProperties = {
            .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES,
            .pNext = NULL,
        };

        VkPhysicalDeviceProperties2 vkPhysicalDeviceProperties2 = {
            .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
            .pNext = &vkPhysicalDeviceIDProperties,
        };

        vkGetPhysicalDeviceProperties2(vkPhysicalDevices[i], &vkPhysicalDeviceProperties2);

        if (vkPhysicalDeviceIDProperties.deviceLUIDValid == VK_TRUE &&
            !memcmp(vkPhysicalDeviceIDProperties.deviceUUID, &uuid, VK_UUID_SIZE))
        {
            memmove(luid, vkPhysicalDeviceIDProperties.deviceLUID, VK_LUID_SIZE);
            *deviceNodeMask = vkPhysicalDeviceIDProperties.deviceNodeMask;
            result = CUDA_SUCCESS;
            break;
        }
    }

    vkerror_devices: free(vkPhysicalDevices);
    vkerror_instance: vkDestroyInstance(vkInstance, NULL);
    vkerror: return result;
}

Might require some minor modifications but maybe it'll work.

(PRIME setups/laptops might need https://github.com/ValveSoftware/wine/commit/0d00e4fc11f9768447f0544496e316edf225de84 as well.)

SveSop commented 2 years ago

This is something perhaps:

2022-01-08 15:17:59.411 WARNING: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(359): Iray [ERROR] - IRAY:RENDER ::   1.0   IRAY   rend error: optixInit() failed: Library not found. Please update your NVIDIA driver (www.nvidia.com).
2022-01-08 15:17:59.411 WARNING: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(359): Iray [WARNING] - IRAY:RENDER ::   1.0   IRAY   rend warn : CUDA device 0 (NVIDIA GeForce RTX 2070) is no longer available for rendering.
2022-01-08 15:17:59.411 WARNING: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(359): Iray [WARNING] - IRAY:RENDER ::   1.0   IRAY   rend warn : CUDA device 0 (NVIDIA GeForce RTX 2070) is no longer available for rendering.
2022-01-08 15:17:59.411 WARNING: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(359): Iray [WARNING] - IRAY:RENDER ::   1.0   IRAY   rend warn : All available GPUs failed.

What is optixInit?

Hmm: https://forums.developer.nvidia.com/t/optix-error-failed-to-load-optix-library/70671/14

Raytracing library? Can RT be disabled by some setting in DAZ?

PetitMote commented 2 years ago

Looks like it's this: https://developer.nvidia.com/blog/how-to-get-started-with-optix-7/ It could be used for ray tracing.

Regarding your edit: I’m not sure. Maybe ray tracing is needed for Iray? But it only requires compute capability 2.0.

Saancreed commented 2 years ago

So, is it time to create another Wine library that forwards calls to a native library, this time libnvoptix.so? Quite unfortunate, but it seems that this particular SDK requires membership in the NVIDIA Developer Program :disappointed:
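A first step for such a forwarding library would be checking that the native library can be loaded and exposes its entry point. As far as I know, the OptiX 7 SDK's stub loader looks for `libnvoptix.so.1` on Linux and resolves `optixQueryFunctionTable` from it, but treat both names as assumptions to verify against the SDK headers. The helper below is a generic, hypothetical probe built on `dlopen`/`dlsym`.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: can `library` be loaded, and does it export `symbol`? */
static bool native_symbol_available(const char *library, const char *symbol)
{
    void *handle = dlopen(library, RTLD_NOW | RTLD_LOCAL);
    if (!handle)
        return false;

    bool found = dlsym(handle, symbol) != NULL;
    dlclose(handle);
    return found;
}
```

On the Linux host, `native_symbol_available("libnvoptix.so.1", "optixQueryFunctionTable")` would show whether the driver ships the library at all before any Wine-side forwarding is attempted. Older glibc versions need `-ldl` at link time.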

SveSop commented 2 years ago

Since my GTX970 does not have ray tracing, I wonder whether it is loaded there at all and just not used, or whether the Iray engine is "smart" enough to figure out that it is not an RT card? I tried to spoof a 970 card in dxgi.conf, but DAZ still showed an RTX2070.

SveSop commented 2 years ago

I mean, if we could get GPU rendering on CUDA without RT, it would be a step up from CPU without RT anyway :smile:

Saancreed commented 2 years ago

> Since my GTX970 does not have raytracing, i wonder if it is loaded at all there and just not used - or if the Iray engine is "smart" enough to figure out that it is not a RT card?

Well, it appears to use OptiX, and OptiX claims to support Maxwell and Pascal GPUs too.

> Tried to spoof a 970 card in dxgi.conf, but still showed RTX2070 in DAZ

It might be calling CUDA to determine the GPU's name.

PetitMote commented 2 years ago

@SveSop you’re a goddamn genius! How did you get the idea of expanding Unknown1? It fixed the crash, at least.

PetitMote commented 2 years ago

Me again! So, we just took a big step: the render launches and doesn’t crash. However, it falls back to CPU.

SveSop commented 2 years ago

As I briefly mentioned earlier, I had been looking into this in the past, but back then I had nothing that actually used these "new" tables, so I was not able to test anything. Glad it seems to work.

I guess working in DAZ may be GPU-accelerated now, and it is only the renderer that falls back to CPU because this OptiX library is not available? Is there some scene where you can check whether the GPU load differs when you manipulate it and switch between CPU and GPU in the settings? (There will be some GPU load no matter what, due to stuff being drawn, I guess...)

At least the logs seem to show some more CUDA activity.

PetitMote commented 2 years ago

I don’t think it will change anything in the viewport, as I don’t think it was ever using CUDA.

I tried to do as you asked, and I get another access violation. Maybe we should continue expanding the unknown tables?

I didn’t find any option to disable Ray Tracing.

PetitMote commented 2 years ago

Well, if there is any need for OptiX, I found the API reference: https://raytracing-docs.nvidia.com/optix7/api/index.html

I would gladly start working on it right now, but I think I’m only capable of doing the boring part of passing the calls through to the native library :sweat_smile:

jp7677 / dxvk-nvapi

Daz Studio lacks some NvAPI methods for iRay rendering #64