microsoft / MixedReality-HolographicRemoting-Samples

Code samples for using Microsoft's Holographic Remoting library.

Holographic remote player lose rendering #81

Closed almoga296 closed 6 months ago

almoga296 commented 1 year ago

Describe the bug

When performing a GPU-heavy task, the player (HoloLens 2) frequently loses its rendering while the remote side (PC) still renders the running app. Along with the loss of rendering there is a dip in NVENC (NVIDIA encoder) usage, and the GPU suddenly starts running at close to 100%. Image attached.

Note that the application is still running: it keeps sending the stream (20 Mbps) and stays responsive (moving the head changes the hologram perspective).

I can't see anything interesting in the application logs (remote side).

Disconnecting and then reconnecting to Holographic Remoting brings the stream back (while the application is still running).

To Reproduce

Steps to reproduce the behavior:

  1. Launch the remoting player on the HoloLens.
  2. Run the Unity3D-based application (remote side).
  3. Connect from the remote-side application to the player.
  4. Perform a GPU-intensive task.

Remote side (your Windows PC):

Player side (e.g. your HoloLens 2)

Do you have any idea what could cause this kind of problem? Have you ever faced such a problem before? Any kind of workaround, or a direction for understanding the root cause of the problem, would be welcome.

Thanks, Almog

image

lappelsmeier commented 1 year ago

Hi @almoga296,

What does the performance of your app look like? What does the Unity profiler say? Our player-side diagnostics would also be interesting when this happens. To see them, say "Enable Diagnostics" while the player is open. It'd be great if you could report the numbers shown there.

ikto-art commented 1 year ago

I discovered the same problem today, after our Holographic Remoting Player was automatically updated to 2.9.0 on our HoloLens 2 devices. I don't know if it's related to GPU-intensive rendering, but for me it starts soon after rendering some content with Unity. There is no log or anything in Unity when it happens. I can see the content for a minute or two, and then it stops rendering in the headset while Unity still renders everything as it should. However, our Unity project is still using the com.microsoft.mixedreality.openxr package version 1.6.0, which was working perfectly fine with the 2.8.1 Player app.

Edit: I upgraded com.microsoft.mixedreality.openxr to version 1.7.0 and tried installing the latest Windows Mixed Reality OpenXR runtime from the Microsoft Store, but the Unity logs state that the remoting plugin overrode the runtime and switched to the DLL runtime from the Unity package, which seems to be the same version as before (2.8.1). Nothing appears in the Editor.log file when the HoloLens 2 stops rendering. However, I have the impression that the threshold for the bug is a bit higher than before (I can render more things for a longer time before it goes black), but I did not test enough to be sure.

I will try to downgrade the Player app to 2.8.1 on the HoloLens 2 to see if the issue persists.

ikto-art commented 1 year ago

I tried again with version 2.8.1 and the bug disappears.

shukenmg commented 1 year ago

Is it fixed in 2.9.1?

almoga296 commented 1 year ago

Hi, I worked around it by downgrading Mixed Reality OpenXR to version 1.4.0. I hope a version that fixes this bug will be released in the near future.

I've opened the same bug report here: https://github.com/microsoft/OpenXR-Unity-MixedReality-Samples/issues/142

ikto-art commented 1 year ago

Hello @shukenmg @almoga296,

The issue seems to be fixed with the new 1.8.0 package (com.microsoft.mixedreality.openxr-1.8.0.tgz). I only tried in a UWP build for now, with the latest 2.9.1 store version. Latest HoloLens firmware with latest updates & OpenXR runtime. Unity 2020.3.44 with MRTK2 and URP, and the package com.unity.xr.openxr is set to version 1.5.3. I tried with max bitrate 50 Mbps and 5000 Mbps to be sure and no problems so far.

Can you test on your side?

Thanks

lappelsmeier commented 1 year ago

Closing for now, let us know if you still have issues.

galdalali commented 8 months ago

This issue is still present, even in newer OpenXR versions, but we managed to narrow it down.

When running a Unity application, streaming it to a HoloLens 2 using Holographic Remoting, and performing GPU- and VRAM-heavy tasks (in this case, a large AsyncGPUReadback of a 4K texture while using almost all 8 GB of our graphics card's VRAM), the HoloLens 2 player stops rendering (literally, nothing on screen) without any error or warning.

We've even placed custom debug messages in the appremoting subsystem that hooks into the native DLL, and the DLL reports all is fine. This started in Mixed Reality OpenXR Plugin v1.4.1 and later, while version 1.4.0 worked fine. In 1.4.0, when there is a GPU surge (the AsyncGPUReadback), it looks like it is about to lose rendering, similar to what happens in 1.4.1. However, it always manages to recover.

Finally, to test this, we ran with Mixed Reality OpenXR Plugin v1.4.1, which would experience this issue, and manually replaced the OpenXR remoting runtime DLL from 2.8.0 to 2.7.5. After doing this test, the issue vanished! As such, we're very confident that whatever is going wrong is happening in the Microsoft.Holographic.AppRemoting.OpenXr.dll, and the change that caused this bug appeared somewhere between v2.7.5 and v2.8.0.

@lappelsmeier, any idea which changes in this version of the runtime could cause it?

lappelsmeier commented 8 months ago

The holographic remoting player stops reprojecting the remote frame after 0.5 seconds if no new frames arrived.

What do the statistics (https://learn.microsoft.com/en-us/windows/mixed-reality/develop/native/holographic-remoting-player#diagnostics) look like when this happens? The video frames row is the interesting one, it should ideally always be at 0/0/60 as in the documentation screenshot.

If you go down to 0 received frames or so, your application is no longer submitting frames in time. We expect the remote app to run at a stable 60 FPS.
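The 0.5-second rule described above can be pictured with a small model. This is only an illustrative sketch in Python, not the player's actual implementation: the class `FrameWatchdog`, its method names, and the choice to treat exactly 0.5 seconds as still valid are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# The 0.5 s value is the one quoted above; everything else is hypothetical.
REPROJECTION_TIMEOUT_S = 0.5


@dataclass
class FrameWatchdog:
    """Tracks when the last remote frame arrived (illustrative model only)."""

    last_frame_time: Optional[float] = None

    def on_frame_received(self, now: float) -> None:
        # Remember the arrival time of the newest remote frame.
        self.last_frame_time = now

    def should_reproject(self, now: float) -> bool:
        # Before the first frame there is nothing to reproject. Afterwards,
        # keep reprojecting only while the newest frame is at most 0.5 s old.
        if self.last_frame_time is None:
            return False
        return (now - self.last_frame_time) <= REPROJECTION_TIMEOUT_S


watchdog = FrameWatchdog()
watchdog.on_frame_received(now=1.0)
print(watchdog.should_reproject(now=1.4))  # True: last frame is 0.4 s old
print(watchdog.should_reproject(now=1.6))  # False: last frame is 0.6 s old
watchdog.on_frame_received(now=2.0)
print(watchdog.should_reproject(now=2.1))  # True again once frames resume
```

In this model the image comes back as soon as a new frame arrives after a stall, which is the recovery behavior one would expect from the description.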

galdalali commented 8 months ago

It looks like after forcing our application to stall for a second, the HoloLens stopped reprojecting, as you suspected. However, the remoting app itself is still running and functional afterward. Moving the HoloLens changes the camera in the scene. It looks like whatever handles recovery from a stall in Microsoft.Holographic.AppRemoting.OpenXr.dll isn't working. Doing this same stall does not cause the remoting app to stop receiving frames in version 2.7.5.

Below are the diagnostics after we forced Unity to freeze for about half a second. image

lappelsmeier commented 8 months ago

What do the statistics look like when you don't artificially force Unity to stall but the problem occurs naturally?

galdalali commented 8 months ago

Same. This capture was taken just a couple of seconds after the Unity app stalled, so latency had built up. We have a task that we start once we are close to the VRAM limit and that uses a lot of VRAM (an AsyncGPUReadback for a 4K screen capture).

Just to be clear, doing this same stall does not cause the remote app to stop receiving frames in version 2.7.5. It just holds for a second and then recovers.

lappelsmeier commented 8 months ago

@galdalali - quick update from my side: I tested today with our OpenXR Win32 remote sample (version 2.7.5, 2.8.0 and 2.9.0) in a bad network situation which led to image drop outs (and similar latency numbers as in your screenshot). All versions recovered after a bit and continued to show video after the bad network condition was resolved.

I also modified our sample to optionally sleep the thread for 2000 ms in ProcessEvents() if a certain key is pressed, and the player recovered every time. Halting in the debugger for a short while also works.

Could you try on your end with our official sample: https://github.com/microsoft/MixedReality-HolographicRemoting-Samples/tree/main/remote_openxr/desktop ? My experiments mentioned above were done with the latest 2.9.3 runtime connecting to the 2.9.3 player (from the store).

galdalali commented 8 months ago

Hey @lappelsmeier , I've built the holographic-remoting samples as you asked and performed the same tests. After filling the GPU memory and pausing the app, we did recover successfully. This is very odd, as this exact case in Unity will cause a disconnect.

As a sanity check, I performed the experiment in Unity and saw that we recovered in 2.7.5 but died in 2.9.3. The exact line of code that causes the death is this:

    var request = AsyncGPUReadback.RequestIntoNativeArray(
        ref narray, resizeRT, 0,
        async (AsyncGPUReadbackRequest request) =>
        {
            WriteToFile(narray, resizeRT, request).Forget();
        });

Not the most useful data, I know. It's interesting that this only happens when: A) running in Unity, B) Unity calls AsyncGPUReadback, and C) the GPU memory is full.

Interestingly enough, I saw the same "type" of stall in the sample remoting app when calling the code above, though the sample did recover. If there's any further information I can provide to help solve this issue, please let me know, as it's currently blocking me from upgrading versions for my app.

AMollis commented 8 months ago

Hi @galdalali , I'm from the MR Plugin team, and will be investigating this issue. To help with the investigation, can you please share a Unity project that produces the issue?

galdalali commented 8 months ago

Hi @AMollis,

It appears the crash/stall is caused by leaking large Unity RenderTextures. We also tested with normal textures (Texture2D), and the crash did not occur. The following project reproduces the issue; the instructions are documented in the readme file.

marlenaklein-msft commented 7 months ago

Hi @galdalali, I have a couple of questions. Does this issue repro in both Unity Editor and PC remoting apps (standalone windows and UWP)? How many large leaks does it typically take before the remoting does not recover?

galdalali commented 7 months ago

Hi @marlenaklein-msft.

We've never seen this issue in the editor, only in builds (remoting builds for PC, not UWP). Typically, this issue happens when you leak enough textures to fill your GPU's VRAM. When you leak a new texture once the VRAM is nearly full, remoting fails to recover. The exact number depends on the GPU, though.
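For a rough sense of scale, the number of leaked textures needed to fill VRAM can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope estimate under stated assumptions: a 4K texture is taken to be 3840x2160 with 4 bytes per pixel and no mipmaps, the card is the 8 GB one mentioned earlier (treated as 8 GiB), and driver overhead and other allocations are ignored. The function names are made up for this example.

```python
def texture_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed texture without mipmaps (assumed layout)."""
    return width * height * bytes_per_pixel


def textures_until_full(vram_bytes: int, in_use_bytes: int, per_texture: int) -> int:
    """How many more textures of the given size fit into the remaining VRAM."""
    free = max(vram_bytes - in_use_bytes, 0)
    return free // per_texture


GIB = 1024 ** 3
four_k = texture_bytes(3840, 2160)  # assumed 4K UHD, 32-bit RGBA

print(four_k)                                         # 33177600 bytes, about 31.6 MiB
print(textures_until_full(8 * GIB, 0, four_k))        # 258 on an otherwise empty card
print(textures_until_full(8 * GIB, 7 * GIB, four_k))  # 32 with 7 GiB already in use
```

Under these assumptions, an application that already uses most of the card's memory needs only a few dozen leaked 4K render targets to run out, which fits the observation that the exact count varies with the GPU.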

galdalali commented 7 months ago

Hi @marlenaklein-msft, any update on this issue? Have you been able to reproduce it using the sample project?

marlenaklein-msft commented 7 months ago

Hi @galdalali, I was able to reproduce the issue from the sample project on PC remoting builds. @lappelsmeier I've sent you the trace logs from the repro.

galdalali commented 6 months ago

Hi @lappelsmeier, any update on this issue?

lappelsmeier commented 6 months ago

We're still investigating the issue on our side.

chairobl commented 6 months ago

Hello @galdalali, I'm happy to tell you that we have identified the cause of the problem: a timeout triggers a deadlock, which will be fixed in the next HAR release :) The timeout is caused by a large GPU stall of more than 500 milliseconds, which in our testing happened in any scenario that significantly overloads the GPU. As such, you could try running your program on a stronger GPU or lowering the resolution of your screenshots in the meantime. I'll ping you once the next release is out, and will close this ticket in the meantime. Feel free to reopen it if you have any follow-up questions :)

galdalali commented 2 months ago

Hi @chairobl, is there a planned date for the next HAR release? If it's too far away, is there any chance of solving this with a hotfix for version 2.9.3? I'm really stuck without the per-audio-app feature.

chairobl commented 2 months ago

Hi @galdalali, the next HAR release is currently being worked on, as is the release of a new Unity plugin version. This release will fix the deadlock situation, making HAR more resilient against strong resource degradation. Nonetheless, I would encourage you to consider the outlined suggestions in my prior comment, as such degradation is going to impact your desired user experience.

chairobl commented 1 month ago

Hi @galdalali, I'm glad to report that we've just released the new version of HAR, which fixes the deadlock issue :)

galdalali commented 1 month ago

I'm still waiting for the Unity Mixed Reality OpenXR Plugin (com.microsoft.mixedreality.openxr) to update with the new Holographic Remoting version. Any suggestion on how to update it without the MR Plugin?

chairobl commented 1 month ago

Hi @galdalali, you may do so by replacing the DLL in the package cache, as described by my colleague here 👍 Make sure to account for potential differences in naming between the DLL provided by HAR and the one you intend to replace. Please feel free to ask any follow-up questions if you need further clarification :)

galdalali commented 1 month ago

Hi @chairobl, I've replaced the DLLs with the ones I extracted from here: https://www.nuget.org/packages/Microsoft.Holographic.Remoting.OpenXr/2.9.4 (inside build\native\bin\x64\Desktop; size 9976 KB).

I replaced them in the application's Client_Data\Plugins\x86_64 folder (they have the same names) and ran the app, and Holographic Remoting fails. It keeps printing this in the Player.log:

[XR] [MROpenXR][Info   ][12:20:18.456191][tid:1a14] RemotingRuntimeOverride_TryReplaceValue hkey=2124 lpSubKey= lpValue=ActiveRuntime jsonLocation=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:18.456191][tid:1a14] RemotingRuntimeOverride_TryReplaceValue hkey=2124 lpSubKey= lpValue=ActiveRuntime jsonLocation=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:18.456205][tid:1a14] RemotingRuntimeOverride_ReplaceJsonPath originalPath= replacedPath=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:18.456205][tid:1a14] RemotingRuntimeOverride_ReplaceJsonPath originalPath= replacedPath=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [6676] [12:20:20.244][Error  ] xrEnumerateInstanceExtensionProperties: XR_ERROR_RUNTIME_UNAVAILABLE
[XR] [MROpenXR][Info   ][12:20:20.245008][tid:1a14] RemotingRuntimeOverride_TryReplaceValue hkey=9352 lpSubKey= lpValue=ActiveRuntime jsonLocation=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:20.245008][tid:1a14] RemotingRuntimeOverride_TryReplaceValue hkey=9352 lpSubKey= lpValue=ActiveRuntime jsonLocation=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:20.245027][tid:1a14] RemotingRuntimeOverride_ReplaceJsonPath originalPath= replacedPath=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:20.245027][tid:1a14] RemotingRuntimeOverride_ReplaceJsonPath originalPath= replacedPath=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [6676] [12:20:21.825][Error  ] xrEnumerateInstanceExtensionProperties: XR_ERROR_RUNTIME_UNAVAILABLE
[XR] [MROpenXR][Info   ][12:20:21.825703][tid:1a14] RemotingRuntimeOverride_TryReplaceValue hkey=9572 lpSubKey= lpValue=ActiveRuntime jsonLocation=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:21.825703][tid:1a14] RemotingRuntimeOverride_TryReplaceValue hkey=9572 lpSubKey= lpValue=ActiveRuntime jsonLocation=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:21.825724][tid:1a14] RemotingRuntimeOverride_ReplaceJsonPath originalPath= replacedPath=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:20:21.825724][tid:1a14] RemotingRuntimeOverride_ReplaceJsonPath originalPath= replacedPath=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [6676] [12:20:23.966][Error  ] xrEnumerateInstanceExtensionProperties: XR_ERROR_RUNTIME_UNAVAILABLE

Using the 2.9.3 DLLs obtained the same way (size 10025 KB) works perfectly fine, just with the bug. In the 2.9.3 log I can see the same 4 lines only once, whereas they keep repeating in the log above. After that, it just connects and does not print XR_ERROR_RUNTIME_UNAVAILABLE:

[XR] [MROpenXR][Info   ][12:24:25.795350][tid:4768] RemotingRuntimeOverride_TryReplaceValue hkey=2128 lpSubKey= lpValue=ActiveRuntime jsonLocation=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:24:25.795350][tid:4768] RemotingRuntimeOverride_TryReplaceValue hkey=2128 lpSubKey= lpValue=ActiveRuntime jsonLocation=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:24:25.795365][tid:4768] RemotingRuntimeOverride_ReplaceJsonPath originalPath= replacedPath=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[XR] [MROpenXR][Info   ][12:24:25.795365][tid:4768] RemotingRuntimeOverride_ReplaceJsonPath originalPath= replacedPath=C:\________\bin\Client\Client_Data\Plugins\x86_64\RemotingXR.json
[Subsystems] Loading plugin UnityOpenXR for subsystem OpenXR Display...
[XR] [18280] [12:24:26.000][Info   ] Available Layers: (0)

This has been tested on 3 different PCs. What could be the issue here? The size difference in the main DLL caught my eye.

I can send the full Player logs on request.
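To compare a failing and a working Player.log more systematically, the OpenXR error lines can be counted with a short script. This is a minimal sketch that assumes the error lines keep the shape quoted above; `count_xr_errors` and the regular expression are illustrative helpers, not part of any Microsoft or Unity tooling.

```python
import re
from collections import Counter

# Matches error lines shaped like the ones quoted above, for example:
# [XR] [6676] [12:20:20.244][Error  ] xrEnumerateInstanceExtensionProperties: XR_ERROR_RUNTIME_UNAVAILABLE
ERROR_LINE = re.compile(r"\[Error\s*\]\s*(?P<call>\w+):\s*(?P<code>XR_ERROR_\w+)")


def count_xr_errors(log_text: str) -> Counter:
    """Count (OpenXR call, error code) pairs in a Player.log-style text."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            counts[(match["call"], match["code"])] += 1
    return counts


sample = "\n".join([
    "[XR] [6676] [12:20:20.244][Error  ] xrEnumerateInstanceExtensionProperties: XR_ERROR_RUNTIME_UNAVAILABLE",
    "[Subsystems] Loading plugin UnityOpenXR for subsystem OpenXR Display...",
    "[XR] [6676] [12:20:21.825][Error  ] xrEnumerateInstanceExtensionProperties: XR_ERROR_RUNTIME_UNAVAILABLE",
])
for (call, code), n in count_xr_errors(sample).items():
    print(f"{call}: {code} x{n}")
```

Running it on both logs would show at a glance whether the working run contains any runtime errors at all, and which OpenXR call reports them in the failing run.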

lappelsmeier commented 1 month ago

Hi @galdalali ,

Just a hunch: can you please install the latest x64 VC redist (https://aka.ms/vs/17/release/vc_redist.x64.exe) on your machine? For this release we had to update our compiler toolchain, and we have seen with other projects that not everyone has this redist installed yet, which can result in DLL load issues etc.

galdalali commented 1 month ago

Yep, that did the trick. Thank you!

AMollis commented 4 weeks ago

@almoga296 this issue has been fixed in the remoting runtime v2.9.4, which is included in OpenXR MR Plugin v1.11.1

More information here:

https://github.com/microsoft/OpenXR-Unity-MixedReality-Samples/releases/tag/v1.11.1