microsoft / psi

Platform for Situated Intelligence
https://github.com/microsoft/psi/wiki

UWP MediaCapture: convert NV-12 images to RGBA #273

Status: Closed (austinbhale closed 1 year ago)

austinbhale commented 1 year ago

Hi, I noticed that the ImageFromNV12StreamDecoder takes a significant amount of time to decode each NV12-encoded image. For JPEG compression, it would be nice to skip this decoding prerequisite, as mentioned here: https://github.com/microsoft/psi/issues/223#issuecomment-1111201702. Converting to RGBA with a software bitmap in C# is not the fastest solution on the HoloLens 2, but in my performance tests it took only about 10 ms per image, versus up to 100 ms of decoding time with the ImageFromNV12StreamDecoder. This might be a good option to offer devs, even though native decoding of the NV12 format would be even faster. Let me know what you think!
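
For readers unfamiliar with the format, here is a rough sketch of what an NV12-to-RGBA conversion does per pixel. It is written in plain Python for readability and is not the \psi decoder or the software-bitmap path discussed above. It assumes a tightly packed buffer (row stride equal to the width, no padding), even dimensions, and BT.601 limited-range coefficients; real camera buffers may differ on all three. The names `nv12_to_rgba` and `clamp8` are made up for this illustration.

```python
def clamp8(value):
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, value))


def nv12_to_rgba(nv12, width, height):
    """Convert a tightly packed NV12 frame to RGBA bytes.

    NV12 layout: a full-resolution Y plane (width * height bytes),
    followed by one interleaved U,V plane at half resolution in both
    directions (width * height / 2 bytes).
    Assumption: BT.601 limited-range video, stride == width.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 requires even width and height")
    y_size = width * height
    if len(nv12) < y_size * 3 // 2:
        raise ValueError("buffer too small for the given dimensions")

    rgba = bytearray(y_size * 4)
    for row in range(height):
        # Each row of U,V pairs is shared by two rows of luma samples.
        uv_row = y_size + (row // 2) * width
        for col in range(width):
            # Each U,V pair is shared by two horizontally adjacent pixels.
            uv = uv_row + (col // 2) * 2
            c = nv12[row * width + col] - 16
            d = nv12[uv] - 128      # U (Cb)
            e = nv12[uv + 1] - 128  # V (Cr)

            # Fixed-point BT.601 conversion, scaled by 256.
            r = clamp8((298 * c + 409 * e + 128) >> 8)
            g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
            b = clamp8((298 * c + 516 * d + 128) >> 8)

            i = (row * width + col) * 4
            rgba[i:i + 4] = bytes((r, g, b, 255))  # opaque alpha
    return bytes(rgba)


# A 2x2 frame: alternating black and white luma, neutral chroma.
frame = bytes([16, 235, 16, 235, 128, 128])
pixels = nv12_to_rgba(frame, 2, 2)
```

Because every 2x2 block of pixels shares one U,V pair, the work is dominated by the per-pixel multiply-and-clamp, which is why a vectorized or native implementation is so much faster than a naive loop like this one.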

Abdul-Mukit commented 1 year ago

@austinbhale Hi. What was the problem with this approach?

austinbhale commented 1 year ago

No problem with this conversion from NV12 to RGBA format -- I still have to finish the other PR :) (sorry for the delay).

The issue I mentioned is that the CPU load is far too great on the HL2 if you use JPEG compression for every frame, no matter the library. If the CPU load is too high, the app will deliver an inconsistent frame rate and throttle while it catches up on resources.

One solution would be to incorporate the HL2 hardware video encoder (H.264) into \psi (as in WebRTC; https://github.com/microsoft/MixedReality-WebRTC/issues/27). By using the native hardware encoders, you don't have to sacrifice as much image quality or drop as many frames. Then there's the potential to hardware-decode the image and keep everything on the GPU for retrieval (for CV algorithms) and rendering.