microsoft / AISkillsForWindows

Contains samples for implementing Windows Skills by extending the preview base API and by using existing skill packages
https://docs.microsoft.com/en-us/windows/ai/windows-vision-skills/
MIT License

SkeletalDetector very slow on Hololens 2 #69

Closed: cwule closed this issue 2 years ago

cwule commented 4 years ago

On my local machine (Windows 10, build 18362.1082) the SkeletalDetector example runs very well, with ~200ms eval time on the CPU (i7-6700K) and less than 100ms on the GPU (1080 Ti). When I try it on the HoloLens 2 (10.0.19041.1382), the frame rate drops drastically, with eval times of ~2000ms on the CPU (Armv8 64-Bit Family 8 Model 803 Revision 70C) and ~10000ms on the GPU (Qualcomm Adreno 630 GPU). Is there any way to make this faster on the HL2?

LPBourret commented 4 years ago

Hi, yes, I can confirm that performance on ARM64 can be lower than on x64/x86: these SoCs usually have less compute power, which lengthens ML model inference time (e.g. on a Surface Pro X with an Adreno 680 and the SQ1 SoC you can get ~400ms per eval on the CPU). That said, the performance you are seeing is quite poor. While there may be optimizations on our side to extract more performance from ARM, in the meantime you could try feeding lower-resolution images to the Skeleton Detector AI Skill to save precious compute time. The HoloLens 2 has a 1080p camera, but you can select a lower resolution on that camera stream directly, or downsize the VideoFrame you retrieve before setting it on the SkeletonDetectorBinding and evaluating the SkeletonDetectorSkill. Let me know if that makes sense and if you need help testing it.

cwule commented 4 years ago

I tried your recommendation with two MP4 videos: one at 1080p and the same video downsampled to 960x540. Interestingly, the eval time did not change significantly; it stayed about the same for both (~2000ms on CPU, ~9500ms on GPU). The bind time was slightly shorter for the low-res video (~40ms vs ~60ms for the high-res one).

cwule commented 4 years ago

@LPBourret Also, following your tip, how would I change the camera profile to one of the lower-resolution ones, e.g. those listed here: https://docs.microsoft.com/en-us/windows/mixed-reality/develop/platform-capabilities-and-apis/locatable-camera

LPBourret commented 4 years ago

Hi, you can check the available profiles by following this example. Thank you for trying the lower resolution. I understand that a 2s latency (on the CPU) can be constraining; let me check on my side what I can do to optimize on ARM. As a mitigation, you could interleave evaluations across more than one skill instance running in parallel, to get at least more than one result every 2s (i.e. schedule each frame for eval on an idle skill from a pool of skills).
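The pool-of-skills mitigation can be sketched as follows: each incoming frame goes to an idle evaluator, and frames that arrive while every evaluator is busy are dropped. This is a hypothetical sketch, not code from the sample or from the skill packages: `MakeEvaluator`, `TrySchedule`, and the integer frame ids are illustrative stand-ins. In a real app, each pool entry would presumably hold its own SkeletonDetectorSkill and SkeletonDetectorBinding, and the delay stands in for the ~2s evaluation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for one skill + binding pair: it "evaluates" a frame (identified here
// by an int) after a fixed delay and returns the id of the frame it processed.
Func<int, Task<int>> MakeEvaluator(int evalMs) => async frameId =>
{
    await Task.Delay(evalMs); // simulates a long on-device evaluation
    return frameId;
};

const int PoolSize = 3;
var idle = new ConcurrentQueue<Func<int, Task<int>>>();
for (int i = 0; i < PoolSize; i++) idle.Enqueue(MakeEvaluator(500));

var results = new ConcurrentBag<int>();
var inFlight = new List<Task>(); // only touched from the scheduling thread

// Hand a frame to an idle evaluator, or drop it if every evaluator is busy.
bool TrySchedule(int frameId)
{
    if (!idle.TryDequeue(out var evaluator)) return false; // pool exhausted: drop this frame
    inFlight.Add(Task.Run(async () =>
    {
        try { results.Add(await evaluator(frameId)); }
        finally { idle.Enqueue(evaluator); } // evaluator becomes available again
    }));
    return true;
}

// Simulate a burst of 10 camera frames arriving faster than any single evaluation completes.
int accepted = 0;
for (int frame = 0; frame < 10; frame++)
    if (TrySchedule(frame)) accepted++;

await Task.WhenAll(inFlight);
Console.WriteLine($"Accepted {accepted} of 10 frames; evaluated: {string.Join(", ", results.OrderBy(x => x))}");
```

Dropping frames instead of queuing them keeps latency bounded. The pool improves throughput, not per-frame latency: with N evaluators you can get at most about N results per eval period. This assumes the parallel evaluations do not slow each other down on the shared HoloLens 2 CPU/GPU, which may not hold in practice.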

cwule commented 4 years ago

Thanks a lot for your help! I tried changing the profile and was partially successful (the resolution decreased to 1504x846), but I don't fully understand MediaCapture yet. I chose the camera profile in CreateFromVideoDeviceInformationAsync like this:

        // List the video profiles supported by the selected camera
        IReadOnlyList<MediaCaptureVideoProfile> profiles = MediaCapture.FindAllVideoProfiles(result.m_mediaCaptureInitializationSettings.VideoDeviceId);

        // Find the first profile whose record media descriptions include 424x240 at ~15 fps
        var match = (from profile in profiles
                     from desc in profile.SupportedRecordMediaDescription
                     where desc.Width == 424 && desc.Height == 240 && Math.Round(desc.FrameRate) == 15
                     select new { profile, desc }).FirstOrDefault();

        // Apply that profile and media description to the initialization settings
        if (match != null)
        {
            result.m_mediaCaptureInitializationSettings.VideoProfile = match.profile;
            result.m_mediaCaptureInitializationSettings.RecordMediaDescription = match.desc;
        }

While this let me select the correct camera profile (one containing 424x240) in m_mediaCaptureInitializationSettings, which was used in InitializeAsync, the frame source in InitializeMediaFrameSourceAsync was always set back to a different format. The following expression in InitializeMediaFrameSourceAsync only gave me 1504x846 and 1952x1100 as available resolutions:

        m_frameSource =
            m_mediaCapture.FrameSources.FirstOrDefault(source => filterFrameSources(source, MediaStreamType.VideoPreview)).Value
            ?? m_mediaCapture.FrameSources.FirstOrDefault(source => filterFrameSources(source, MediaStreamType.VideoRecord)).Value;

Is it possible that some of the profile's resolutions are not compatible with the VideoPreview stream? Also, why do I set a MediaCapture profile in InitializeAsync, only for a different one to be loaded in InitializeMediaFrameSourceAsync?

Also, sorry if this is more of a Windows MediaCapture question than one specific to this repo.

LPBourret commented 4 years ago

Hi, the profile you set on MediaCaptureInitializationSettings provides a hint to the driver, narrowing down the functionality to enable on the MediaCapture instance so that it accommodates a certain scenario (i.e. make this MediaCapture stream low-res video by default). Indeed, with your changes to FrameReaderFrameSource.cs, you need to avoid setting the frame source to a media type different from the 424x240 resolution specified in your profile.

The current logic in the sample helper class FrameReaderFrameSource does not apply any profile. Instead, it looks at the ISkillFeatureImageDescriptor to determine which width and height to stream so that they match the input format the AI Skill wants. The SkeletalDetector does not require a specific width and height (both are -1), so the helper looks for a supported format to set on the MediaFrameSource (m_frameSource) that matches 1920x1080 at above 15fps:

            // Get preferred camera frame format described in a ISkillFeatureImageDescriptor if specified
            int preferredFrameWidth = 1920;
            if (m_desiredImageDescriptor != null && m_desiredImageDescriptor.Width != -1)
            {
                // we don't hit this since m_desiredImageDescriptor.Width == -1
            }
            int preferredFrameHeight = 1080;
            if (m_desiredImageDescriptor != null && m_desiredImageDescriptor.Height != -1)
            {
                // we don't hit this since m_desiredImageDescriptor.Height == -1
            }
            // skipping the rest of the logic
            await m_frameSource.SetFormatAsync(selectedFormat); // we are setting a format close to 1920x1080

Left untouched, the logic above overrides the default format derived from the profile you set earlier. In your case, I would just retrieve the MediaFrameSource that corresponds to your MediaCapture initialized with your profile, and either keep the default stream format or set a different one on the MediaCapture instance directly rather than on the MediaFrameSource.
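The selection step described above can be approximated as follows to show why a 424x240 request cannot be honored when the frame source does not list that resolution. This is a hypothetical approximation, not the repo's actual code. `PickFormat` is an illustrative helper, the tuples stand in for entries of `MediaFrameSource.SupportedFormats`, and the 30 fps values are assumed, since the thread only mentions the 1504x846 and 1952x1100 resolutions. In a UWP app the width, height, and frame rate would come from `format.VideoFormat.Width`/`Height` and `format.FrameRate.Numerator`/`Denominator`. The chosen `MediaFrameFormat` would then be passed to `m_frameSource.SetFormatAsync`, or, following the advice above, `SetFormatAsync` would be skipped so the profile's default format is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for MediaFrameSource.SupportedFormats (frame rates are assumed).
var supported = new List<(uint Width, uint Height, double Fps)>
{
    (1504, 846, 30),
    (1952, 1100, 30),
};

// Prefer an exact resolution match at or above minFps; otherwise fall back to the format whose
// pixel count is closest to the preferred one. Returns null if no format reaches minFps.
(uint Width, uint Height, double Fps)? PickFormat(
    IEnumerable<(uint Width, uint Height, double Fps)> formats,
    uint preferredWidth, uint preferredHeight, double minFps)
{
    var candidates = formats.Where(f => f.Fps >= minFps).ToList();
    if (candidates.Count == 0) return null;

    var exact = candidates.Where(f => f.Width == preferredWidth && f.Height == preferredHeight);
    if (exact.Any()) return exact.First();

    long target = (long)preferredWidth * preferredHeight;
    return candidates.OrderBy(f => Math.Abs((long)f.Width * f.Height - target)).First();
}

var forDefault = PickFormat(supported, 1920, 1080, 15); // what the sample helper aims for
var forLowRes = PickFormat(supported, 424, 240, 15);    // what the 424x240 profile asked for
Console.WriteLine($"1920x1080 request -> {forDefault}");
Console.WriteLine($"424x240 request   -> {forLowRes}");
```

With only 1504x846 and 1952x1100 exposed, a 424x240 request falls back to 1504x846. An exact 424x240 entry would only be picked if the frame source actually listed it.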