Closed: cwule closed this issue 2 years ago
Hi, yes, I can confirm that on ARM64 perf can be lower than on x64/x86 due to the usually lower compute power of these SoCs, which impacts ML model inference time (i.e. on a Surface Pro X with an Adreno 680 and SQ1 SoC you can get ~400ms per eval on CPU). That said, the numbers you show are pretty low, and while there might be optimizations on our side to extract more perf from ARM, in the meantime you could try feeding lower resolution images to the Skeleton Detector AI Skill to shave off precious compute time. The HoloLens 2 has a 1080p camera, but you may toggle a lower resolution on that camera stream directly, or downsize the VideoFrame you retrieve before setting it on the SkeletonDetectorBinding and proceeding with evaluation of the SkeletonDetectorSkill. Let me know if that makes sense and if you need help testing that.
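For reference, one way to downsize the frame before binding could look like this. This is only a sketch: `DownscaleFrameAsync` is a hypothetical helper name, and it assumes a CPU-backed BGRA8 frame and the binding/skill names used in the samples.

```csharp
using System.Threading.Tasks;
using Windows.Graphics.Imaging;
using Windows.Media;

public static class FrameHelpers
{
    // Hypothetical helper: scale an incoming VideoFrame down before binding it.
    // VideoFrame.CopyToAsync performs the scale and format conversion.
    public static async Task<VideoFrame> DownscaleFrameAsync(VideoFrame input, int width, int height)
    {
        var smallFrame = new VideoFrame(BitmapPixelFormat.Bgra8, width, height);
        await input.CopyToAsync(smallFrame);
        return smallFrame;
    }
}

// Usage prior to evaluation (binding/skill variable names are illustrative):
// VideoFrame small = await FrameHelpers.DownscaleFrameAsync(frame, 960, 540);
// await skeletonDetectorBinding.SetInputImageAsync(small);
// await skeletonDetectorSkill.EvaluateAsync(skeletonDetectorBinding);
```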
I tried your recommendation with two mp4 videos: one at 1080p resolution, and the same video downsampled to 960x540. Interestingly, the eval time did not change significantly. The bind time was slightly shorter for the low-res video (~40ms vs ~60ms for high-res), but the eval time stayed about the same (~2000ms CPU, ~9500ms GPU) for both.
@LPBourret Also, following your tip, how would I change the camera profile to one of the lower-res ones, e.g. those listed here: https://docs.microsoft.com/en-us/windows/mixed-reality/develop/platform-capabilities-and-apis/locatable-camera
Hi, you can check available profiles by following this example. Thank you for trying out lower resolutions. I understand that 2s latency (on the CPU) can be constraining; let me check on my side what I can do to optimize on ARM. As a mitigation, you could interleave evals using more than one skill running in parallel to at least get more than one result every 2s (i.e. schedule frames for eval using a pool of skills).
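A rough sketch of that pooling idea, assuming the SkeletalDetector preview APIs from the samples (`CreateSkillAsync`, `CreateSkillBindingAsync`, `EvaluateAsync`); the pool size and round-robin scheduling here are illustrative choices, not part of the sample:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AI.Skills.SkillInterfacePreview;
using Microsoft.AI.Skills.Vision.SkeletalDetectorPreview;
using Windows.Media;

// Illustrative pool: N independent skill+binding pairs used round-robin so a
// new frame can begin evaluating while previous evaluations are still running.
public class SkillPool
{
    private readonly List<(ISkill skill, SkeletalDetectorBinding binding)> m_pool = new();
    private int m_next = 0;

    public static async Task<SkillPool> CreateAsync(int size)
    {
        var result = new SkillPool();
        var descriptor = new SkeletalDetectorDescriptor();
        for (int i = 0; i < size; i++)
        {
            ISkill skill = await descriptor.CreateSkillAsync();
            var binding = (SkeletalDetectorBinding)await skill.CreateSkillBindingAsync();
            result.m_pool.Add((skill, binding));
        }
        return result;
    }

    // Bind and evaluate a frame on the next instance in the pool. Callers can
    // hold the returned Task for several frames at once to overlap evaluations.
    public async Task<SkeletalDetectorBinding> EvaluateAsync(VideoFrame frame)
    {
        var (skill, binding) = m_pool[m_next];
        m_next = (m_next + 1) % m_pool.Count;
        await binding.SetInputImageAsync(frame);
        await skill.EvaluateAsync(binding);
        return binding;
    }
}
```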
Thanks a lot for your help! I tried changing the profile and was partially successful (decreased the resolution to 1504x846), but I don't understand MediaCapture completely yet. I chose the camera profile in CreateFromVideoDeviceInformationAsync by:
```csharp
IReadOnlyList<MediaCaptureVideoProfile> profiles =
    MediaCapture.FindAllVideoProfiles(result.m_mediaCaptureInitializationSettings.VideoDeviceId);

// Look for a profile that supports 424x240 @ 15fps
var match = (from profile in profiles
             from desc in profile.SupportedRecordMediaDescription
             where desc.Width == 424 && desc.Height == 240 && Math.Round(desc.FrameRate) == 15
             select new { profile, desc }).FirstOrDefault();

if (match != null)
{
    result.m_mediaCaptureInitializationSettings.VideoProfile = match.profile;
    result.m_mediaCaptureInitializationSettings.RecordMediaDescription = match.desc;
}
```
While this allowed me to choose the correct camera profile (containing 424x240) in m_mediaCaptureInitializationSettings, which was used in InitializeAsync, the frame source was always set back to a different format in InitializeMediaFrameSourceAsync. The following expression in InitializeMediaFrameSourceAsync only provided me with the available resolutions 1504x846 and 1952x1100:

```csharp
m_frameSource = m_mediaCapture.FrameSources
        .FirstOrDefault(source => filterFrameSources(source, MediaStreamType.VideoPreview)).Value
    ?? m_mediaCapture.FrameSources
        .FirstOrDefault(source => filterFrameSources(source, MediaStreamType.VideoRecord)).Value;
```

Is it possible that some of the profile's resolutions are not compatible with the VideoPreview MediaCapture stream? Also, why do I set a MediaCapture profile in InitializeAsync, and then another one is loaded in InitializeMediaFrameSourceAsync?

Also, sorry if this is more of a Windows MediaCapture question than specifically related to this repo.
Hi,

The profile you set onto a MediaCaptureInitializationSettings provides a driver hint to narrow down the list of functionalities to enable on the MediaCapture instance, accommodating a certain scenario (i.e. set this MediaCapture to stream low-res video by default).

Indeed, following your changes in the code for FrameReaderFrameSource.cs, you need to avoid setting the frame source to a MediaType different from the 424x240 resolution you specified in the profile. The logic in the sample helper class FrameReaderFrameSource currently does not apply any profile; instead, it looks at the ISkillFeatureImageDescriptor to determine which width and height are preferable to stream, to match what the AI Skill wants as input format. In the case of the SkeletalDetector, a specific width and height is not required (== -1), so it then looks for a supported format to set on the MediaFrameSource (m_frameSource) that matches 1920x1080 at above 15fps:
```csharp
// Get preferred camera frame format described in a ISkillFeatureImageDescriptor if specified
int preferredFrameWidth = 1920;
if (m_desiredImageDescriptor != null && m_desiredImageDescriptor.Width != -1)
{
    // we don't hit this since m_desiredImageDescriptor.Width == -1
}
int preferredFrameHeight = 1080;
if (m_desiredImageDescriptor != null && m_desiredImageDescriptor.Height != -1)
{
    // we don't hit this since m_desiredImageDescriptor.Height == -1
}

// skipping the rest of the logic

await m_frameSource.SetFormatAsync(selectedFormat); // we are setting a format close to 1920x1080
```
Left untouched, this logic overrides the profile you set earlier, which would otherwise derive the default format to use. In your case, I would just retrieve the MediaFrameSource that correlates with your MediaCapture initialized with your profile and proceed with the default stream format already set, or set a different one on the MediaCapture instance directly and not on the MediaFrameSource.
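Concretely, that could look like the following sketch against the sample's FrameReaderFrameSource; the key point is skipping the SetFormatAsync call so the profile-derived default format survives (the selector lambda here is illustrative):

```csharp
// Sketch: retrieve the frame source from the MediaCapture that was initialized
// with the low-res profile, and keep its default format instead of overriding it.
m_frameSource = m_mediaCapture.FrameSources
    .FirstOrDefault(source => source.Value.Info.MediaStreamType == MediaStreamType.VideoRecord)
    .Value;

// Do NOT call m_frameSource.SetFormatAsync(...) here: the RecordMediaDescription
// set on the MediaCaptureInitializationSettings already drove the default format
// (e.g. 424x240 @ 15fps).
```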
On my local machine (Win 10, build 18362.1082) the SkeletalDetector example runs very well, with ~200ms eval time on the CPU (i7-6700K) and less than 100ms on the GPU (1080 Ti). When trying it out on the HoloLens 2 (10.0.19041.1382), the framerate drops drastically, with eval times of ~2000ms on the CPU (ARMv8 64-bit Family 8 Model 803 Revision 70C) and ~10000ms on the GPU (Qualcomm Adreno 630). Any way to make this faster on the HL2?