Closed · austinbhale closed this pull request 2 years ago
Thanks for this PR! The code looks good, but I'm wondering if it would be possible to separate this out into its own component, perhaps called something like MediaCaptureMicrophone. It could then be more easily reused in other places.
Or does it require access to shared resources (and coordinated initialization) with the PhotoVideoCamera?
Great idea! Initially, I was worried about having multiple MediaCapture instances for the same device, but as long as the target streams don't interfere (one for audio, one for video), it isn't an issue.
Let me know if you can think of any more changes to add to this.
Thanks for the review! I implemented both of your ideas in https://github.com/microsoft/psi/pull/254/commits/2b4cc28b9a0f8b0474c97aae00d019b7252a4c65.
In addition to these changes, there may be cases where two audio channels are available and a developer wants to specify which channel to keep when downsampling to a single-channel format. For this use case, I added an AudioChannelNumber property to grab data for the requested channel number. If the requested channel number is out of range, an exception is thrown.
This could be useful for MediaCapture devices whose microphones are farther apart. For example, the HoloLens 2's first audio channel is the top-left microphone and its second channel is the top-center microphone.
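The channel-selection logic described above can be sketched roughly as follows. This is an illustrative sketch only: `ChannelSelector` and `ExtractChannel` are hypothetical names, not the PR's actual code.

```csharp
using System;

// Illustrative sketch of selecting one channel from an interleaved frame of
// 32-bit float PCM samples laid out as [ch0, ch1, ch0, ch1, ...].
// ExtractChannel is a hypothetical helper, not the PR's implementation.
public static class ChannelSelector
{
    public static float[] ExtractChannel(float[] interleaved, int numChannels, int channelNumber)
    {
        if (channelNumber < 0 || channelNumber >= numChannels)
        {
            throw new ArgumentOutOfRangeException(
                nameof(channelNumber),
                $"Requested channel {channelNumber}, but only {numChannels} channel(s) are available.");
        }

        // Take every numChannels-th sample, starting at the requested channel's offset.
        var mono = new float[interleaved.Length / numChannels];
        for (int i = 0; i < mono.Length; i++)
        {
            mono[i] = interleaved[(i * numChannels) + channelNumber];
        }

        return mono;
    }
}
```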
Thanks for your changes! Looking at the latest, it occurred to me that we don't actually need the MicrophoneConfiguration property in MediaCaptureMicrophoneConfiguration. SamplingInterval is never used, and we might not need AudioFormat either, as we are currently only using it to determine whether or not to downsample to a single channel.
To make this component not assume a fixed audio format of 48 kHz, 2-channel, 32-bit floating-point PCM audio, we should really be constructing the AudioFormat that gets posted with the AudioBuffer based on the AudioEncodingProperties of the received audio frame. You are already doing this to compute the sampling rate and number of channels for the captured audio. To get the FormatTag for the WaveFormat, you could infer it from the AudioEncodingProperties.Subtype property, or it's probably fine for now to just assume IEEE 32-bit float PCM, since that is always going to be the case for HoloLens (maybe throw an exception if you see something else that you don't currently handle?). Also, if you're downsampling to 1 channel, then obviously you'd want to set WaveFormat.Channels to 1; otherwise it will be equal to numAudioChannels.
Hopefully this all makes sense?
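For concreteness, the construction described above might look roughly like the sketch below. The API details are assumptions: it presumes the "Float" subtype string denotes IEEE float PCM in Windows.Media.MediaProperties.AudioEncodingProperties, and that Microsoft.Psi.Audio exposes a WaveFormat.CreateIeeeFloat(rate, channels) factory (adjust if the actual signature differs).

```csharp
// Sketch: derive the posted format from the received frame's encoding
// properties rather than assuming 48 kHz / 2-channel / 32-bit float PCM.
// The "Float" subtype check and WaveFormat.CreateIeeeFloat factory are
// assumptions about the respective APIs, not verbatim PR code.
private static WaveFormat CreateOutputFormat(AudioEncodingProperties props, bool downmixToMono)
{
    if (!string.Equals(props.Subtype, "Float", StringComparison.OrdinalIgnoreCase))
    {
        // Only IEEE 32-bit float PCM is handled for now; this is always
        // the case on HoloLens 2, so fail loudly on anything else.
        throw new NotSupportedException($"Unhandled audio subtype: {props.Subtype}");
    }

    int channels = downmixToMono ? 1 : (int)props.ChannelCount;
    return WaveFormat.CreateIeeeFloat((int)props.SampleRate, channels);
}
```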
I agree: the wave format should be generated from the audio source's properties exposed by MediaCapture. I believe I addressed your points in the latest commit.
For inferring the subtype, I created a dictionary that maps the subtype strings to their respective WaveFormatTag values. For the tags that weren't one-to-one translatable (from https://github.com/microsoft/psi/blob/master/Sources/Audio/Microsoft.Psi.Audio/WaveFormatTag.cs), I set them to WAVE_FORMAT_UNKNOWN. Maybe someone more familiar with audio codecs can fill in the remaining mappings in the future?
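The lookup described above might be sketched like this. The entries shown are examples only, not the full table from the commit, and the exact WaveFormatTag member names are assumed from the linked enum.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Psi.Audio;

// Illustrative sketch of a subtype-to-tag lookup (example entries only).
// Subtypes with no one-to-one counterpart in \psi's WaveFormatTag enum
// fall back to WAVE_FORMAT_UNKNOWN.
internal static class SubtypeMapper
{
    private static readonly Dictionary<string, WaveFormatTag> SubtypeToTag =
        new Dictionary<string, WaveFormatTag>(StringComparer.OrdinalIgnoreCase)
        {
            { "PCM", WaveFormatTag.WAVE_FORMAT_PCM },
            { "Float", WaveFormatTag.WAVE_FORMAT_IEEE_FLOAT },
        };

    public static WaveFormatTag GetTag(string subtype) =>
        SubtypeToTag.TryGetValue(subtype, out var tag)
            ? tag
            : WaveFormatTag.WAVE_FORMAT_UNKNOWN;
}
```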
StereoKit's microphone is unreliable at times: it can crash unexpectedly inside the external audio library it uses, MiniAudio. For more information on this crash, please see https://github.com/StereoKit/StereoKit/issues/449.
As an alternative, we can turn to Microsoft's MediaCapture implementation to gain access to the microphone's audio buffer. This pull request adds functionality for a new PhotoVideoCameraConfiguration setting called AudioSettings. A general use case would be:
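(The original snippet from the PR description is not preserved in this transcript. A hypothetical sketch of such a use case, with all names and properties assumed rather than taken from the PR, might look like the following.)

```csharp
// Hypothetical usage sketch (type, property, and stream names assumed, not
// verbatim from the PR): enable audio capture on the camera configuration
// and subscribe to the resulting audio stream in a \psi pipeline.
using (var pipeline = Pipeline.Create())
{
    var camera = new PhotoVideoCamera(
        pipeline,
        new PhotoVideoCameraConfiguration
        {
            AudioSettings = new PhotoVideoCameraConfiguration.AudioSettingsConfiguration(), // assumed type
        });

    // Print the size of each received audio buffer (stream name assumed).
    camera.Audio.Do(audioBuffer => Console.WriteLine($"Got {audioBuffer.Length} bytes of audio"));

    pipeline.RunAsync();
    Console.ReadLine();
}
```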
Not included in this pull request, but food for thought: it might be nice to rename the PhotoVideoCamera class to something more versatile, like MediaCapture. This way, one wouldn't be as confused when wanting only an audio stream from a MediaCapture instance.