Multi-channel support

kinetiknz commented 7 years ago

This issue tracks the big picture of multi-channel support. I'll update this later with a better summary.

The main items are:

- API design
- Implementation for each backend:
  - WASAPI
  - AudioUnit
  - PulseAudio
  - OpenSL
  - others (minimum changes to retain existing functionality; actual support will need to be contributed)
- Deal with the code duplication between cubeb and Gecko's AudioConverter
- Work out how this interacts with device switching (discussed in issue #167)

Right now, there's some discussion in pull #171 covering this that needs to be moved into this issue.

ChunMinChang commented 7 years ago

Since cubeb is an independent cross-platform audio library, it would be better to handle downmix/upmix inside cubeb, rather than in Gecko's AudioConverter, to deal with mismatched channel counts. However, we first have to figure out which downmix/upmix capabilities cubeb needs. Does it need a general-purpose conversion from m channels to n channels, where m ≠ n, or just some specific conversions, such as k channels to stereo, where 2 < k < 9, and stereo to 5.1?

A combination of some specific conversions may be more feasible. It seems that the current downmix code in AudioConverter can already convert k channels into stereo. For 5.1 audio, Table 2 in ITU-R BS.775-3 [0] provides a matrix of downmix coefficients to convert from 3/2 to 1/0, 2/0, 3/0, 2/1, 3/1, and 2/2, where x/y stands for x front channels with y rear or surround channels [1]. However, converting stereo to 5.1 audio might need more work [2].

References:
[0] ITU-R BS.775-3
[1] Dolby_Digital
[2] Real-Time Conversion of Stereo Audio to 5.1 Channel Audio for Providing Realistic Sounds

ChunMinChang commented 7 years ago

Beyond the implementation for each backend, I think the general design should follow the comments below.

Multiple Channel Configuration

To support multiple channels, we need to know the following information:
- Channel layout: Different channel layouts may have the same channel count. For example, 2.2 and 4.0 both use 4 channels, but 2.2 uses front left, front right, side left, side right, while 4.0 uses front left, front right, front center, back center. Thus, we need a way to distinguish which speakers should be used.
- Channel order: Some platforms require a channel order to set up the speakers, e.g., kAudioOutputUnitProperty_ChannelMap for AudioUnit on OS X, so we should use a vector to define the channel order for each channel layout.

The simplest solution is to use the SMPTE channel layout (order) as the input format. That way the order is fixed, so we only need to handle the layout part.
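To make the layout/order distinction concrete, here is a minimal sketch in Python. It is only an illustration, not cubeb's API: the `Channel` enum, the `LAYOUTS` table, and the helper names are all hypothetical. Each layout is stored as an ordered tuple, so the tuple length gives the channel count, the set of members gives the speakers, and the position gives the channel order.

```python
from enum import Enum


class Channel(Enum):
    """Hypothetical speaker identifiers (not cubeb's real enum)."""
    FRONT_LEFT = "FL"
    FRONT_RIGHT = "FR"
    FRONT_CENTER = "FC"
    BACK_CENTER = "BC"
    SIDE_LEFT = "SL"
    SIDE_RIGHT = "SR"


C = Channel

# Hypothetical layout table: the tuple order is the channel order.
LAYOUTS = {
    "2.2": (C.FRONT_LEFT, C.FRONT_RIGHT, C.SIDE_LEFT, C.SIDE_RIGHT),
    "4.0": (C.FRONT_LEFT, C.FRONT_RIGHT, C.FRONT_CENTER, C.BACK_CENTER),
}


def channel_count(name):
    """Number of channels in the named layout."""
    return len(LAYOUTS[name])


def speakers(name):
    """Set of speakers used by the named layout, ignoring order."""
    return frozenset(LAYOUTS[name])


def channel_index(name, channel):
    """Position of a speaker's samples within an interleaved frame."""
    return LAYOUTS[name].index(channel)
```

With this table, `channel_count("2.2")` and `channel_count("4.0")` are both 4, while `speakers("2.2")` and `speakers("4.0")` differ, which is why the channel count alone cannot identify a layout.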
Downmix and Upmix

Convert the audio data from m input channels to n output channels, where m ≠ n. Stereo, 5.1, and 7.1 are the most common use cases, so we can focus only on m = 2, 6, 8 or n = 2, 6, 8 in the current phase. MythTV is a useful reference.

ChunMinChang commented 7 years ago

This feature could be divided into several phases:

phase 1
- Implement basic support for multiple channels on each platform.
- Implement downmix for multiple channels. (There is a spec for 5.1 audio downmix, but I cannot find a standard for upmix, so the upmix design can stay the same in this phase.)
phase 2
- Integrate with audio device switching.
phase 3
- Add some fancy features, e.g., stereo-to-5.1 simulation.

ChunMinChang commented 7 years ago

Downmix/Upmix module

The downmix/upmix code should be separated from WASAPI into an independent module for the following reasons:

- It can be reused by all backends. The downmix/upmix follows standards, so it should be the same for each backend.
- It's easier to test. We can ignore the internal processing of the backends and just focus on the correctness of the supported layout conversions.
- It's flexible to extend. If we want to add some fancy features like stereo-to-5.1 simulation, then only this module needs to be modified, and the change will apply to all backends.

padenot commented 7 years ago

Agreed, we should separate it.

ChunMinChang commented 7 years ago

I think we could design three mechanisms for mixing:

- Specific conversion
- Mapping channel data
- Bypassing data by channel index

Each time we upmix or downmix, we try the mechanisms in the above order. That is, we try the specific conversion first. If it works, the job is done. Otherwise, we try mixing by mapping the channels. If that still doesn't work, we mix by bypassing the channel data. The final mechanism is our fallback plan, and it should always work.
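The fallback order can be sketched as a small selection function. This is a hedged illustration in Python rather than cubeb code: `SPECIFIC_CONVERSIONS`, `choose_mechanism`, and the channel labels are hypothetical. The rule for when mapping "works" (one layout's speakers are a subset of the other's) is also an assumption, since the thread does not define it precisely.

```python
# Hypothetical table of (input layout, output layout) pairs that have a
# dedicated, spec-defined conversion. Only one entry is listed here.
SPECIFIC_CONVERSIONS = {
    (("L", "R", "C", "LS", "RS"), ("L", "R")),
}


def choose_mechanism(src_layout, dst_layout):
    """Return which mixing mechanism would handle this conversion.

    The mechanisms are tried in priority order: a specific conversion,
    then mapping by channel name, then bypassing by channel index.
    """
    src, dst = tuple(src_layout), tuple(dst_layout)
    if (src, dst) in SPECIFIC_CONVERSIONS:
        return "specific"
    # Assumption: mapping applies when one layout's speakers are a
    # subset of the other's, so every shared channel has a match.
    if set(dst) <= set(src) or set(src) <= set(dst):
        return "mapping"
    # Fallback that always applies: copy data by channel index.
    return "bypass"
```

For example, `choose_mechanism(("L", "R", "C"), ("L", "R"))` selects mapping, while `choose_mechanism(("L", "R"), ("M",))` falls through to bypass because no channel name matches.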

Specific conversion

Some conversions have their own definitions, so we need to implement them. For example, Table 2 in ITU-R BS.775-3 defines the downmix equations from 3F2 to 1F, 2F, 3F, 2F1, 3F1 and 2F2.
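As a rough sketch of what a specific conversion could look like, the Python snippet below applies a coefficient matrix to one frame for the 3F2 to 2F case. The coefficients (1.0 for the matching front channel, about 0.7071 for the center and the same-side surround) are the commonly cited values for this downmix and should be checked against Table 2 of the recommendation before being relied on. The names `DOWNMIX_3F2_TO_2F` and `apply_matrix` are hypothetical, the input order L, R, C, LS, RS is an assumption, and no clipping or normalization is done.

```python
import math

K = 1 / math.sqrt(2)  # about 0.7071

# Rows are the output channels (L', R').
# Columns are the input channels in the assumed order L, R, C, LS, RS.
DOWNMIX_3F2_TO_2F = (
    (1.0, 0.0, K, K, 0.0),
    (0.0, 1.0, K, 0.0, K),
)


def apply_matrix(matrix, frame):
    """Mix one frame (one sample per input channel) through a matrix."""
    if any(len(row) != len(frame) for row in matrix):
        raise ValueError("matrix width must equal the input channel count")
    return [sum(coef * sample for coef, sample in zip(row, frame))
            for row in matrix]
```

A frame containing only a center-channel sample, such as `[0.0, 0.0, 1.0, 0.0, 0.0]`, ends up at about 0.7071 in both output channels. Keeping the coefficients in a table means other conversions from the same spec could be added as data rather than as new code.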

Mixing by mapping channel data

In most cases, the input and output data can be mapped according to their layout settings. For example, when downmixing from 3F (L, R, C) to stereo (L, R), we only need to pass the first two input channels to the output.
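A minimal sketch of mapping by layout, written in Python under a few assumptions: layouts are tuples of channel labels, `mix_by_mapping` is a hypothetical name, and output channels with no counterpart in the input are filled with silence (the thread does not say what should happen to them).

```python
def mix_by_mapping(frame, src_layout, dst_layout):
    """Route each input sample to the output channel with the same label.

    Input channels that are absent from the output layout are dropped.
    Output channels that are absent from the input are set to 0.0,
    which is an assumption made for this sketch.
    """
    if len(frame) != len(src_layout):
        raise ValueError("frame length must match the input layout")
    sample_by_channel = dict(zip(src_layout, frame))
    return [sample_by_channel.get(channel, 0.0) for channel in dst_layout]
```

For the 3F to stereo example, `mix_by_mapping([0.1, 0.2, 0.3], ("L", "R", "C"), ("L", "R"))` keeps the L and R samples and drops C.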

Mixing by bypassing channel data

There are some cases that the above mechanisms don't cover. Downmixing from stereo (L, R) to mono (M) is an example: there is no spec and no matching channel for this conversion. In particular, WASAPI can support some mismatched speaker settings, such as 6 channels with a stereo layout (stereo should only have 2 channels). In such cases, we don't know whether the mixing policy should follow the layout or the channel count.

The simplest plan is to follow the channel count and pass the channel data through by channel index. If the input has 2 channels and the output has 1 channel, we just pass the first channel's data to the output.
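The channel-index fallback could look roughly like the Python sketch below. `mix_by_index` is a hypothetical name, and zero-filling the extra output channels when the output is wider than the input is an assumption, since the thread only describes the case where the output is narrower.

```python
def mix_by_index(frame, out_channels):
    """Copy samples by channel index, ignoring layouts entirely.

    The first min(len(frame), out_channels) samples are passed through.
    Any remaining output channels are filled with 0.0 (an assumption).
    """
    if out_channels < 0:
        raise ValueError("out_channels must be non-negative")
    copied = list(frame[:out_channels])
    return copied + [0.0] * (out_channels - len(copied))
```

With a stereo frame `[0.5, -0.5]` and one output channel, only the first sample is kept. Because it depends on nothing but the two channel counts, this mechanism always produces an output, which is what makes it usable as the final fallback.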

An alternative is to define some matrices to compress or expand the audio data. However, the number of layout combinations to cover is not small.

ChunMinChang commented 7 years ago

For testing, I am wondering if it's feasible to fake an audio device and programmatically register it as default audio output. Then we can intercept and verify the output through the faked device. Maybe this issue should be discussed in #193.

mozilla / cubeb

Multi-channel support #178