Support variable number of channels

Current state

I would say that the 2 most important types in Beep are the following:

// Streamer is able to stream a finite or infinite sequence of audio samples.
type Streamer interface {
    Stream(samples [][2]float64) (n int, ok bool)
    Err() error
}

// Format is the format of a Buffer or another audio source.
type Format struct {
    SampleRate SampleRate
    NumChannels int
    Precision int
}

Streamer allows us to define operations on samples. Using the composite pattern it is possible to combine operations to create more complex operations.
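
To make the composite pattern concrete, here is a minimal sketch against the current interface. The constant and gain types are hypothetical and exist only for this illustration; they are not part of Beep.

```go
package main

import "fmt"

// Streamer mirrors the current Beep interface quoted above.
type Streamer interface {
	Stream(samples [][2]float64) (n int, ok bool)
	Err() error
}

// constant is a hypothetical source that emits the same value on both
// channels forever.
type constant struct{ value float64 }

func (c constant) Stream(samples [][2]float64) (int, bool) {
	for i := range samples {
		samples[i] = [2]float64{c.value, c.value}
	}
	return len(samples), true
}

func (c constant) Err() error { return nil }

// gain is a hypothetical wrapper that scales whatever its inner Streamer
// produces. Because gain is itself a Streamer, wrappers can be nested.
type gain struct {
	inner  Streamer
	factor float64
}

func (g gain) Stream(samples [][2]float64) (int, bool) {
	n, ok := g.inner.Stream(samples)
	for i := range samples[:n] {
		samples[i][0] *= g.factor
		samples[i][1] *= g.factor
	}
	return n, ok
}

func (g gain) Err() error { return g.inner.Err() }

func main() {
	// Two gain stages wrapped around one source.
	s := gain{inner: gain{inner: constant{value: 1}, factor: 0.5}, factor: 0.5}
	buf := make([][2]float64, 4)
	n, ok := s.Stream(buf)
	fmt.Println(n, ok, buf[0]) // prints: 4 true [0.25 0.25]
}
```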

Format, besides storing the format information, is used to encode/decode samples into different representations.

These types are very powerful and can be used to do a lot of things with very little. However, there are some details about them that make me wonder if something better is possible:

Even though the Format specifies the number of channels, within the interface of the Streamer, the number of channels is hardcoded to 2. I suspect this is done to keep the library simple and this is an important consideration. Please try to keep this in mind when reading the rest of this proposal.
Within Beep, Precision and SampleRate are mostly used at endpoints: when encoding/decoding a file and when using an in-memory buffer (which is similar to a WAV file). In addition, SampleRate can be used when resampling samples.

The number of channels seems like an inherent property of the samples, while the Format is only used at specific parts of the application. It is metadata that is exposed when decoding a file format, or it can be passed as configuration to encode audio. Format is, however, never directly used by Streamers and is completely separate from the composite pattern that is core to Beep.

Proposal

Move NumChannels to Samples.

The samples are stored in an interleaved format in a 1D slice. We lose the syntactic sugar of 2D slices which I solved by using methods (BOOO!). I think the benefits could very well outweigh the drawbacks but I would like to invite you to think about the developer experience for the users of Beep when, say, they want to implement a custom Streamer.

For reference, this is what the types will look like (approximately):

// Samples contains a finite sequence of audio samples for one or more channels.
type Samples struct {
    Samples []float64 // interleaved
    NumChannels int
}

// Get a single sample.
func (s Samples) Get(index, channel int) float64 {
    return s.Samples[index*s.NumChannels + channel]
}

// Set the value of a sample.
func (s *Samples) Set(index, channel int, value float64) {
    s.Samples[index*s.NumChannels + channel] = value
}

// Streamer is able to stream a finite or infinite sequence of audio samples.
type Streamer interface {
    Stream(samples Samples) (n int, ok bool)
    Err() error
}

// Format describes the stored format of an audio stream, as a file or in-memory.
type Format struct {
    SampleRate SampleRate
    Precision int
}

In this scenario, Format can still be used to encode and decode individual samples. However, it no longer deals with framing the samples of multiple channels together.
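
To get a feel for the developer experience, here is a rough sketch of a custom Streamer written against the proposed types. The ramp source and the Len helper are hypothetical and not part of the proposal; Len only saves repeating the frame-count division.

```go
package main

import "fmt"

// Samples and Streamer follow the proposed types from this issue.
type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

// Len is a hypothetical helper: the number of frames the buffer can hold.
func (s Samples) Len() int { return len(s.Samples) / s.NumChannels }

type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}

// ramp is a hypothetical finite source: frame i carries the value i on
// every channel, whatever channel count the caller asks for.
type ramp struct{ pos, length int }

func (r *ramp) Stream(samples Samples) (int, bool) {
	if r.pos >= r.length {
		return 0, false // drained
	}
	n := 0
	for n < samples.Len() && r.pos < r.length {
		for ch := 0; ch < samples.NumChannels; ch++ {
			samples.Set(n, ch, float64(r.pos))
		}
		n++
		r.pos++
	}
	return n, true
}

func (r *ramp) Err() error { return nil }

func main() {
	var s Streamer = &ramp{length: 3}
	// Room for 4 frames of 3 channels; the source only has 3 frames.
	buf := Samples{Samples: make([]float64, 4*3), NumChannels: 3}
	n, ok := s.Stream(buf)
	fmt.Println(n, ok, buf.Samples[:n*buf.NumChannels])
	// prints: 3 true [0 0 0 1 1 1 2 2 2]
}
```

Compared to the 2D slice version, the inner loop over channels is new, and indexing goes through Get and Set instead of samples[i][ch].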

What do we gain?

One obvious benefit is that the number of channels isn't constant anymore:

It is possible to read Vorbis 5.1 surround sound files without Beep choosing for you which channels to keep. I don't know what they're doing with 5.1 audio files in Beep, but that sounds fun.
One of the use cases that has been on my mind lately is Beep within games. In games, a lot of source audio only requires a single channel. For example, enemy attack/grunt/movement sounds can be stored using a single channel. It is only when the sound is placed in the world that it gains a position. Then, using the position information and the Doppler effect, the audio is converted to 2 or more different channels for the speakers to play and your brain to interpret.
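
As a rough illustration of the game use case, the sketch below turns a mono source into stereo under the proposed interface. Everything here is an assumption made for the example: the tone and pan types are hypothetical, and the gains are a plain linear pan, which is a big simplification compared to real positional audio (no Doppler effect, no distance attenuation). It also shows one way a Streamer could report an unsupported channel count through Err().

```go
package main

import (
	"errors"
	"fmt"
)

// Samples and Streamer follow the proposed types from this issue.
type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}

// tone is a hypothetical source that writes a constant value to every slot.
type tone struct{ value float64 }

func (t tone) Stream(s Samples) (int, bool) {
	for i := range s.Samples {
		s.Samples[i] = t.value
	}
	return len(s.Samples) / s.NumChannels, true
}

func (t tone) Err() error { return nil }

// pan is a hypothetical Streamer that requests one channel from its source
// and spreads it over two output channels.
// position ranges from -1 (fully left) to +1 (fully right).
type pan struct {
	source   Streamer
	position float64
	scratch  []float64 // reusable mono buffer
	err      error
}

func (p *pan) Stream(out Samples) (int, bool) {
	if out.NumChannels != 2 {
		p.err = errors.New("pan: output must have exactly 2 channels")
		return 0, false
	}
	frames := len(out.Samples) / 2
	if cap(p.scratch) < frames {
		p.scratch = make([]float64, frames)
	}
	in := Samples{Samples: p.scratch[:frames], NumChannels: 1}
	n, ok := p.source.Stream(in)
	left, right := (1-p.position)/2, (1+p.position)/2
	for i := 0; i < n; i++ {
		v := in.Get(i, 0)
		out.Set(i, 0, v*left)
		out.Set(i, 1, v*right)
	}
	return n, ok
}

func (p *pan) Err() error {
	if p.err != nil {
		return p.err
	}
	return p.source.Err()
}

func main() {
	p := &pan{source: tone{value: 1}, position: 0.5}
	out := Samples{Samples: make([]float64, 4), NumChannels: 2}
	n, ok := p.Stream(out)
	fmt.Println(n, ok, out.Samples) // prints: 2 true [0.25 0.75 0.25 0.75]
}
```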

Furthermore: operations on channels.

Operations on channels

Because the channel count is stored in the Samples struct, Streamer operations that act on those channels become a possibility. This gives the user better control of what they want to do:

streamer, format, err := vorbis.Decode(myFileReader)
if err != nil {
    panic(err)
}

channels := SplitChannels(streamer)
desiredChannels := MergeChannels(channels[0], channels[2]) // keep only the front left and front right channel

err = speaker.Init(format.SampleRate, format.SampleRate.N(time.Second))
if err != nil {
    panic(err)
}
speaker.Play(desiredChannels)

I suspect the implementation of SplitChannels() and MergeChannels() will be a bit more complex than it may look at first. But I think it is doable.
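
One source of that complexity is that the Streamers returned by SplitChannels() would share a single source while being pulled independently, so some buffering is needed. The sketch below sidesteps this by doing the split and merge in one step. Note that this is a swapped-in, simpler technique and not an implementation of SplitChannels() or MergeChannels(); the selectChannels and counter types are hypothetical, and the indices in keep are assumed to be valid for the source.

```go
package main

import (
	"errors"
	"fmt"
)

// Samples and Streamer follow the proposed types from this issue.
type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}

// counter is a hypothetical source: the sample at (frame, channel) has the
// value frame*10 + channel, which makes the routing easy to see.
type counter struct{ frame int }

func (c *counter) Stream(s Samples) (int, bool) {
	frames := len(s.Samples) / s.NumChannels
	for i := 0; i < frames; i++ {
		for ch := 0; ch < s.NumChannels; ch++ {
			s.Set(i, ch, float64(c.frame*10+ch))
		}
		c.frame++
	}
	return frames, true
}

func (c *counter) Err() error { return nil }

// selectChannels is a hypothetical Streamer that pulls sourceChannels
// channels from source and copies only the channels listed in keep.
type selectChannels struct {
	source         Streamer
	sourceChannels int   // channel count to request from source
	keep           []int // source channel index for each output channel
	scratch        []float64
	err            error
}

func (sc *selectChannels) Stream(out Samples) (int, bool) {
	if len(sc.keep) == 0 || out.NumChannels != len(sc.keep) {
		sc.err = errors.New("selectChannels: output channel count does not match selection")
		return 0, false
	}
	frames := len(out.Samples) / out.NumChannels
	need := frames * sc.sourceChannels
	if cap(sc.scratch) < need {
		sc.scratch = make([]float64, need)
	}
	in := Samples{Samples: sc.scratch[:need], NumChannels: sc.sourceChannels}
	n, ok := sc.source.Stream(in)
	for i := 0; i < n; i++ {
		for ch, src := range sc.keep {
			out.Set(i, ch, in.Get(i, src))
		}
	}
	return n, ok
}

func (sc *selectChannels) Err() error {
	if sc.err != nil {
		return sc.err
	}
	return sc.source.Err()
}

func main() {
	// Keep channels 0 and 2 of a 6-channel source.
	s := &selectChannels{source: &counter{}, sourceChannels: 6, keep: []int{0, 2}}
	out := Samples{Samples: make([]float64, 2*2), NumChannels: 2}
	n, ok := s.Stream(out)
	fmt.Println(n, ok, out.Samples) // prints: 2 true [0 2 10 12]
}
```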

Cons

As noted in the proposal, implementing a Streamer becomes slightly more complex: samples are accessed through methods instead of 2D slice indexing.
Implementations of Streamer must support different values for NumChannels or return an error if the channel count is unsupported.
The speaker/Oto doesn't support more than 2 channels currently. It will be required to manually transform whatever Streamer you have to the required number of channels. However, the tools to do so will be available to you (see previous code snippet).
These changes are not backwards compatible.
