gopxl / beep

A little package that brings sound to any Go application. Suitable for playback and audio-processing.
MIT License

Support variable number of channels #9

Open MarkKremer opened 11 months ago

MarkKremer commented 11 months ago

Current state

I would say that the 2 most important types in Beep are the following:

// Streamer is able to stream a finite or infinite sequence of audio samples.
type Streamer interface {
    Stream(samples [][2]float64) (n int, ok bool)
    Err() error
}
// Format is the format of a Buffer or another audio source.
type Format struct {
    SampleRate SampleRate
    NumChannels int
    Precision int
}

Streamer allows us to define operations on samples. Using the composite pattern it is possible to combine operations to create more complex operations.

Format, besides storing the format information, is used to encode/decode samples into different representations.

These types are very powerful and can be used to do a lot of things with very little. However, there are some details about them that make me wonder if something better is possible:

The number of channels seems like an inherent property of the samples, while the Format is only used at specific parts of the application. It is metadata that is exposed when decoding a file format, or it can be passed as configuration to encode audio. Format is, however, never directly used by Streamers and is completely separate from the composite pattern that is core to Beep.

Proposal

Move NumChannels to Samples.

The samples are stored in an interleaved format in a 1D slice. We lose the syntactic sugar of 2D slices, which I worked around by using methods (BOOO!). I think the benefits could very well outweigh the drawbacks, but I would like to invite you to think about the developer experience for the users of Beep when, say, they want to implement a custom Streamer.

For reference, this is what the types will look like (approximately):

// Samples contains a finite sequence of audio samples for one or more channels.
type Samples struct {
    Samples []float64 // interleaved
    NumChannels int
}

// Get a single sample.
func (s Samples) Get(index, channel int) float64 {
    return s.Samples[index*s.NumChannels + channel]
}

// Set the value of a sample.
func (s *Samples) Set(index, channel int, value float64) {
    s.Samples[index*s.NumChannels + channel] = value
}

// Streamer is able to stream a finite or infinite sequence of audio samples.
type Streamer interface {
    Stream(samples Samples) (n int, ok bool)
    Err() error
}
// Format describes the stored format of an audio stream, as a file or in-memory.
type Format struct {
    SampleRate SampleRate
    Precision int
}

In this scenario, Format can still be used to encode/decode individual samples. However, it no longer deals with framing the samples of multiple channels together.
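To get a feel for the developer experience, here is a minimal sketch of a custom Streamer written against the proposed types. Gain, Constant and the Len helper are hypothetical names made up for this example, not part of Beep or of the proposal. It also assumes that the caller's buffer decides how many channels a source has to fill, which the proposal does not settle.

```go
package main

import "fmt"

// Samples and Streamer mirror the proposed types.
type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

// Len returns the number of frames in the buffer.
// Hypothetical helper, not part of the proposal.
func (s Samples) Len() int {
	if s.NumChannels == 0 {
		return 0
	}
	return len(s.Samples) / s.NumChannels
}

// Get a single sample.
func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

// Set the value of a sample.
func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}

// Constant is a hypothetical infinite source that writes Value
// to every channel of whatever buffer it is handed.
type Constant struct{ Value float64 }

func (c Constant) Stream(samples Samples) (int, bool) {
	for i := range samples.Samples {
		samples.Samples[i] = c.Value
	}
	return samples.Len(), true
}

func (c Constant) Err() error { return nil }

// Gain is a hypothetical custom Streamer that scales every sample
// of the wrapped Streamer, regardless of the channel count.
type Gain struct {
	Streamer Streamer
	Factor   float64
}

func (g Gain) Stream(samples Samples) (int, bool) {
	n, ok := g.Streamer.Stream(samples)
	for i := 0; i < n; i++ {
		for c := 0; c < samples.NumChannels; c++ {
			samples.Set(i, c, samples.Get(i, c)*g.Factor)
		}
	}
	return n, ok
}

func (g Gain) Err() error { return g.Streamer.Err() }

func main() {
	// A 3-channel buffer holding 4 frames.
	buf := Samples{Samples: make([]float64, 3*4), NumChannels: 3}
	g := Gain{Streamer: Constant{Value: 0.5}, Factor: 0.5}
	n, ok := g.Stream(buf)
	fmt.Println(n, ok, buf.Get(3, 2))
}
```

Compared to indexing samples[i][0] and samples[i][1], the nested loop is a bit noisier, but the same Streamer now works for mono, stereo or 5.1 without changes.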

What do we gain?

One obvious benefit is that the number of channels isn't constant anymore:

Furthermore: operations on channels.

Operations on channels

Because the channel count is stored in the Samples struct, Streamer operations that act on those channels become a possibility. This gives the user better control of what they want to do:

streamer, format, err := vorbis.Decode(myFileReader)
if err != nil {
    panic(err)
}

channels := SplitChannels(streamer)
desiredChannels := MergeChannels(channels[0], channels[2]) // keep only the front left and front right channel

err = speaker.Init(format.SampleRate, format.SampleRate.N(time.Second))
if err != nil {
    panic(err)
}
speaker.Play(desiredChannels)

I suspect the implementation of SplitChannels() and MergeChannels() will be a bit more complex than it may look at first. But I think it is doable.
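As a rough illustration of where the complexity comes from, here is a sketch of a simpler stand-in: a single SelectChannels wrapper instead of separate split and merge steps. SelectChannels, selectChannels and ramp are hypothetical names for this example. Note that the wrapper has to be told the source's channel count explicitly, because the proposed Streamer interface does not expose it, and it needs its own scratch buffer.

```go
package main

import "fmt"

type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}

// selectChannels pulls srcChannels-wide frames from src and copies
// only the channels listed in keep into the caller's buffer.
type selectChannels struct {
	src         Streamer
	srcChannels int
	keep        []int
	buf         Samples // scratch buffer, reused between calls
}

// SelectChannels is a hypothetical helper; it is not part of Beep.
func SelectChannels(src Streamer, srcChannels int, keep ...int) Streamer {
	return &selectChannels{src: src, srcChannels: srcChannels, keep: keep}
}

func (s *selectChannels) Stream(out Samples) (int, bool) {
	// The caller's buffer must have exactly one channel per kept channel.
	if out.NumChannels == 0 || out.NumChannels != len(s.keep) {
		return 0, false
	}
	frames := len(out.Samples) / out.NumChannels
	need := frames * s.srcChannels
	if cap(s.buf.Samples) < need {
		s.buf.Samples = make([]float64, need)
	}
	s.buf.Samples = s.buf.Samples[:need]
	s.buf.NumChannels = s.srcChannels

	n, ok := s.src.Stream(s.buf)
	for i := 0; i < n; i++ {
		for c, srcCh := range s.keep {
			out.Set(i, c, s.buf.Get(i, srcCh))
		}
	}
	return n, ok
}

func (s *selectChannels) Err() error { return s.src.Err() }

// ramp is a test source: the sample at (frame, channel) is frame*10 + channel.
type ramp struct{ frame int }

func (r *ramp) Stream(samples Samples) (int, bool) {
	n := len(samples.Samples) / samples.NumChannels
	for i := 0; i < n; i++ {
		for c := 0; c < samples.NumChannels; c++ {
			samples.Set(i, c, float64(r.frame*10+c))
		}
		r.frame++
	}
	return n, true
}

func (r *ramp) Err() error { return nil }

func main() {
	// Keep channels 0 and 2 of a 4-channel source.
	sel := SelectChannels(&ramp{}, 4, 0, 2)
	out := Samples{Samples: make([]float64, 2*2), NumChannels: 2}
	n, ok := sel.Stream(out)
	fmt.Println(n, ok, out.Samples)
}
```

A real SplitChannels() would additionally have to keep the resulting per-channel streamers in sync when they are consumed at different rates, which this sketch sidesteps.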

Cons

dusk125 commented 11 months ago

Speaking to the speaker/Oto 2 channel problem: in addition to split and merge, there could be a, let's call it, MapChannels where you could specify how the n channels get merged into 1 or 2 channels.

Something like

MapChannels(leftRightMapper{Left: []channel{channels[0], channels[2]}, Right: []channel{channels[1], channels[3]}})

Something like this could be a mono to stereo mapper

// Maps mono audio to stereo output
MapChannels(leftRightMapper{Left: []channel{channels[0]}, Right: []channel{channels[0]}})
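To make the idea a bit more concrete, here is a sketch of such a mapper operating directly on a Samples buffer rather than on streamers. The LeftRightMapper shape, the use of channel indices instead of channel values, and averaging (rather than summing) the mapped channels are all assumptions made for this example.

```go
package main

import "fmt"

type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

// LeftRightMapper lists which input channels feed each output channel.
// Hypothetical type; channels are identified by index here.
type LeftRightMapper struct {
	Left  []int
	Right []int
}

// MapChannels returns a new stereo buffer in which each output channel
// is the average of the input channels mapped to it.
func MapChannels(in Samples, m LeftRightMapper) Samples {
	frames := 0
	if in.NumChannels > 0 {
		frames = len(in.Samples) / in.NumChannels
	}
	out := Samples{Samples: make([]float64, frames*2), NumChannels: 2}
	for i := 0; i < frames; i++ {
		out.Set(i, 0, mix(in, i, m.Left))
		out.Set(i, 1, mix(in, i, m.Right))
	}
	return out
}

// mix averages the given input channels of one frame.
// An empty channel list yields silence.
func mix(in Samples, frame int, channels []int) float64 {
	if len(channels) == 0 {
		return 0
	}
	sum := 0.0
	for _, c := range channels {
		sum += in.Get(frame, c)
	}
	return sum / float64(len(channels))
}

func main() {
	// One frame of 4-channel audio mapped down to stereo.
	quad := Samples{Samples: []float64{1, 2, 3, 4}, NumChannels: 4}
	stereo := MapChannels(quad, LeftRightMapper{Left: []int{0, 2}, Right: []int{1, 3}})
	fmt.Println(stereo.Samples)

	// Mono to stereo: the single channel feeds both outputs.
	mono := Samples{Samples: []float64{0.5}, NumChannels: 1}
	fmt.Println(MapChannels(mono, LeftRightMapper{Left: []int{0}, Right: []int{0}}).Samples)
}
```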

For the proposal as a whole, I think it makes sense to have the channel information near the samples (and thus allow samples to have n channels). I've had a project where having the methods alone would've made it much easier to think about (I was mapping audio sent across the network to beep).

I wonder, then, whether consumers that only support 1 or 2 channels should take a special-case streamer, so that it's not possible to feed a 6 channel streamer into a speaker (for example). I feel that otherwise this could breed confusion and annoying-to-find bugs.
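One possible shape for such a special-case streamer, sketched under several assumptions: streamers would need some way to report their channel count (the hypothetical ChannelStreamer interface below), and the consumer (Play here, standing in for something like speaker.Play) would accept only the wrapper type. None of these names exist in Beep.

```go
package main

import "fmt"

type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}

// ChannelStreamer is a hypothetical extension of Streamer that
// reports how many channels the source produces.
type ChannelStreamer interface {
	Streamer
	NumChannels() int
}

// StereoStreamer is meant to be created through NewStereoStreamer,
// which rejects sources with more than two channels.
type StereoStreamer struct {
	src ChannelStreamer
}

func NewStereoStreamer(src ChannelStreamer) (StereoStreamer, error) {
	if n := src.NumChannels(); n < 1 || n > 2 {
		return StereoStreamer{}, fmt.Errorf("need 1 or 2 channels, got %d", n)
	}
	return StereoStreamer{src: src}, nil
}

func (s StereoStreamer) Stream(samples Samples) (int, bool) {
	return s.src.Stream(samples)
}

func (s StereoStreamer) Err() error { return s.src.Err() }

// Play stands in for a 2-channel consumer. Because it takes a
// StereoStreamer, passing a plain 6-channel Streamer does not compile.
func Play(s StereoStreamer) {
	buf := Samples{Samples: make([]float64, 2*4), NumChannels: 2}
	s.Stream(buf)
}

// silence is a test source with a fixed channel count.
type silence struct{ channels int }

func (s silence) Stream(samples Samples) (int, bool) {
	for i := range samples.Samples {
		samples.Samples[i] = 0
	}
	if samples.NumChannels == 0 {
		return 0, true
	}
	return len(samples.Samples) / samples.NumChannels, true
}

func (s silence) Err() error       { return nil }
func (s silence) NumChannels() int { return s.channels }

func main() {
	if _, err := NewStereoStreamer(silence{channels: 6}); err != nil {
		fmt.Println("rejected:", err)
	}
	stereo, err := NewStereoStreamer(silence{channels: 2})
	if err != nil {
		panic(err)
	}
	Play(stereo)
}
```

The channel count check itself still happens at runtime in the constructor; the type only guarantees that the check cannot be skipped by accident on the way to the consumer.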