gopxl / beep

A little package that brings sound to any Go application. Suitable for playback and audio-processing.
MIT License

Wav.encode on mixer creates file of about 50G in just seconds #148

Open dhairya0904 opened 7 months ago

dhairya0904 commented 7 months ago
package main

import (
    "log"
    "os"

    "github.com/faiface/beep"
    "github.com/faiface/beep/mp3"
    "github.com/faiface/beep/wav"
)

func main() {

    s1 := getStream("./Lame_Drivers_-_01_-_Frozen_Egg.mp3")
    s2 := getStream("./Miami_Slice_-_04_-_Step_Into_Me.mp3")

    defer s1.Close()
    defer s2.Close()

    mixer := &beep.Mixer{}
    mixer.Add(s1)
    mixer.Add(s2)

    outFile, err := os.Create("mixed_audio.wav")
    if err != nil {
        log.Fatal(err)
    }
    defer outFile.Close()

    err = wav.Encode(outFile, mixer, beep.Format{SampleRate: 48000, NumChannels: 2, Precision: 2})
    if err != nil {
        log.Fatal(err)
    }
}

func getStream(file string) beep.StreamSeekCloser {
    f, err := os.Open(file)
    if err != nil {
        log.Fatal(err)
    }

    streamer, _, err := mp3.Decode(f)
    if err != nil {
        log.Fatal(err)
    }

    return streamer
}

Hi, I am trying to save the sound that is being played through the mixer to a file, but the WAV file it creates is far too big. Can anyone please help me with this?

Any help is appreciated

MarkKremer commented 7 months ago

Hey 👋

The current behaviour of Mixer is that it keeps playing silence when no tracks are mixed in. Therefore your file will be filled with infinite silence. You could use the Mix() function, which doesn't do that. Does that solve the problem for your use case?
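To illustrate the difference, here is a minimal, self-contained sketch. It does not import beep: the Streamer interface is a local stand-in with the same shape as beep's, and finiteMix, tone and drain are hypothetical names made up for this example, not the library's Mix() implementation. It only shows the idea of a mix that ends once every source has drained instead of padding with silence forever.

```go
package main

import "fmt"

// Streamer is a local stand-in shaped like beep's Streamer interface.
type Streamer interface {
	Stream(samples [][2]float64) (n int, ok bool)
	Err() error
}

// tone is a hypothetical finite source: it emits `left` samples of a constant value.
type tone struct {
	value float64
	left  int
}

func (t *tone) Stream(samples [][2]float64) (int, bool) {
	if t.left == 0 {
		return 0, false
	}
	n := len(samples)
	if t.left < n {
		n = t.left
	}
	for i := 0; i < n; i++ {
		samples[i] = [2]float64{t.value, t.value}
	}
	t.left -= n
	return n, true
}

func (t *tone) Err() error { return nil }

// finiteMix sums its sources and reports the end of the stream once all of
// them are drained. It never pads the output with extra silence.
type finiteMix struct{ streamers []Streamer }

func (m *finiteMix) Stream(samples [][2]float64) (int, bool) {
	for i := range samples {
		samples[i] = [2]float64{}
	}
	tmp := make([][2]float64, len(samples))
	longest := 0
	for _, s := range m.streamers {
		n, _ := s.Stream(tmp)
		for i := 0; i < n; i++ {
			samples[i][0] += tmp[i][0]
			samples[i][1] += tmp[i][1]
		}
		if n > longest {
			longest = n
		}
	}
	return longest, longest > 0
}

func (m *finiteMix) Err() error { return nil }

// drain pulls from s until it ends and returns the total number of samples.
func drain(s Streamer, bufSize int) int {
	buf := make([][2]float64, bufSize)
	total := 0
	for {
		n, ok := s.Stream(buf)
		if !ok {
			return total
		}
		total += n
	}
}

func main() {
	mix := &finiteMix{streamers: []Streamer{
		&tone{value: 0.1, left: 5},
		&tone{value: 0.2, left: 3},
	}}
	// The mix is as long as its longest source, then it stops.
	fmt.Println("samples written:", drain(mix, 4))
}
```

An encoder draining a stream like this terminates on its own, whereas a source that always reports ok == true never lets the encoder return.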

dhairya0904 commented 7 months ago

@MarkKremer Thanks a lot for the help

I am trying to capture the complete audio and stream it to an RTMP link. The problem is that the package I am using already plays sound through a mixer, so I cannot change the code that plays the sound.

Is there any way I can achieve this without changing the code that plays sound using mixer?

MarkKremer commented 7 months ago

I think it would be very hacky.

What package are you using?

MarkKremer commented 7 months ago

I was thinking about adding a "don't play silence" option to the mixer but I don't want to add it just to work around another package...

dhairya0904 commented 7 months ago

https://github.com/ikemen-engine/Ikemen-GO/blob/develop/src/sound.go#L494

I want to stream this audio. Even if it streams silence, that is fine, but the output file is so big that it is not usable.

dhairya0904 commented 7 months ago

Can you please point me to the code where I can add this flag to not stream silence? @MarkKremer

MarkKremer commented 7 months ago

I happen to have a POC here.

However, I think another problem (and the reason it fills up your file so quickly) is that the encoder consumes the streamer as quickly as it can write it to the file. The speaker reads from the stream much more slowly.
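As a rough back-of-the-envelope check, the sketch below computes the uncompressed data rate for the format used in the original post (48000 Hz, 2 channels, 2 bytes per sample) and how much playback time a given file size represents. The function names are hypothetical, and it assumes "about 50G" means 50×10^9 bytes and ignores the small WAV header.

```go
package main

import (
	"fmt"
	"time"
)

// bytesPerSecond returns the data rate of uncompressed PCM audio.
// precision is the number of bytes per sample per channel.
func bytesPerSecond(sampleRate, numChannels, precision int) int {
	return sampleRate * numChannels * precision
}

// audioDuration returns how much playback time sizeBytes of PCM data holds.
func audioDuration(sizeBytes int64, sampleRate, numChannels, precision int) time.Duration {
	seconds := float64(sizeBytes) / float64(bytesPerSecond(sampleRate, numChannels, precision))
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	fmt.Println("bytes per second of audio:", bytesPerSecond(48000, 2, 2))

	// Assumption: "about 50G" is taken as 50 * 10^9 bytes.
	d := audioDuration(50_000_000_000, 48000, 2, 2)
	fmt.Printf("50 GB holds about %.0f hours of audio\n", d.Hours())
}
```

At 192000 bytes per second of audio, 50 GB corresponds to roughly 72 hours of sound, which is only possible in a few seconds because nothing limits how fast the encoder pulls samples.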

Given the repo you linked, I'm also thinking you may actually want to continue playing silence in between sounds. That no streams are added to the mixer doesn't necessarily mean the game has stopped...

Do you want to write to the file while still playing through the speaker, or do you need the sound to go to only one of the two, the file or the speaker?

dhairya0904 commented 7 months ago

@MarkKremer

I only need the sound to be written to file, not both.

Yes, you're right about the encoder quickly writing the streams and then streaming silence. I want the writer to write the sound to the file exactly as it was played in the game or through the speaker.

MarkKremer commented 7 months ago

A set-up that could work:

[Mixer] -> [Throttle] -> [Ctrl] -> [wav.Encode]

The throttler doesn't exist yet but would throttle the consumption to a given sample rate.

The Ctrl node can be used to stop the stream at the end of the game by setting Ctrl.Streamer to nil. Then the Mixer can keep playing silence in between sounds.

I've made a quick POC of what the set-up including the throttle node could look like:

package main

import (
    "os"
    "time"

    "github.com/gopxl/beep"
    "github.com/gopxl/beep/mp3"
    "github.com/gopxl/beep/wav"
)

func main() {
    in := "input-file.mp3"
    out := "output-file.wav"

    inFile, err := os.Open(in)
    if err != nil {
        panic(err)
    }

    source, format, err := mp3.Decode(inFile)
    if err != nil {
        panic(err)
    }

    mixer := &beep.Mixer{}
    mixer.Add(source)

    throttle := Throttle(mixer, format.SampleRate.N(time.Second), time.Second/30)

    ctrl := &beep.Ctrl{
        Streamer: throttle,
    }

    // Simulate the game stopping after 5 seconds.
    go func() {
        time.Sleep(time.Second * 5)
        ctrl.Streamer = nil
    }()

    // 0644: a WAV file does not need the executable bit.
    outFile, err := os.OpenFile(out, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
    if err != nil {
        panic(err)
    }

    err = wav.Encode(outFile, ctrl, format)
    if err != nil {
        panic(err)
    }
}

func Throttle(streamer beep.Streamer, sampleRate int, debounce time.Duration) beep.Streamer {
    return &Throttler{
        streamer:        streamer,
        sampleRate:      sampleRate,
        debounce:        debounce,
        samplesConsumed: 0,
        startTime:       time.Now(),
    }
}

type Throttler struct {
    streamer        beep.Streamer
    sampleRate      int
    debounce        time.Duration
    startTime       time.Time
    samplesConsumed int
}

func (t *Throttler) Stream(samples [][2]float64) (n int, ok bool) {
    for {
        // Work with milliseconds instead of seconds to avoid too large rounding errors.
        millisecondsElapsed := time.Since(t.startTime) / time.Millisecond
        maxConsumption := int(millisecondsElapsed) * t.sampleRate / 1000
        if maxConsumption > t.samplesConsumed {
            break
        }
        time.Sleep(t.debounce)
    }

    n, ok = t.streamer.Stream(samples)
    t.samplesConsumed += n

    return
}

func (t *Throttler) Err() error {
    // Forward errors from the wrapped streamer instead of hiding them.
    return t.streamer.Err()
}

debounce is how long the throttle node waits when it hits the consumption limit. It should behave similarly to the buffer size on the speaker.

Like I said, this is a quick POC. Please let me know if you run into any problems. This could be a good addition to the Beep package. I may refine it and add it to the package once I have more time. I have quite the busy schedule right now. :)

Edit: after you signal that the writing must stop with the Ctrl node, make sure wav.Encode is finished before exiting the program. Otherwise it may not be done writing the WAV header.
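A minimal sketch of that shutdown order, assuming the encoder runs in its own goroutine. Here encode is a hypothetical stand-in for wav.Encode (it just counts chunks and sets a flag where the real encoder would finish writing the file), and startRecording is a made-up helper, not part of beep.

```go
package main

import (
	"fmt"
	"time"
)

// result records what the stand-in encoder did.
type result struct {
	chunks        int
	headerWritten bool
}

// encode is a hypothetical stand-in for wav.Encode: it keeps "writing" chunks
// until stop is closed and only then finalizes the header.
func encode(stop <-chan struct{}) result {
	var r result
	for {
		select {
		case <-stop:
			r.headerWritten = true
			return r
		default:
			r.chunks++
			time.Sleep(time.Millisecond)
		}
	}
}

// startRecording runs encode in the background. The returned function signals
// the encoder to stop and blocks until it has returned. Call it only once.
func startRecording() (stopAndWait func() result) {
	stop := make(chan struct{})
	done := make(chan result, 1)
	go func() { done <- encode(stop) }()
	return func() result {
		close(stop)
		return <-done
	}
}

func main() {
	stopAndWait := startRecording()

	// The game would run here.
	time.Sleep(10 * time.Millisecond)

	// Block until the encoder is done before the program exits.
	r := stopAndWait()
	fmt.Println("header written:", r.headerWritten)
}
```

In the real set-up, closing stop corresponds to setting Ctrl.Streamer to nil, and receiving from done corresponds to wav.Encode returning.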

dhairya0904 commented 7 months ago

Hi @MarkKremer Thanks a lot. It did work

Just one question: how will changing this debounce affect the recording?

throttle := Throttle(mixer, format.SampleRate.N(time.Second), time.Second/30)

Let me know if I can help you with implementing this feature in beep 😃

MarkKremer commented 7 months ago

tl;dr: a higher debounce means more latency between changes to the Mixer and those changes showing up in the stream, but it might be more efficient.

The debounce is the difference between:

Small debounce:
time ->
[wait] -> [stream] -> [wait] -> [stream] -> [wait] -> [stream] -> [wait]...
                                              ↑

and

Bigger debounce:
[wait longer...] -> [stream] -> [stream] -> [wait longer...] -> [stream]...
                                              ↑

Say you want to add a streamer to the Mixer at the time of the arrow. In the second case it takes longer for the next call to Stream() to be made. Therefore it may take more time for changes to the streamers to show up in the resulting audio. This is similar to what the speaker does, because it buffers the samples and, when the buffer empties, calls Stream() until it is full again.

The reason I added the debounce is that I suspected it may be more efficient to process a bunch of samples at once and wait a little longer, compared to processing a small number of samples and switching between goroutines more often. However, I haven't checked that assumption, so it would be good to test that in the near future. Maybe Go is more efficient when switching? Feel free to have a try at calculating the exact amount of waiting time instead of having a configurable debounce. Otherwise I'll have a try when I feel like it between the other things I'm doing. :)
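One way the exact waiting time could be calculated, as an untested sketch: sleep until the requested buffer has become due in real time. The name waitFor and the policy (wait for the whole buffer rather than for a single sample) are assumptions made for this example, not something from beep.

```go
package main

import (
	"fmt"
	"time"
)

// waitFor returns how long to sleep before `want` more samples may be handed
// out, given that `consumed` samples were already streamed and `elapsed` time
// has passed since the start. sampleRate must be positive.
func waitFor(consumed, want, sampleRate int, elapsed time.Duration) time.Duration {
	total := consumed + want

	// Split into whole seconds and a remainder so the multiplication by
	// time.Second cannot overflow for long recordings.
	due := time.Duration(total/sampleRate) * time.Second
	due += time.Duration(total%sampleRate) * time.Second / time.Duration(sampleRate)

	if due <= elapsed {
		return 0 // real time has already caught up
	}
	return due - elapsed
}

func main() {
	// A 512-sample buffer at 48 kHz becomes due a little under 11 ms after the start.
	fmt.Println(waitFor(0, 512, 48000, 0))
}
```

In the Throttler above, Stream() could call time.Sleep(waitFor(t.samplesConsumed, len(samples), t.sampleRate, time.Since(t.startTime))) once instead of looping on a fixed debounce. The latency then depends on the buffer size the consumer passes in rather than on a separate argument.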

A debounce of 1/30th of a second is not that noticeable, but it would be nice to drop the argument so users don't have to think about it.

Like the speaker, Throttle should also have a Lock() function so it doesn't call Stream() while changes to the Mixer etc. are being made. Without it, those changes could race with the streaming goroutine. I forgot that in my first snippet.
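A rough sketch of what such locking could look like. The Streamer interface is again a local stand-in shaped like beep's, and lockedStreamer and silence are hypothetical types for illustration only. A real throttle would do its sleeping outside the lock so that Lock() is never blocked for the length of a wait.

```go
package main

import (
	"fmt"
	"sync"
)

// Streamer is a local stand-in shaped like beep's Streamer interface.
type Streamer interface {
	Stream(samples [][2]float64) (n int, ok bool)
	Err() error
}

// silence is a hypothetical endless source of zero samples.
type silence struct{}

func (silence) Stream(samples [][2]float64) (int, bool) {
	for i := range samples {
		samples[i] = [2]float64{}
	}
	return len(samples), true
}

func (silence) Err() error { return nil }

// lockedStreamer guards calls to the wrapped streamer with a mutex, so other
// goroutines can change the audio graph safely between Lock and Unlock.
type lockedStreamer struct {
	mu       sync.Mutex
	streamer Streamer
}

func (l *lockedStreamer) Lock()   { l.mu.Lock() }
func (l *lockedStreamer) Unlock() { l.mu.Unlock() }

func (l *lockedStreamer) Stream(samples [][2]float64) (int, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.streamer == nil {
		return 0, false // a nil streamer ends the stream
	}
	return l.streamer.Stream(samples)
}

func (l *lockedStreamer) Err() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.streamer == nil {
		return nil
	}
	return l.streamer.Err()
}

func main() {
	l := &lockedStreamer{streamer: silence{}}
	buf := make([][2]float64, 4)

	n, ok := l.Stream(buf)
	fmt.Println(n, ok)

	// Change the graph only while holding the lock.
	l.Lock()
	l.streamer = nil
	l.Unlock()

	n, ok = l.Stream(buf)
	fmt.Println(n, ok)
}
```

With something like this in place, the goroutine in the POC that sets ctrl.Streamer = nil would wrap that assignment in Lock() and Unlock().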