livepeer / lpms

Livepeer media server
MIT License

Audio clicking on segmented transcodes #152

Open j0sh opened 5 years ago

j0sh commented 5 years ago

When consecutive segments are transcoded independently, harmonic audio content such as a sine wave produces audible clicking when the transcoded results are played back. The clicks show up as transients in spectrograms. Presumably this is because most audio encoders operate on a sliding (overlapping) window of samples, so encoding each segment on its own leads to discontinuities at the edges of each group of samples.

Mitigation

One way to mitigate this effect is to pad the source segment with audio samples from adjacent segments. After transcoding, the padded samples can be dropped. However, how effective this is may depend on a few factors.

This may have other implications in a live transcoding context, such as the need to wait for trailing padding to become available before submitting the source segment for transcoding.
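The waiting requirement can be sketched as a one-segment lookahead buffer: each segment is held back until its successor arrives, so trailing padding is available when the job is submitted. This is only a sketch under that assumption; `Job`, `Lookahead`, `Push`, and `Flush` are hypothetical names and not part of lpms.

```go
package main

import "fmt"

// Job is a hypothetical unit of work: a source segment plus the segment
// that its trailing padding would be taken from.
type Job struct {
	Segment string
	Padding string // empty when no following segment is available
}

// Lookahead holds back each segment until the next one arrives.
type Lookahead struct {
	pending string
	has     bool
}

// Push accepts a newly arrived segment. It returns a job for the
// previously held segment, if there was one.
func (l *Lookahead) Push(seg string) (Job, bool) {
	prev, had := l.pending, l.has
	l.pending, l.has = seg, true
	if !had {
		return Job{}, false
	}
	return Job{Segment: prev, Padding: seg}, true
}

// Flush returns the last held segment without padding, for end of stream.
func (l *Lookahead) Flush() (Job, bool) {
	if !l.has {
		return Job{}, false
	}
	j := Job{Segment: l.pending}
	l.pending, l.has = "", false
	return j, true
}

func main() {
	var la Lookahead
	for _, seg := range []string{"source_504.ts", "source_505.ts", "source_506.ts"} {
		if job, ok := la.Push(seg); ok {
			fmt.Printf("transcode %s padded from %s\n", job.Segment, job.Padding)
		}
	}
	if job, ok := la.Flush(); ok {
		fmt.Printf("transcode %s without trailing padding\n", job.Segment)
	}
}
```

The cost of this design is visible in `Push`: nothing is submitted until the second segment arrives, so the pipeline runs one segment behind the source.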

Example of padding the end of a segment:

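function pad {
  # Number of audio frames to borrow from the start of the next segment
  frames=1
  fname=$(basename "$1")
  mkdir -p out
  # Copy the audio track of the current segment, preserving timestamps
  ffmpeg -loglevel warning -hide_banner -y -i "$1" -muxdelay 0 -vn -c:a copy -copyts out/audio.ts
  # Copy the leading audio frame(s) of the next segment
  ffmpeg -loglevel warning -hide_banner -y -i "$2" -muxdelay 0 -vn -c:a copy -frames:a "$frames" -copyts out/end.ts
  # Append the padding to the current segment
  cat out/audio.ts out/end.ts > "out/padded_$fname"
}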

pad in/source_504.ts in/source_505.ts
pad in/source_505.ts in/source_506.ts
pad in/source_506.ts in/source_507.ts
pad in/source_507.ts in/source_508.ts

Example of trimming the padding:

function trim {
    fname=$(basename "$1")
    # Count the audio packets in the padded file
    frames=$(ffprobe -loglevel warning -hide_banner -count_packets \
      -select_streams a -show_streams "$1" | grep nb_read_packets | grep -o '[0-9]*$')
    # Keep everything except the one padding frame at the end
    frames=$((frames - 1))
    ffmpeg -hide_banner -y -i "$1" -muxdelay 0 -vn -c:a copy \
        -frames:a "$frames" -copyts "out/stripped_$fname"
}

trim out/padded_source_504.ts
trim out/padded_source_505.ts
trim out/padded_source_506.ts
trim out/padded_source_507.ts
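The pad and trim steps above amount to a simple round trip on frame sequences. The sketch below models that bookkeeping in memory, with frames as labeled strings. The transcode step is omitted, and the sketch assumes, as the trim script does, that the number of padding frames at the end is known. `padEnd` and `trimEnd` are hypothetical helpers, not lpms functions.

```go
package main

import "fmt"

// padEnd returns cur followed by the first n frames of next.
// n is clamped to the range [0, len(next)].
func padEnd(cur, next []string, n int) []string {
	if n < 0 {
		n = 0
	}
	if n > len(next) {
		n = len(next)
	}
	out := make([]string, 0, len(cur)+n)
	out = append(out, cur...)
	return append(out, next[:n]...)
}

// trimEnd drops the last n frames. n is clamped to [0, len(frames)].
func trimEnd(frames []string, n int) []string {
	if n < 0 {
		n = 0
	}
	if n > len(frames) {
		n = len(frames)
	}
	return frames[:len(frames)-n]
}

func main() {
	seg504 := []string{"504:0", "504:1", "504:2"}
	seg505 := []string{"505:0", "505:1", "505:2"}

	padded := padEnd(seg504, seg505, 1)
	fmt.Println("padded:  ", padded)

	// In the real pipeline the padded segment would be transcoded here.
	stripped := trimEnd(padded, 1)
	fmt.Println("stripped:", stripped)
}
```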

Samples

Source: ffmpeg -f lavfi -i sine -c:a aac -f hls -hls_time 2 -t 10 test.m3u8

Source audio, concatenated:

$ cat test0.ts test1.ts test2.ts test3.ts test4.ts test5.ts > source.ts

image

Sample program to transcode audio:

// test.go
package main

import (
    "fmt"
    "os"

    "github.com/livepeer/lpms/ffmpeg"
)

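func main() {
    if len(os.Args) < 3 {
        fmt.Println("Usage: test <input-prefix> <output-prefix>")
        os.Exit(1)
    }
    // Segments are named <prefix><index>.ts
    inp := os.Args[1]
    pfx := os.Args[2]
    ffmpeg.InitFFmpeg()
    // Transcode each of the six segments independently
    for i := 0; i <= 5; i++ {
        in := &ffmpeg.TranscodeOptionsIn{
            Fname: fmt.Sprintf("%s%d.ts", inp, i),
        }
        out := []ffmpeg.TranscodeOptions{{
            Oname: fmt.Sprintf("%s%d.ts", pfx, i),
            // Drop the video track; only audio is of interest here
            VideoEncoder: ffmpeg.ComponentOpts{Name: "drop"},
        }}
        if _, err := ffmpeg.Transcode3(in, out); err != nil {
            fmt.Println("Could not transcode ", err)
        }
    }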
}

$ go run test.go test transcoded
$ cat transcoded0.ts transcoded1.ts transcoded2.ts transcoded3.ts transcoded4.ts transcoded5.ts > transcoded.ts

Transcoded audio, concatenated: image

criticaltv commented 4 years ago

I have been able to verify this by ear - there is a noticeable "blip" sound every 2 seconds.

Listen to this test stream with good speakers / headphones.

ffplay http://52.29.226.43:8935/stream/hello_world/P144p30fps16x9.m3u8

Does anyone else hear what I hear?