pion / webrtc

Pure Go implementation of the WebRTC API
https://pion.ly
MIT License
13.07k stars 1.6k forks source link

Performance regression and memory leak in SampleBuilder@v4 #2778

Open at-wat opened 1 month ago

at-wat commented 1 month ago

Your environment.

What did you do?

Updated pion/webrtc from v4.0.0-beta.13 to later version.

What did you expect?

SampleBuilder CPU/memory usage doesn't significantly increased.

What happened?

CPU usage is doubled in the existing benchmark, and input RTP packets aren't unreferenced after the corresponding samples are popped.

CPU

Bench name v4.0.0-beta.13 CPU v4.0.0-beta.14 CPU CPU usage increase
BenchmarkSampleBuilderSequential-32 311.4 ns/op 609.6 ns/op x1.96
BenchmarkSampleBuilderLoss-32 288.7 ns/op 720.5 ns/op x2.50
BenchmarkSampleBuilderReordered-32 320.9 ns/op 621.9 ns/op x1.93
BenchmarkSampleBuilderFragmented-32 272.8 ns/op 540.5 ns/op x1.98
BenchmarkSampleBuilderFragmentedLoss-32 246.8 ns/op 668.2 ns/op x2.71
Full benchmark outputs #### v4.0.0-beta.13 ``` $ go test -bench . -benchmem goos: linux goarch: amd64 pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder cpu: AMD Ryzen 9 3950X 16-Core Processor BenchmarkSampleBuilderSequential-32 3831741 311.4 ns/op 388 B/op 5 allocs/op BenchmarkSampleBuilderLoss-32 3969078 288.7 ns/op 344 B/op 4 allocs/op BenchmarkSampleBuilderReordered-32 3697982 320.9 ns/op 388 B/op 4 allocs/op BenchmarkSampleBuilderFragmented-32 4363172 272.8 ns/op 362 B/op 4 allocs/op BenchmarkSampleBuilderFragmentedLoss-32 4807190 246.8 ns/op 317 B/op 3 allocs/op PASS ok github.com/pion/webrtc/v4/pkg/media/samplebuilder 7.556s ``` #### v4.0.0-beta.14 ``` $ go test -bench . -benchmem goos: linux goarch: amd64 pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder cpu: AMD Ryzen 9 3950X 16-Core Processor BenchmarkSampleBuilderSequential-32 1979661 609.6 ns/op 420 B/op 6 allocs/op BenchmarkSampleBuilderLoss-32 1647154 720.5 ns/op 373 B/op 5 allocs/op BenchmarkSampleBuilderReordered-32 1881794 621.9 ns/op 420 B/op 6 allocs/op BenchmarkSampleBuilderFragmented-32 2221912 540.5 ns/op 394 B/op 5 allocs/op BenchmarkSampleBuilderFragmentedLoss-32 1816394 668.2 ns/op 346 B/op 4 allocs/op PASS ok github.com/pion/webrtc/v4/pkg/media/samplebuilder 9.402s ``` #### v4.0.0-beta.19 ``` $ go test -bench . -benchmem goos: linux goarch: amd64 pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder cpu: AMD Ryzen 9 3950X 16-Core Processor BenchmarkSampleBuilderSequential-32 1978306 602.4 ns/op 420 B/op 6 allocs/op BenchmarkSampleBuilderLoss-32 1609996 740.5 ns/op 373 B/op 5 allocs/op BenchmarkSampleBuilderReordered-32 1897911 628.1 ns/op 420 B/op 6 allocs/op BenchmarkSampleBuilderFragmented-32 2179366 538.9 ns/op 394 B/op 5 allocs/op BenchmarkSampleBuilderFragmentedLoss-32 1744249 684.3 ns/op 346 B/op 4 allocs/op PASS ok github.com/pion/webrtc/v4/pkg/media/samplebuilder 9.426s ```

Memory

I created a test to check that the input RTP packets are unreferenced after all samples are popped.

On v4.0.0-beta.13,

=== RUN   TestSampleBuilderPacketUnreference
--- PASS: TestSampleBuilderPacketUnreference (0.05s)

all popped packets are unreferenced.

On v4.0.0-beta.14,

=== RUN   TestSampleBuilderPacketUnreference
    samplebuilder_test.go:570: 
            Error Trace:    /home/at-wat/go/src/github.com/pion/webrtc/pkg/media/samplebuilder/samplebuilder_test.go:570
            Error:          Not equal: 
                            expected: 1
                            actual  : 65536
            Test:           TestSampleBuilderPacketUnreference
--- FAIL: TestSampleBuilderPacketUnreference (0.08s)

65536 packets aren't unreferenced.

Note

CPU and memory usage of my product software using SampleBuilder significantly increased after updating v4.0.0-beta.13 to v4.0.0-beta.19. ~I'll dig into the memory usage increase which doesn't appeared in the benchmark.~ Found the problem on input data unreference and updated above sections.

Refs

at-wat commented 1 month ago

@thatsnotright @Sean-Der FYI

at-wat commented 1 month ago

Also, the new SampleBuilder doesn't unreference the input RTP packets and causes kind of memory leak. Created a PR to add a test to check that the input RTP packets are unreferenced after all samples are popped. https://github.com/pion/webrtc/pull/2781

at-wat commented 1 month ago

@thatsnotright could you take a deeper look?

at-wat commented 1 week ago

kindly ping @thatsnotright @Sean-Der