Goal: Perform media processing using the native FFmpeg API within LPMS.
Why: Allows shipping LPMS without distributing or requiring additional runtime dependencies (whether executables or shared libraries), taking advantage of Go's ability to produce self-contained executables.
Many third-party Go bindings exist, in various states of upkeep and with varying levels of sugar wrapping the APIs. The nice thing is that they generally allow us to write more idiomatic Go while hiding the gritty details of the libav API. The drawback is that those gritty details may be hidden precisely when we need them most. From a cursory inspection of several projects, the API sugar does not seem like an issue for the current goals of LPMS, although some projects are missing a few features that would have to be ported in.
Here are some of the better bindings, and a very quick, incomplete perusal of what would be required to make these production grade.
The alternative is custom bindings that we write ourselves.
The biggest concern with using third-party libav-cgo bindings is heavyweight operations disrupting goroutine scheduling. However, we should still be less prone to issues such as [3][4], since the number of encodes can be precisely bounded.
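As an illustration of that bound, here is a minimal sketch of a counting semaphore around each blocking cgo call. The limit and the encode callback are placeholders, not existing LPMS code:

```go
package lpms

// maxEncodes is an arbitrary placeholder bound on concurrent encodes.
const maxEncodes = 4

// encodeSlots acts as a counting semaphore: at most maxEncodes blocking
// cgo encode calls (and hence C threads) run at any given time.
var encodeSlots = make(chan struct{}, maxEncodes)

// boundedEncode wraps a blocking encode call so the total number of
// concurrent cgo threads stays precisely bounded.
func boundedEncode(encode func() error) error {
	encodeSlots <- struct{}{}        // acquire a slot; blocks if all are busy
	defer func() { <-encodeSlots }() // release the slot when the encode returns
	return encode()
}
```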
Generally, the fewer times we have to cross the go-cgo boundary, the better. In practice, for a custom cgo implementation, this would mean an entirely self-contained API similar to the following:
int segmentVideo(char *inputStream, char *outputPrefix, ...segmentParams) { /* read input and write output until EOF */ }
int transcodeVideo(char *inputStream, char *outputPrefix, ...codecParams) { /* read input and write output until EOF */ }
Note these APIs would block until the input EOFs. This leads to straightforward semantics for goroutine interaction: each API call would get its own thread without the risk of unexpected thread growth.
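For illustration, a minimal cgo sketch of the Go side, assuming a blocking segmentVideo C function like the one above is compiled in; the C implementation and its return codes are placeholders, not an existing API:

```go
package lpms

/*
#include <stdlib.h>

// Hypothetical blocking entry point as sketched above; it reads the
// input and writes segments until EOF, then returns a status code.
int segmentVideo(const char *inputStream, const char *outputPrefix);
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// SegmentVideo blocks until the input EOFs, so each call can simply run
// in its own goroutine; the runtime gives the blocking cgo call its own
// OS thread for the duration of the call.
func SegmentVideo(input, outputPrefix string) error {
	cin := C.CString(input)
	cout := C.CString(outputPrefix)
	defer C.free(unsafe.Pointer(cin))
	defer C.free(unsafe.Pointer(cout))
	if ret := C.segmentVideo(cin, cout); ret != 0 {
		return fmt.Errorf("segmentVideo returned %d", int(ret))
	}
	return nil
}
```

Callers would then launch one goroutine per stream (e.g. go SegmentVideo(in, prefix)) and wait on a channel or WaitGroup for completion.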
A cgo call within a goroutine gets its own OS thread. Does each concurrent cgo thread count against GOMAXPROCS? Could we have trouble squeezing out more concurrent encodes even when a cgo thread is merely blocked waiting for input from within ffmpeg? If so, we may need to manage media processing outside the Go runtime entirely, using pthreads. Wild idea: do this within Rust, which would statically verify thread safety and expose a C-compatible FFI.
There are various ways to deal with the output segments, but they can all be handled entirely within Go. FFmpeg can generate the HLS manifests for us. We can keep the current method of polling the file system for changes, use file system notifications [1], or feed back information about new segments via message passing.
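For the notification option, a minimal sketch using fsnotify [1] to pick up newly written segments; the directory layout and the .ts suffix check are assumptions about our output naming:

```go
package lpms

import (
	"log"
	"strings"

	"github.com/fsnotify/fsnotify"
)

// watchSegments sends the path of each newly created .ts file in dir on out.
func watchSegments(dir string, out chan<- string) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()
	if err := watcher.Add(dir); err != nil {
		return err
	}
	for {
		select {
		case ev := <-watcher.Events:
			// React to newly created segment files only.
			if ev.Op&fsnotify.Create != 0 && strings.HasSuffix(ev.Name, ".ts") {
				out <- ev.Name
			}
		case err := <-watcher.Errors:
			log.Println("watch error:", err)
		}
	}
}
```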
FFmpeg itself is LGPL, but some of the better codecs are GPL'd or nonfree (e.g., x264 or FDK AAC). Will linking and distributing these codecs have ramifications even though LPMS is MIT licensed? In any case, it is good to be aware. Running a separate ffmpeg executable sidesteps this problem somewhat, although I'm not sure if the current usage qualifies as "intimate communication" [2].
Hands-on evaluation of one or two of the Go bindings. Check for these:
Failing that, we'll write custom bindings.
[1] https://github.com/fsnotify/fsnotify
[2] https://www.gnu.org/licenses/gpl-faq.en.html#GPLPlugins
[3] https://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/
[4] https://groups.google.com/forum/#!topic/golang-nuts/8gszDBRZh_4
Interesting comment about potentially using Rust to wrap the concurrent encoding threads. This is outside the scope of this task, but one other consideration related to this is the verifiability of the encoding in the Truebit Virtual Machine.
Truebit uses a flavor of WebAssembly within the VM, so "tasks" written in a language that can target WASM can be verified, including C and Rust, but not Go. I figured that when we got to the point of integration we would essentially have to treat the ffmpeg portion of the encoding job as the verifiable piece, and compile the same version of ffmpeg to WASM. And we may run into challenges with boundaries if we embed natively within Go code. But the idea of keeping a clean boundary, or wrapping the interface to the task in something like Rust, is interesting, because then potentially the same code could be used both for verification and in the node without jumping through hoops.
This is a longer discussion on verification of course, and there are other techniques we can use.
I agree with the next steps. I think in evaluating any of the existing library choices, we'd likely have to be comfortable going in with the fact that we'll likely be forking/maintaining the library to keep it up to date for our purposes without necessarily needing to support the full range of the ffmpeg ecosystem. Our use case is pretty limited at the moment.
Related to the threading question: we currently execute the transcoding command with -threads 1 because ffmpeg output becomes nondeterministic if we allow multiple threads. This is a current limitation due to our verification method.
It's great you brought up the relationship between C threads and GOMAXPROCS. Either way, in my mind it's not a huge issue. People have suggested using global locks to manage the relationship between goroutines and cgo threads; for example, we could have a per-stream lock to make sure we have at most one cgo thread per stream, which would be fine for the live stream case.
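A rough sketch of that per-stream lock idea, assuming string stream IDs and a placeholder where the blocking cgo call would go:

```go
package lpms

import "sync"

var (
	mu          sync.Mutex
	streamLocks = map[string]*sync.Mutex{} // one lock per stream ID
)

// lockFor returns the mutex for a stream, creating it on first use.
func lockFor(streamID string) *sync.Mutex {
	mu.Lock()
	defer mu.Unlock()
	l, ok := streamLocks[streamID]
	if !ok {
		l = &sync.Mutex{}
		streamLocks[streamID] = l
	}
	return l
}

// transcodeSegment ensures at most one cgo thread is active per stream.
func transcodeSegment(streamID, segment string) {
	l := lockFor(streamID)
	l.Lock()
	defer l.Unlock()
	// ... blocking cgo transcode call for this segment would go here ...
}
```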
About handling the outputs - we are actually currently ignoring the HLS manifest during segmentation. We basically only move the .ts segments around and reconstruct the manifest at the edge media server. Is there a good reason to keep the original manifest?
Great job with your analysis and looking into a new language. The next steps sound great. If you can keep the code from your experiments, I'd love to be able to try them out on my local machine.
Thanks for the feedback Doug and Eric.
we'd likely have to be comfortable going in with the fact that we'll likely be forking/maintaining the library to keep it up to date for our purposes without necessarily needing to support the full range of the ffmpeg ecosystem.
Maintaining our own bindings is looking increasingly likely, although maybe not any of these libs.
It's great you brought up the relationship between C threads and GOMAXPROCS.
The concern with this is actually whether we might need more cgo threads than we have cores available (or otherwise more than GOMAXPROCS). For example, one thread for reading input and another for encoding: some encoding profiles are likely to be faster than realtime, and with a shared decoding/demuxing context we may not want slower profiles to block progress on faster ones. Maybe in that case we could schedule them manually. Or it's a non-issue entirely; I'm not sure yet.
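To illustrate the shared-demuxer case, here is a sketch, entirely on the Go side and with placeholder types, of reading the input once and fanning packets out to per-profile encoders so a slower profile only stalls the shared reader once it falls a full buffer behind:

```go
package lpms

// Packet is a placeholder for a demuxed media packet.
type Packet []byte

// fanOut reads from a single demuxer once and feeds each encoding
// profile through its own buffered channel.
func fanOut(src <-chan Packet, profiles []string, encode func(profile string, in <-chan Packet)) {
	outs := make([]chan Packet, len(profiles))
	for i, p := range profiles {
		outs[i] = make(chan Packet, 64) // per-profile backlog
		go encode(p, outs[i])
	}
	for pkt := range src {
		for _, out := range outs {
			out <- pkt // blocks only if a profile falls 64 packets behind
		}
	}
	for _, out := range outs {
		close(out)
	}
}
```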
Is there a good reason to keep the original manifest?
Not that I can think of right now.
https://gist.github.com/j0sh/ffe816e4bca5dd8be92803c597efd8bd#file-readme-md
This does not (yet) correctly copy all frames; additional work on AVPacket is needed. See below.
https://github.com/targodan/ffgopeg/compare/develop...j0sh:livepeer?expand=1
The API is somewhat Go-friendly but still a rather literal mapping of FFmpeg, including the need to do manual cleanup for AVFrame, AVPacket and the various contexts. Essentially, knowledge of the FFmpeg API is required in addition to learning the Go API itself. The benefit of such a literal API mapping seems slight, aside from avoiding writing C directly. Even then, there would not be much insulation from C-related errors. In fact, the bindings themselves introduce an additional error surface; see the memory leak that was fixed.
Some flaws with Go packaging, or perhaps more with how ffgopeg is using it.
The structure of packages such as ffgopeg (with multiple interdependent local packages) makes it more difficult to maintain upstream compatibility while keeping in-progress development repositories in another remote location. Notably, dependent package paths are hard-coded, as in [1], which in this case percolates down to a type-checking error. This makes the release repository at gopkg.in the bottleneck for distributing experimental changes using Go's built-in packaging mechanisms. While we can mitigate this at compile time by manually fixing up the repository in $GOPATH [3], it will become cumbersome as the project scales.
Given the pace of engagement with this package, and the packaging issues, we'd probably be better off maintaining our own (incompatible) fork, as has been suggested.
This is one of the better Go bindings. However, the FFmpeg API is both expansive and a moving target. It would take several weeks to get a production-grade integration working and thoroughly tested, in addition to ongoing maintenance effort. Even then, it is unclear what benefit we get from the bindings themselves as opposed to a more focused Go API that exposes precisely the features we need, for example an RtmpToHLS(...) implemented in C.
Note that RtmpToHLS is a function we'd have to write anyway, whether it's implemented directly in C, in cgo, or as a shell call out to FFmpeg (as is currently done). Hence, implementing the function directly in C would have a quicker turnaround, simply because it avoids the intermediate step of fixing the cgo bindings.
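To make that focused API concrete, here is a hedged sketch of what the Go surface might look like over an rtmp_to_hls function written in C against FFmpeg; the function name, parameters and pkg-config set are assumptions, not existing code:

```go
package lpms

/*
#cgo pkg-config: libavformat libavcodec libavutil
#include <stdlib.h>

// Hypothetical C implementation against the FFmpeg API; blocks until
// the RTMP input ends, writing HLS segments under outDir.
int rtmp_to_hls(const char *rtmpURL, const char *outDir, int segLenSec);
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// RtmpToHLS exposes exactly the one operation LPMS needs, keeping the
// Go-cgo boundary to a single blocking call per stream.
func RtmpToHLS(rtmpURL, outDir string, segLenSec int) error {
	curl := C.CString(rtmpURL)
	cdir := C.CString(outDir)
	defer C.free(unsafe.Pointer(curl))
	defer C.free(unsafe.Pointer(cdir))
	if ret := C.rtmp_to_hls(curl, cdir, C.int(segLenSec)); ret != 0 {
		return fmt.Errorf("rtmp_to_hls returned %d", int(ret))
	}
	return nil
}
```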
[1] https://github.com/j0sh/ffgopeg/blob/livepeer/avcodec/avcodec.go#L21
[2] https://gist.github.com/j0sh/ffe816e4bca5dd8be92803c597efd8bd#file-transmux-go-L79
[3] https://gist.github.com/j0sh/ffe816e4bca5dd8be92803c597efd8bd#file-readme-md
Right now we invoke ffmpeg through an external command. This is bad practice. We should use a native Go binding for ffmpeg.
Examples:
Does anyone know which package is better?