Support multiplexed media content

Spawned from #33. We currently only support single-stream media content; we don't support multiplexed content (a media file with both audio and video). Here are my initial thoughts on what will need to change to support this:

We currently assume that one MediaSource object represents a single stream. We'll need to change this so source content can contain more than one stream. We'll support two cases from MediaSource: (1) two SourceBuffers, one for audio and one for video, and (2) one SourceBuffer containing both audio and video. This will affect adding source buffers, since we'll need to determine which of the given codecs are audio and which are video (we currently use the MIME type for that).
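As a rough illustration of the codec check, here is a minimal sketch that splits a `codecs` parameter value and guesses the kind of each entry from its prefix. `StreamKind`, `ClassifyCodec`, and `SplitCodecs` are hypothetical names, and the prefix lists are illustrative rather than complete.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class StreamKind { kAudio, kVideo, kUnknown };

// Hypothetical helper: guesses the kind of one codec string from its prefix.
// The prefix lists below are assumptions for illustration, not exhaustive.
StreamKind ClassifyCodec(const std::string& codec) {
  static const char* const kVideoPrefixes[] = {"avc1", "avc3", "hvc1", "hev1",
                                               "vp8",  "vp9",  "vp09", "av01"};
  static const char* const kAudioPrefixes[] = {"mp4a", "opus", "vorbis",
                                               "ac-3", "ec-3", "flac"};
  for (const char* prefix : kVideoPrefixes) {
    if (codec.rfind(prefix, 0) == 0) return StreamKind::kVideo;
  }
  for (const char* prefix : kAudioPrefixes) {
    if (codec.rfind(prefix, 0) == 0) return StreamKind::kAudio;
  }
  return StreamKind::kUnknown;
}

// Splits a comma-separated codecs value (e.g. "avc1.42E01E, mp4a.40.2") into
// individual codec strings, trimming spaces and tabs around each entry.
std::vector<std::string> SplitCodecs(const std::string& codecs) {
  std::vector<std::string> out;
  std::size_t start = 0;
  while (start <= codecs.size()) {
    std::size_t end = codecs.find(',', start);
    if (end == std::string::npos) end = codecs.size();
    const std::size_t first = codecs.find_first_not_of(" \t", start);
    if (first != std::string::npos && first < end) {
      const std::size_t last = codecs.find_last_not_of(" \t", end - 1);
      out.push_back(codecs.substr(first, last - first + 1));
    }
    start = end + 1;
  }
  return out;
}
```

With something like this, a SourceBuffer added with `codecs="avc1.42E01E, mp4a.40.2"` would be recognized as carrying one video and one audio stream, while a single-codec type would behave as it does today.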

We currently have a bundle of the demuxer, decoder, and frames. We'll need to split this up so the demuxer is separate from the decoders and frames. Once the content is demuxed, we can store the encoded frames in separate streams.

We'll also need to separate the demuxer and the decoder. Right now the MediaProcessor contains both the demuxer and the decoder. We'll need to split that into separate demuxers and decoders so we can create two decoders from one demuxer. This will require passing the FFmpeg-specific codec data from the demuxer to the decoder. But this will help our pipeline if we want to remove the FFmpeg decoders and do our own hardware decoding.
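To make the split concrete, here is a minimal sketch of one demuxer feeding several decoders. `StreamDescription`, `SketchDecoder`, and `CreateDecoders` are hypothetical names, not existing classes; `extra_data` stands in for whatever FFmpeg-specific codec data would need to be passed along.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one elementary stream found by the demuxer.
struct StreamDescription {
  bool is_video;
  std::string codec;
  // Opaque codec configuration handed from the demuxer to the decoder.
  std::vector<uint8_t> extra_data;
};

// Hypothetical decoder that depends only on a stream description, so it does
// not care which demuxer (or which container) the stream came from.
class SketchDecoder {
 public:
  explicit SketchDecoder(StreamDescription desc) : desc_(std::move(desc)) {}
  bool is_video() const { return desc_.is_video; }
  const std::string& codec() const { return desc_.codec; }

 private:
  StreamDescription desc_;
};

// Creates one decoder per stream reported by a single demuxer.
std::vector<std::unique_ptr<SketchDecoder>> CreateDecoders(
    const std::vector<StreamDescription>& streams) {
  std::vector<std::unique_ptr<SketchDecoder>> decoders;
  for (const StreamDescription& desc : streams) {
    decoders.push_back(std::unique_ptr<SketchDecoder>(new SketchDecoder(desc)));
  }
  return decoders;
}
```

Keeping the decoder's input down to a plain description is also what would let a hardware decoder replace the FFmpeg one later without touching the demuxer.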

We'll need to be careful about determining which frames come from which streams. When using adaptation, the frames can come from different FFmpeg "streams" but are actually from the same stream from our perspective. We'll need to have a way of indicating that a frame from the demuxer comes from the video or the audio stream.
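One possible way to track this is a small lookup from the demuxer's stream identifier to our logical stream. This is only a sketch; `LogicalStream` and `StreamRouter` are hypothetical names, and the integer id is assumed to be whatever index the demuxer reports.

```cpp
#include <map>

enum class LogicalStream { kAudio, kVideo };

// Hypothetical router: maps a demuxer-level stream id to our logical stream.
// Several demuxer ids may map to the same logical stream after adaptation.
class StreamRouter {
 public:
  void Register(int demuxer_stream_id, LogicalStream kind) {
    kinds_[demuxer_stream_id] = kind;
  }

  // Returns false if the id was never registered; otherwise writes the
  // logical stream to |*out| and returns true.
  bool Route(int demuxer_stream_id, LogicalStream* out) const {
    const auto it = kinds_.find(demuxer_stream_id);
    if (it == kinds_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::map<int, LogicalStream> kinds_;
};
```

If an adaptation switch makes the demuxer report a new id for the video track, registering that id as `kVideo` keeps its frames going to the same logical stream as before.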

We'll also need to handle SourceBuffer.remove removing from multiple streams, and the buffered ranges for the SourceBuffer will need to be the combination of the audio and video ranges.
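As a sketch of combining the ranges, the following assumes "combination" means the intersection of the per-stream ranges, which is how the Media Source Extensions spec derives `SourceBuffer.buffered` from its track buffers. `TimeRange` and `IntersectRanges` are hypothetical names, and both inputs are assumed to be sorted and non-overlapping.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical half-open time range in seconds: [start, end).
struct TimeRange {
  double start;
  double end;
};

// Returns the ranges covered by BOTH inputs. Each input must be sorted by
// start time and must not contain overlapping ranges.
std::vector<TimeRange> IntersectRanges(const std::vector<TimeRange>& a,
                                       const std::vector<TimeRange>& b) {
  std::vector<TimeRange> out;
  std::size_t i = 0;
  std::size_t j = 0;
  while (i < a.size() && j < b.size()) {
    const double start = std::max(a[i].start, b[j].start);
    const double end = std::min(a[i].end, b[j].end);
    if (start < end) out.push_back({start, end});
    // Advance whichever range finishes first; the other may still overlap
    // with the next range in the opposite list.
    if (a[i].end < b[j].end) {
      ++i;
    } else {
      ++j;
    }
  }
  return out;
}
```

For example, audio buffered over 0-10 and 20-30 with video buffered over 5-25 would report 5-10 and 20-25 for the SourceBuffer as a whole.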

Everything from the decoders and below should work fine. They only operate on frames and streams, so it shouldn't matter where the source comes from.

The design for #60 included parts of what is needed for multiplexed content. Some things still need to be done. Under the new design, a Demuxer would read the multiplexed content and provide EncodedFrame objects for both streams. We would distinguish between them using the StreamInfo::is_video field (meaning we only support audio+video, not two video streams).

First, we need to update the FFmpegDemuxer to do this. Then we need to update the DemuxerThread/SourceBuffer/MediaSource to handle this. This involves verifying we don't have too many buffers, splitting the frames from the demuxer into two ElementaryStream objects instead of one, and passing those buffers to the MediaPlayer.
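The splitting and the buffer-count check could look roughly like the sketch below. `SketchFrame`, `SketchStreams`, `StreamCounts`, `SplitFrames`, and `CanAddBuffer` are hypothetical stand-ins, not the real EncodedFrame or ElementaryStream classes, and the limit of one audio and one video stream is taken from the is_video restriction described above.

```cpp
#include <vector>

// Hypothetical stand-in for an encoded frame tagged with its stream kind.
struct SketchFrame {
  bool is_video;
  double pts;  // Presentation timestamp in seconds.
};

// Hypothetical stand-in for the pair of per-kind frame buffers.
struct SketchStreams {
  std::vector<SketchFrame> audio;
  std::vector<SketchFrame> video;
};

// Appends each demuxed frame to the buffer that matches its kind.
void SplitFrames(const std::vector<SketchFrame>& frames,
                 SketchStreams* streams) {
  for (const SketchFrame& frame : frames) {
    (frame.is_video ? streams->video : streams->audio).push_back(frame);
  }
}

// Number of audio and video streams carried by one or more source buffers.
struct StreamCounts {
  int audio;
  int video;
};

// Returns true if adding a buffer would still leave at most one audio and
// at most one video stream in total.
bool CanAddBuffer(const StreamCounts& existing, const StreamCounts& added) {
  return existing.audio + added.audio <= 1 &&
         existing.video + added.video <= 1;
}
```

Under this check, a multiplexed buffer (one audio plus one video) could only be added when no other buffers exist, while separate audio and video buffers would still be accepted as they are now.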

Below the MediaPlayer, we already support this. All that needs to be updated is our handling of the frame buffers.
