google / ExoPlayer

This project is deprecated and stale. The latest ExoPlayer code is available in https://github.com/androidx/media
https://developer.android.com/media/media3/exoplayer
Apache License 2.0

Support seeking into downloaded fMP4 file without top level sidx box #6704


milos-pesic-zattoo commented 4 years ago

[REQUIRED] Use case description

The main use case is seeking support for already-downloaded fMP4 files which don't have a top-level sidx box. According to the ISOBMFF spec, the sidx box isn't required. An example of such a file can be found at: https://drive.google.com/drive/folders/1kFDugORTMhPWq5fZZMPXjrB7E7oZSe5x?usp=sharing ExoPlayer isn't able to seek within these kinds of files because they lack the sidx box, so the seek map can't be built by the existing logic. The full structure of these fMP4 files can be inspected in the sample at the link above, but in short the top-level mp4 boxes are organised in this way:

ftyp
pdin
moov
moof
mdat
moof
mdat
... // repeating moof and mdat for each fragment
free
mfra
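As an illustration of that layout, here is a minimal standalone sketch (plain Java, not ExoPlayer code; the `BoxScanner` name is invented for this example) that walks the top-level box headers of such a file:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ExoPlayer: lists top-level ISOBMFF boxes.
public final class BoxScanner {
  public static final class Box {
    public final String type;
    public final long offset;
    public final long size;
    Box(String type, long offset, long size) {
      this.type = type;
      this.offset = offset;
      this.size = size;
    }
  }

  // Parses 32-bit size + 4-char type headers. For brevity this sketch skips
  // the size==1 (64-bit largesize) and size==0 ("extends to end of file") cases.
  public static List<Box> scanTopLevel(byte[] data) {
    List<Box> boxes = new ArrayList<>();
    ByteBuffer buf = ByteBuffer.wrap(data);
    while (buf.remaining() >= 8) {
      long offset = buf.position();
      long size = buf.getInt() & 0xFFFFFFFFL;
      byte[] type = new byte[4];
      buf.get(type);
      boxes.add(new Box(new String(type, StandardCharsets.US_ASCII), offset, size));
      if (size < 8 || size - 8 > buf.remaining()) {
        break; // malformed or truncated box; keep the header we read and stop
      }
      buf.position((int) (offset + size)); // jump over the payload to the next box
    }
    return boxes;
  }
}
```

Running this over a file shaped like the list above would yield ftyp, pdin, moov, then alternating moof/mdat pairs, then free and mfra.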

Background and motivation

A use case for these files is a server which stores media for each fragment individually and needs to provide a download feature for a media file (e.g. a movie) consisting of thousands of these fragments. Providing a top-level sidx box in such a case is challenging from the backend perspective, as the content of all fragments would need to be analysed prior to packaging the media and serving the download request.

The scope of the feature request is seeking support for already-downloaded fMP4 files, as supporting network streaming of such files might be challenging.

Testing done on other players/platforms: AVPlayer on iOS, QuickPlayer on OSX, Windows Media Player on Windows, and VLC already support this use case.

Proposed solution

Possible options we thought about for solving the problem are:

Alternatives considered

NA

ojw28 commented 4 years ago

> A use case for these files is a server which stores media for each fragment individually and needs to provide a download feature for a media file (e.g. a movie) consisting of thousands of these fragments. Providing a top-level sidx box in such a case is challenging from the backend perspective, as the content of all fragments would need to be analysed prior to packaging the media and serving the download request.

Can you provide a bit more information about the use case? In particular, what's the real-life scenario where it would make sense to end up with thousands of individual fragments corresponding to a movie on the server side, without also having any corresponding indexing information?

The only case I can really think of is that you've previously live-streamed the content using HLS (but not DASH, because the segments appear to be muxed). However, it's unclear why you wouldn't have any indexing information in that case, from which you could easily generate either a sidx box or a HLS media playlist, both of which would presumably solve this problem.

milos-pesic-zattoo commented 4 years ago

Sure - your guess is right - it's about reusing data which has already been live streamed. To stay streaming-protocol agnostic, the encoded media data might be stored in a common container which is not necessarily mp4. Each x seconds of media data is encoded and muxed into this container - referred to below as a media fragment. In the live-streaming case, each of these fragments becomes an individual segment (e.g. an HLS segment): the streaming backend takes the original media fragment, remuxes it into the appropriate container (depending on the streaming protocol the client requested) and serves it as a DASH or HLS segment.

In the download case, the backend can reuse the media fragments - remux each one to mp4 and serve it to users. Building a top-level sidx box when the original data isn't packaged in mp4 could be challenging, if possible at all, since a lot of things can change from one media fragment to the next while a stream is being broadcast (e.g. the number of audio tracks and the audio codecs used). A seeking/indexing map for all live streams available on the platform would therefore need to contain a lot of metadata in addition to size and timing info, so that the sidx box could be correctly assembled upfront. It might be possible, but it would be complex and still time consuming (the index table and metadata fetching and calculation would need to happen on every download request). I hope this provides a bit more context.

ojw28 commented 4 years ago

> since a lot of things could change during broadcasting a stream, from one media fragment to the other (e.g number of audio tracks and audio codecs used)

How do you handle this when constructing the FMP4 file to be downloaded, given you can't change the tracks (either the number of tracks or their properties), as far as I can tell from ISO 14496-12? Are you re-encoding on the fly into a fixed number of tracks with fixed properties?

Have you considered generating a DASH manifest for the download stream, which would presumably avoid the issue you're currently running into (and is presumably much more in-line with your flow for the streaming case)?

google-oss-bot commented 4 years ago

Hey @milos-pesic-zattoo. We need more information to resolve this issue but there hasn't been an update in 14 days. I'm marking the issue as stale and if there are no new updates in the next 7 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

milos-pesic-zattoo commented 4 years ago

Unfortunately, our use case does not allow downloading a DASH manifest. We need to send fragmented mp4 files as chunks to clients in a "live" manner, and we only know the fragment mapping, or the info required for seeking, after sending all requested fragmented mp4 files.

ojw28 commented 4 years ago

We will leave this open to track the feature request, however it's unlikely to be prioritized in the near term. The problem you're facing seems to be a consequence of some architecture choices on the serving side that, to the best of my knowledge, no one else has made, which means it ends up a fairly long way down the list when ranked in terms of cost/benefit.

NicolaVerbeeck commented 2 years ago

I did some investigation into building the SeekMap using the data from mfra. We can extract durations and moof box offsets from there, but the seek map also requires the size of the moof atoms, which is not encoded in the tfra atoms (unless I missed it). Creating a valid seek map would thus involve fetching the size of every moof atom referenced in the tfra. This requires a fully seekable input concept, which does not (currently) exist.

Correct me if I'm wrong @ojw28
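To make the mfra route concrete, here is a hedged standalone sketch (plain Java, no ExoPlayer dependency; `TfraParser` is an invented name) of extracting (time, moof_offset) pairs from a version-0 tfra box payload, following the box layout in ISO 14496-12:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ExoPlayer code: parses the payload of a version-0
// 'tfra' box into (time, moof_offset) pairs.
public final class TfraParser {
  public static final class Entry {
    public final long time;       // presentation time, in the track's timescale
    public final long moofOffset; // byte offset of the fragment's moof box
    Entry(long time, long moofOffset) {
      this.time = time;
      this.moofOffset = moofOffset;
    }
  }

  // 'payload' starts at the byte after the box size/type header.
  public static List<Entry> parseVersion0(byte[] payload) {
    ByteBuffer buf = ByteBuffer.wrap(payload);
    int versionAndFlags = buf.getInt();
    if ((versionAndFlags >>> 24) != 0) {
      throw new IllegalArgumentException("only version 0 handled in this sketch");
    }
    buf.getInt(); // track_ID (unused here)
    int lengths = buf.getInt(); // 26 reserved bits + three 2-bit length fields
    int trafNumBytes = ((lengths >>> 4) & 0x3) + 1;
    int trunNumBytes = ((lengths >>> 2) & 0x3) + 1;
    int sampleNumBytes = (lengths & 0x3) + 1;
    int entryCount = buf.getInt();
    List<Entry> entries = new ArrayList<>();
    for (int i = 0; i < entryCount; i++) {
      long time = buf.getInt() & 0xFFFFFFFFL;       // 32-bit in version 0
      long moofOffset = buf.getInt() & 0xFFFFFFFFL; // 32-bit in version 0
      // Skip traf_number / trun_number / sample_number; only time and offset
      // are needed to build a seek table.
      buf.position(buf.position() + trafNumBytes + trunNumBytes + sampleNumBytes);
      entries.add(new Entry(time, moofOffset));
    }
    return entries;
  }
}
```

As noted above, the entries give each fragment's start time and moof offset, but not the moof sizes.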

ojw28 commented 2 years ago

> but the seek map also requires the size of the moof atoms

I don't think this is required. Perhaps you've concluded this having looked at the ChunkIndex class? That's an implementation of SeekMap, but I don't think there's anything forcing you to use it. Note also that it doesn't actually use the chunk sizes to implement the SeekMap interface, so it should be straightforward to create a similar implementation that omits them entirely.
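A minimal sketch of such an implementation, assuming fragment start times and moof offsets have already been extracted (e.g. from tfra). The class and method names are invented for this example; in ExoPlayer itself this would implement the SeekMap interface, analogous to ChunkIndex but without the size arrays:

```java
import java.util.Arrays;

// Sketch of a SeekMap-style index built only from fragment start times and
// moof byte offsets, with no chunk sizes.
public final class FragmentSeekIndex {
  private final long[] startTimesUs; // fragment start times in microseconds, ascending
  private final long[] moofOffsets;  // byte offset of each fragment's moof box
  private final long durationUs;

  public FragmentSeekIndex(long[] startTimesUs, long[] moofOffsets, long durationUs) {
    this.startTimesUs = startTimesUs;
    this.moofOffsets = moofOffsets;
    this.durationUs = durationUs;
  }

  public boolean isSeekable() { return true; }

  public long getDurationUs() { return durationUs; }

  // Analogous to SeekMap.getSeekPoints(timeUs): returns the byte position of
  // the fragment whose start time is at or before timeUs. Note that no sizes
  // are consulted anywhere in the lookup.
  public long getPosition(long timeUs) {
    int i = Arrays.binarySearch(startTimesUs, timeUs);
    if (i < 0) i = -i - 2; // not found: insertion point minus one
    if (i < 0) i = 0;      // clamp seeks before the first fragment
    return moofOffsets[i];
  }
}
```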

NicolaVerbeeck commented 2 years ago

I see, thanks for the clarification!