Apply best seeking strategy for MP3

ryanheise commented 3 years ago

ExoPlayer provides options to enable index-based seeking and constant bitrate seeking. However, to choose an appropriate strategy for an arbitrary media file (e.g. an arbitrary URL not under the control of the app developer), you would first need to know whether the file has a precise or approximate seek table or, in the absence of any seek table, whether it is encoded with a variable bitrate (VBR). How is an app developer expected to query these things in order to choose an appropriate seek strategy for a given file? Is there any way ExoPlayer could make this easier?

yanivshaked commented 2 years ago

@christosts Any update on this issue? Timeline?

christosts commented 2 years ago

There's no mechanism in the player at the moment to inform the app what seeking strategy to use, or an API to inform apps of the seeking capabilities of a file.

You could proactively enable both options (index-based and constant bit-rate seeking) in the DefaultExtractorsFactory. However, for MP3 files, if both options are enabled, then index-based seeking will be applied, with the caveats described here.

We can try to improve MP3 parsing when both seeking options are enabled to pick the "best" one depending on whether the file is CBR/VBR, but at the moment I'm not sure we can reliably detect that early without actually parsing the entire file.

I will mark this issue as an enhancement, but it's probably going to be on low-priority.

ryanheise commented 2 years ago

Thanks, @christosts. I'm glad there is an idea for a long-term solution that's perhaps more ideal, although it would also be good to have a cheap, short-term workaround.

Looking on StackOverflow, there is an answer suggesting how an app could monitor the bitrate:

https://stackoverflow.com/a/32135909/13949389

But to then make use of this information, we would need to be able to lazily/dynamically set either index-based or constant bit-rate seeking after the audio has already started loading. Would that be feasible in the short term, or is it already possible?

ojw28 commented 2 years ago

If there's no better way to determine whether an MP3 file is CBR or VBR, other than to scan through every frame, then I think that proactively enabling both options as @christosts suggested is the best that can be done here.

The only other suggestion I can think of is that perhaps index seeking could heuristically detect "probably constant bitrate" after it's scanned some portion of the file and established that the scanned part is constant bitrate. That would not guarantee correct seeking though, since it's basically guessing that the rest of the file will be the same.
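A minimal sketch of the "probably constant bitrate" heuristic described above, in plain Java with no ExoPlayer dependency. The class `CbrHeuristic` and its methods are hypothetical, and the sketch only understands MPEG-1 Layer III frame headers passed in as 32-bit ints. As noted, a positive result says nothing certain about the part of the file that has not been scanned yet.

```java
import java.util.List;

/**
 * Hypothetical heuristic (not an ExoPlayer API): guesses that an MP3 stream is constant
 * bitrate if every scanned MPEG-1 Layer III frame header declares the same bitrate.
 * This is only a guess: frames after the scanned prefix may use a different bitrate.
 */
final class CbrHeuristic {
  // MPEG-1 Layer III bitrates in kbps, indexed by the 4-bit bitrate index.
  // Index 0 ("free format") and 15 ("bad") are treated as invalid here.
  private static final int[] MPEG1_LAYER3_KBPS = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
  };

  private CbrHeuristic() {}

  /** Returns the bitrate in kbps declared by an MPEG-1 Layer III header, or -1 if unsupported. */
  static int bitrateKbps(int header) {
    boolean hasSync = ((header >>> 21) & 0x7FF) == 0x7FF; // 11-bit frame sync
    boolean isMpeg1 = ((header >>> 19) & 0x3) == 0x3;     // version bits: MPEG-1
    boolean isLayer3 = ((header >>> 17) & 0x3) == 0x1;    // layer bits: Layer III
    if (!hasSync || !isMpeg1 || !isLayer3) {
      return -1;
    }
    int kbps = MPEG1_LAYER3_KBPS[(header >>> 12) & 0xF];
    return kbps == 0 ? -1 : kbps;
  }

  /**
   * Returns true if at least minFrames headers were scanned and all of them declare the
   * same valid bitrate.
   */
  static boolean isProbablyCbr(List<Integer> scannedHeaders, int minFrames) {
    if (scannedHeaders.isEmpty() || scannedHeaders.size() < minFrames) {
      return false; // Not enough evidence yet.
    }
    int first = bitrateKbps(scannedHeaders.get(0));
    if (first < 0) {
      return false;
    }
    for (int header : scannedHeaders) {
      if (bitrateKbps(header) != first) {
        return false;
      }
    }
    return true;
  }
}
```

For example, fifty copies of the header `0xFFFB9064` (128 kbps, 44.1 kHz) are reported as probably CBR, while a mix of `0xFFFB9064` and `0xFFFBA064` (160 kbps) is not.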

ryanheise commented 2 years ago

> However, for MP3 files, if both options are enabled, then index-based seeking will be applied

I want to make sure I understand this: does that mean enabling both for an MP3 file is equivalent to just enabling index-based seeking alone because in either case, index-based seeking will inevitably be applied regardless of the other option being enabled?

Or is it instead the case that if you seek beyond the point up to which the index-based seek table has been generated so far, it falls back to CBR seeking? And would the timestamp of the current playback position be auto-corrected when the more accurate index-based seek table eventually catches up to that timestamp?

As for what ExoPlayer could ideally do to make things easier for the developer, I like the way seeking works in Apple's AVFoundation framework, because seek accuracy is a parameter of the seek operation itself. It may still make sense to have an API that starts building the accurate seek table in advance using index-based seeking, if an app knows it wants that, but I think ExoPlayer could in theory also build up a similar seek table lazily and do a more efficient job of building it on demand, making use of all the information available to it. For example, if an MP3 does provide a lowres seek table, that could be leveraged as a lattice to which more accurate fragments of the seek table can be built and attached on demand. And in the absence of that lattice, I think ExoPlayer could still try to automatically build up an accurate seek table in fragments/islands based on which parts of the audio get loaded, and those islands could connect over time snapping them into alignment, improving accuracy over time. I don't know exactly what other audio player libraries do, but there are clearly some clever things that they are doing to make the seek experience more generally accurate for MP3 files.
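To make the "lattice" idea concrete: the Xing/LAME header found in many VBR MP3s carries a 100-entry table of contents (TOC), where entry i approximates the byte position at i% of the duration, scaled to 0..255 of the audio data size. The sketch below shows only the coarse lookup, with linear interpolation between entries; the class `XingTocLattice` is hypothetical, not an ExoPlayer API, and attaching more accurate fragments to the lattice is left out.

```java
/**
 * Hypothetical coarse seek "lattice" built from a Xing-style TOC (not an ExoPlayer API).
 * Entry i (0..255) approximates the byte offset at i% of the duration, relative to the
 * audio data size. Interpolating between entries is itself an approximation.
 */
final class XingTocLattice {
  private final int[] toc; // 100 entries, each in 0..255
  private final long durationUs;
  private final long dataSize; // bytes of audio data covered by the TOC

  XingTocLattice(int[] toc, long durationUs, long dataSize) {
    if (toc.length != 100) {
      throw new IllegalArgumentException("TOC must have 100 entries");
    }
    if (durationUs <= 0) {
      throw new IllegalArgumentException("duration must be positive");
    }
    this.toc = toc.clone();
    this.durationUs = durationUs;
    this.dataSize = dataSize;
  }

  /** Returns an approximate byte offset, relative to the start of audio data, for timeUs. */
  long approximateByteOffset(long timeUs) {
    double percent = Math.max(0.0, Math.min(100.0, timeUs * 100.0 / durationUs));
    int index = Math.min(99, (int) percent);
    double lower = toc[index];
    // Past the last entry, interpolate towards the end of the data (256/256).
    double upper = index == 99 ? 256 : toc[index + 1];
    double scaled = lower + (percent - index) * (upper - lower);
    return Math.round(scaled / 256.0 * dataSize);
  }
}
```

With a linear TOC (`toc[i] = i * 256 / 100`), 100 s of audio and 1,000,000 bytes of data, a seek to 50 s maps to byte 500,000. A real VBR file would have a non-linear TOC, which is exactly what makes it more useful than a CBR guess.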

Regarding the original use case, podcast players take arbitrary URLs outside the control of the app developer, and the vast majority of these are MP3 files. One of the newer developments in podcast apps is that episodes now come with chapter timestamps, which lets players jump to the specific points where the host introduces particular topics. For this type of feature, it would be especially helpful to have some sort of heuristic that does a decent job for long audio files, so a way to guess whether an audio file is "probably" CBR would certainly be useful. However, as I mentioned above, to figure out that an audio file is probably CBR you would need to wait until some of the audio has started loading, so any ExoPlayer API that lets me act on this shouldn't simply be a parameter on the factory that creates the extractor, as that's too early. Having this option as a parameter of the seek method itself would be more useful.
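A hypothetical sketch of the per-seek accuracy parameter suggested above. `SeekAccuracy` and `SeekResolver` are invented names, not ExoPlayer or AVFoundation APIs: an `EXACT` seek resolves only against the part of the frame index built so far (returning -1 to mean "keep indexing and retry"), while an `APPROXIMATE` seek falls back to a constant-bitrate estimate.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-seek accuracy option (not an ExoPlayer or AVFoundation API). */
enum SeekAccuracy { EXACT, APPROXIMATE }

/**
 * Hypothetical resolver: EXACT seeks use only the frame index built so far; APPROXIMATE
 * seeks fall back to a constant-bitrate estimate when the index does not reach the target.
 */
final class SeekResolver {
  private final TreeMap<Long, Long> index = new TreeMap<>(); // timeUs -> byte offset
  private final long assumedBitrateBps;

  SeekResolver(long assumedBitrateBps) {
    this.assumedBitrateBps = assumedBitrateBps;
  }

  /** Records that the frame starting at byteOffset begins at timeUs. */
  void addIndexPoint(long timeUs, long byteOffset) {
    index.put(timeUs, byteOffset);
  }

  /**
   * Returns the byte offset to start reading from, or -1 if an EXACT seek cannot be
   * resolved until indexing has reached timeUs.
   */
  long resolve(long timeUs, SeekAccuracy accuracy) {
    boolean covered = !index.isEmpty() && index.lastKey() >= timeUs;
    Map.Entry<Long, Long> floor = index.floorEntry(timeUs);
    if (covered && floor != null) {
      return floor.getValue(); // Last indexed frame at or before the target.
    }
    if (accuracy == SeekAccuracy.EXACT) {
      return -1;
    }
    // CBR guess: bytes = seconds * bits per second / 8.
    return timeUs * assumedBitrateBps / 8_000_000L;
  }
}
```

The design choice here is that the caller, who knows whether it is jumping to a chapter marker or just scrubbing, decides whether a fast guess is acceptable, rather than the extractor factory deciding once up front.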

christosts commented 2 years ago

> I want to make sure I understand this: does that mean enabling both for an MP3 file is equivalent to just enabling index-based seeking alone because in either case, index-based seeking will inevitably be applied regardless of the other option being enabled?

Yes, if both options are enabled, then for MP3 files index-seeking will be used entirely and the CBR option will be ignored.
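The precedence described in this reply can be written down as a small decision function. This is a simplified model, not ExoPlayer's source; in particular, the `hasSeekMetadata` branch (using a Xing/VBRI-style header when index seeking is off) is an assumption added for completeness and is not stated in this thread.

```java
/**
 * Simplified model of the MP3 seeking precedence stated in this thread (not ExoPlayer
 * source): when index seeking is enabled it wins, and the CBR option is ignored.
 */
final class Mp3SeekPrecedence {
  enum Strategy { INDEX, SEEK_METADATA, CONSTANT_BITRATE, UNSEEKABLE }

  private Mp3SeekPrecedence() {}

  static Strategy choose(
      boolean indexSeekingEnabled, boolean cbrSeekingEnabled, boolean hasSeekMetadata) {
    if (indexSeekingEnabled) {
      return Strategy.INDEX; // Stated in this thread: the CBR option has no effect.
    }
    if (hasSeekMetadata) {
      return Strategy.SEEK_METADATA; // Assumption: e.g. a Xing/VBRI header, if present.
    }
    if (cbrSeekingEnabled) {
      return Strategy.CONSTANT_BITRATE;
    }
    return Strategy.UNSEEKABLE;
  }
}
```

So `choose(true, true, false)` yields `INDEX`, which is why enabling both options behaves the same as enabling index seeking alone.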

Thank you for the input on seeking and audio files, that's very useful and we will take it into consideration when prioritizing items. I removed the low-priority label, but this enhancement is not scheduled yet. I'll update this issue if/when we find time to work on this.

ojw28 commented 2 years ago

> And in the absence of that lattice, I think ExoPlayer could still try to automatically build up an accurate seek table in fragments/islands based on which parts of the audio get loaded, and those islands could connect over time snapping them into alignment, improving accuracy over time.

Just to clarify, this is not possible unless the MP3 file provides some kind of low-res seek table / lattice. The whole problem with MP3 files is that frames don't contain absolute timestamps. So if you start loading an MP3 in the middle, you have no way of determining the absolute timestamp you're loading from. Hence you don't know the starting timestamp of the fragment/island. The only two ways you can know the accurate starting timestamp (and hence the accurate timestamp of any position within the fragment/island) are:

1. You've already loaded all of the stream between the position you're at and the start of the file. In this case you do have an accurate grounding point, because you know the start of file corresponds to absolute timestamp 0. It's not really a fragment/island at this point, however.
2. You've got a lowres seek table that tells you the absolute timestamp.

In all other cases you have to start making assumptions that may not be true (e.g., that the bitrate is constant in part of the file you haven't loaded yet).
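A worked example of how the constant-bitrate assumption goes wrong, using an idealized, hypothetical file whose first 10 minutes are at 128 kbps and the rest at 64 kbps (frame headers and padding ignored). A CBR seeker that assumes 128 kbps throughout maps a seek to 20:00 onto byte 19,200,000, but the audio actually found at that byte is at 30:00. The class `CbrGuess` is illustrative only.

```java
/**
 * Idealized illustration (no frame headers or padding) of a CBR seek landing in the
 * wrong place. The two-segment file and all numbers are hypothetical.
 */
final class CbrGuess {
  // 8 bits per byte * 1,000,000 microseconds per second.
  private static final long BITS_PER_BYTE_TIMES_US_PER_S = 8_000_000L;

  private CbrGuess() {}

  /** Byte offset a CBR seeker would choose if it assumes bitrateBps for the whole file. */
  static long cbrByteOffset(long timeUs, long bitrateBps) {
    return timeUs * bitrateBps / BITS_PER_BYTE_TIMES_US_PER_S;
  }

  /**
   * Time actually found at byteOffset in a file that plays at bitrateABps until switchUs
   * and at bitrateBBps afterwards.
   */
  static long actualTimeAt(long byteOffset, long switchUs, long bitrateABps, long bitrateBBps) {
    long bytesInA = cbrByteOffset(switchUs, bitrateABps);
    if (byteOffset <= bytesInA) {
      return byteOffset * BITS_PER_BYTE_TIMES_US_PER_S / bitrateABps;
    }
    return switchUs + (byteOffset - bytesInA) * BITS_PER_BYTE_TIMES_US_PER_S / bitrateBBps;
  }
}
```

The first 600 s occupy 9,600,000 bytes at 16,000 bytes/s; the remaining 9,600,000 bytes up to the guessed offset play at 8,000 bytes/s, i.e. another 1,200 s, so the seek lands 10 minutes late. Nothing in the frames themselves reveals the error, because they carry no absolute timestamps.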

> [...] But there are clearly some clever things that they are doing to make the seek experience more generally accurate for MP3 files

I'm not that convinced by this statement, although would happily be proven wrong! If I had to guess, I suspect they're just doing index based seeking, because that's fundamentally the only way to accurately seek into an MP3 file that doesn't contain something like a Xing header to assist with seeking. I suspect the difference is just that other players may be a lot more aggressive about buffering and indexing the entire file at the start of playback, so that the index is already built by the time the user seeks. It seems a lot more plausible to me that players would do this, rather than some complicated heuristic based schemes that fundamentally cannot work accurately in all cases.
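A minimal sketch of index-based seeking as described here: the only grounding point is the start of the audio data, so each parsed frame appends a (time, byte offset) entry, with time derived from the number of samples counted so far. `FrameIndex` is a hypothetical class, not ExoPlayer's implementation, and it assumes MPEG-1 Layer III (1152 samples per frame) at a fixed sample rate.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical index built by parsing MP3 frames in order from the start of the audio
 * data (time 0). Seeks beyond the indexed range cannot be resolved exactly.
 */
final class FrameIndex {
  private static final int SAMPLES_PER_FRAME = 1152; // MPEG-1 Layer III
  private final int sampleRate;
  private final List<long[]> entries = new ArrayList<>(); // {timeUs, byteOffset}
  private long samplesSoFar = 0;
  private long nextByteOffset = 0;

  FrameIndex(int sampleRate) {
    this.sampleRate = sampleRate;
  }

  /** Records a frame of frameSizeBytes that immediately follows the previously indexed one. */
  void addFrame(int frameSizeBytes) {
    // Derive time from the sample count so rounding errors do not accumulate.
    long timeUs = samplesSoFar * 1_000_000L / sampleRate;
    entries.add(new long[] {timeUs, nextByteOffset});
    samplesSoFar += SAMPLES_PER_FRAME;
    nextByteOffset += frameSizeBytes;
  }

  /** Duration covered by the frames indexed so far. */
  long indexedDurationUs() {
    return samplesSoFar * 1_000_000L / sampleRate;
  }

  /** Byte offset of the last indexed frame starting at or before timeUs, or -1 if not indexed yet. */
  long byteOffsetFor(long timeUs) {
    if (entries.isEmpty() || timeUs > indexedDurationUs()) {
      return -1;
    }
    // Binary search for the largest entry whose time is <= timeUs.
    int lo = 0;
    int hi = entries.size() - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (entries.get(mid)[0] <= timeUs) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    return entries.get(lo)[1];
  }
}
```

"Aggressive" indexing in this model just means calling `addFrame` for the whole file as early as possible, so that `byteOffsetFor` stops returning -1 before the user seeks.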

ryanheise commented 2 years ago

> So if you start loading an MP3 in the middle, you have no way of determining the absolute timestamp you're loading from.

That is why I am suggesting that you could represent these islands with relative timestamps with an assumed/approximate starting timestamp, and at the moment islands become connected, they could snap into alignment with each other, i.e. those initial approximate assumptions could be corrected. So for example:

xxxx-island-1-xxxx........................xxxxxx-island-2-xxxxxxxx...............

island-1 will have the correct initial offset (i.e. zero).

If the user seeks to island-2, you could build a fragment of the seek table there with a best-guess approximation of its starting timestamp. Once island-1 grows and connects with island-2, island-2 can snap to its correct offset.
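The island scheme can be sketched as follows. `Island` is hypothetical: each island stores a provisional absolute start time plus the summed durations of its frames, and it is snapped to the exact start time once an anchored island (one grown from the start of the file) ends exactly at its first byte. The sketch assumes island-2 begins on a real frame boundary; until it is snapped, its timestamps are only a guess.

```java
/**
 * Hypothetical "island" of indexed MP3 frames (illustrative only, not an ExoPlayer API).
 * Times inside the island are relative to its start; the absolute start time is a
 * provisional guess until the island is anchored to the start of the file.
 */
final class Island {
  final long startByte;
  long endByte;      // exclusive: first byte after the last indexed frame
  long startTimeUs;  // exact if anchored, otherwise a provisional guess
  long durationUs;   // sum of the durations of the frames indexed in this island
  boolean anchored;  // true if startTimeUs is known to be exact

  Island(long startByte, long startTimeUs, boolean anchored) {
    this.startByte = startByte;
    this.endByte = startByte;
    this.startTimeUs = startTimeUs;
    this.anchored = anchored;
  }

  /** Extends the island by one parsed frame. */
  void addFrame(int frameSizeBytes, long frameDurationUs) {
    endByte += frameSizeBytes;
    durationUs += frameDurationUs;
  }

  long endTimeUs() {
    return startTimeUs + durationUs;
  }

  /**
   * If this island is anchored and has grown up to the first byte of next, replaces next's
   * provisional start time with the exact one. Returns true if next was snapped.
   */
  boolean snapIfConnected(Island next) {
    if (!anchored || endByte != next.startByte) {
      return false;
    }
    next.startTimeUs = endTimeUs();
    next.anchored = true;
    return true;
  }
}
```

For example, if island-1 covers ten 417-byte frames of 26,122 µs each, an island-2 starting at byte 4,170 with a guessed start of 250,000 µs is corrected to 261,220 µs when they connect; any seek served from island-2 before that point used the guessed times.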

ojw28 commented 2 years ago

Right, but the case you're describing can only happen if the player has already given the user an inaccurate seek (corresponding to the point in time when it initially started buffering island-2). I'm not sure there's value in implementing a complicated scheme that provides eventual accurate seeking if you seek around enough to load the whole file:

- It's arguably more confusing, because the user can seek to the exact same position that they've already done a seek to, and get a different part of the audio file the second time than the first time (if correction has taken place in-between the attempts).
- I don't think a typical use case would ever involve the user seeking backward to fill in the gap, which is what you need for correction to occur. If the user is trying to seek to the start of island-2, that's probably where they want to play from. They're not going to seek back to the end of island-1 to fill in the gap, because that's not the part of the file they're interested in playing. The player could proactively fill in the gap behind the scenes, but that's not really too different to more aggressively indexing the file from the start, which is what I'm suggesting other players are probably doing (the benefit of being able to perform a fast inaccurate seek prior to indexing completing seems marginal to me, and so the main difference is probably just that it's much more complicated to implement).

ryanheise commented 2 years ago

> I don't think a typical use case would ever involve the user seeking backward to fill in the gap, which is what you need for correction to occur.

Unless perhaps two decoders were used, one for the leftmost island to continue expanding, and a secondary decoder invoked whenever the user seeks outside the known seek table.

PaulWoitaschek commented 1 year ago

Is there an update or even ETA on this? This is the most frequent complaint on my audiobook player.

christosts commented 1 year ago

I'm afraid not; this is not in our plans at the moment.

google/ExoPlayer issue #9408