google / ExoPlayer

This project is deprecated and stale. The latest ExoPlayer code is available in https://github.com/androidx/media
https://developer.android.com/media/media3/exoplayer
Apache License 2.0
21.71k stars 6.02k forks source link

SimpleCache corruption debugging and cleanup #10773

Open Tolriq opened 1 year ago

Tolriq commented 1 year ago

So this is a question with multiple points around ExoPlayer simple cache.

I don't have a repro and a way to trigger the source cause of cache corruption but when it happens the player then gives:

Cache read: 64bc5440-66e7-4514-bad8-e14e5c795d83 - 36.0 kB/282.1 MB [B:327680-/music/King Gizzard & The Lizard Wizard/Changes/04 No Body.flac]
Error/ExoPlayer: internalError [eventTime=6444.95, mediaPos=0.00, window=1, period=1, loadError
  java.io.EOFException
      at com.google.android.exoplayer2.extractor.DefaultExtractorInput.readFromUpstream(DefaultExtractorInput.java:296)
      at com.google.android.exoplayer2.extractor.DefaultExtractorInput.skipFully(DefaultExtractorInput.java:107)
      at com.google.android.exoplayer2.extractor.DefaultExtractorInput.skipFully(DefaultExtractorInput.java:115)
      at com.google.android.exoplayer2.extractor.ogg.DefaultOggSeeker.readGranuleOfLastPage(DefaultOggSeeker.java:235)
      at com.google.android.exoplayer2.extractor.ogg.DefaultOggSeeker.read(DefaultOggSeeker.java:107)
      at com.google.android.exoplayer2.extractor.ogg.StreamReader.readPayload(StreamReader.java:200)
      at com.google.android.exoplayer2.extractor.ogg.StreamReader.read(StreamReader.java:123)
      at com.google.android.exoplayer2.extractor.ogg.OggExtractor.read(OggExtractor.java:88)
      at com.google.android.exoplayer2.source.BundledExtractorsAdapter.read(BundledExtractorsAdapter.java:127)
      at com.google.android.exoplayer2.source.ProgressiveMediaPeriod$ExtractingLoadable.load(ProgressiveMediaPeriod.java:1039)
      at com.google.android.exoplayer2.upstream.Loader$LoadTask.run(Loader.java:412)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
      at java.lang.Thread.run(Thread.java:1013)

The first line is details from onCachedBytesRead listener. So the cache read 36k and stops instead of returning more or switching to the upstream. But there's no other information about what is happening to actually debug more.

So that would be question 1, what can make the cache only read 36k and stops without trying to use the upstream and without logging anything?

The second question is about how can I properly recover from this? This will fails every time so requires a removal of that particular dataspec from the cache, but I can't figure out how to do that, all I find is clear the complete cache and that's not something I want to do if possible.

icbaker commented 1 year ago

Some initial questions:

I'm wondering if the upstream DataSource implementation is perhaps incorrectly handling an edge-case of the DataSource API contract. We semi-recently tightened up CacheDataSource to rely on correct handling in some of these cases, e.g. 3ebf94cd453842f40bc112ff13ea3f9ff871d183.

Tolriq commented 1 year ago

It's Exoplayer 2.18.1. In this case, the upstream datasource is a pseudo custom one as it's just a wrapper over a OkHttpDataSource allowing me more control over the okhttp instance used. That is wrapped over another simple datasource that allows to handle some very specific cases in the open function but without any impact in this case.

What I know from that user is that previously he had issues with his server during transcoding and would have error 502 thrown during playback sometimes (before introduction of the cache). But this should fail the cache of that block.

About the usage of the cache since there's currently no way to properly pre fill more including the next songs. I'm doing the cache manually in 2 pass to avoid blocking the player until the cache is filled.

 val spec = if (start) DataSpec(uri, 0, START_CACHE_SIZE) else DataSpec(uri)
icbaker commented 1 year ago

About the usage of the cache since there's currently no way to properly pre fill more including the next songs. I'm doing the cache manually in 2 pass to avoid blocking the player until the cache is filled.

 val spec = if (start) DataSpec(uri, 0, START_CACHE_SIZE) else DataSpec(uri)

Is START_CACHE_SIZE here equal to 36k (i.e. the threshold where the problem appears)?

Tolriq commented 1 year ago

Nope it's 256Kb or 512Kb. For the moment I only get one user with this particular issue. But I guess more could happen. The problem is that until the user fill the cache to evict I currently do not have a solution to offer to him.

I'll probably add a temporary button in the advanced settings to fully clear the cache to recover, but not very satisfying.

icbaker commented 1 year ago

So the cache read 36k and stops instead of returning more or switching to the upstream. But there's no other information about what is happening to actually debug more.

Reading the stack trace more carefully, I'm not sure that's necessarily what's happening here. It might be that the Ogg file is invalid (indicating a length that's longer than the actual file), and so when the extractor is trying to skip to the end of the file in DefaultOggSeeker.readGranuleOfLastPage it tries to read beyond the end of the file (even what's available from upstream) and receives the C.RESULT_END_OF_INPUT. I think without being able to repro there's a limited amount we can do here I'm afraid - but if you have access to the file causing the problem you could take a look and see if you can repro using that file.

Tolriq commented 1 year ago

That file worked without the cache and the users cleared the full app data without waiting and now it of course works with the cache :(

Let's hope either no one else report or someone with more patience or a repro, but in all cases my second question is still here.

Is there anyway to remove a dataspec / cachekey from cache without clearing the full cache?

Even if I warn users that that cache is not guaranteed and they should use the other way I provide to offline cache their media I have some users that expand the cache to up to 4Gb, if this happens again it's really not efficient to clear all. So I'd take even very hacky temporary solutions to be able to unblock them.

icbaker commented 1 year ago

@marcbaechinger - can you answer this bit?

Is there anyway to remove a dataspec / cachekey from cache without clearing the full cache?

Tolriq commented 1 year ago

@marcbaechinger small bump.

Tolriq commented 1 year ago

Bump again, @marcbaechinger This is really problematic as when there's as single file corruption in the cache there's no way to recover current and the user needs to clear the whole cache that can be big some users.

I'd really appreciate some help about how I can recover by removing that single broken source from the cache.

marcbaechinger commented 1 year ago

Sorry I'm not sure I understand the question.

The Cache interface has a method removeResource to remove all spans for a given resource. Is this method not working as expected or isn't it doing what you want?

  /**
   * Removes all {@link CacheSpan CacheSpans} for a resource, deleting the underlying files.
   *
   * @param key The cache key of the resource being removed.
   */
  @WorkerThread
  void removeResource(String key);
Tolriq commented 1 year ago

Well this was the question back in November and this is the answer I needed ;)

AS had an issue with showing the parent interface doc and SimpleCache nowhere have the concept of Resource, leading me to miss the function and trying to figure out how to get the spans and asking here.

So thanks, this solve the workaround problem.

I guess that without a repro there won't be anything possible on your side about the cache corruption itself and easier workaround than cache removal on playback failure?