google / ExoPlayer

This project is deprecated and stale. The latest ExoPlayer code is available in https://github.com/androidx/media
https://developer.android.com/media/media3/exoplayer
Apache License 2.0
21.74k stars 6.03k forks source link

Possible file descriptor leaks in local file playback #5291

Closed unext-wendong closed 5 years ago

unext-wendong commented 5 years ago

The behavior and the use case is similar to: https://github.com/google/ExoPlayer/issues/1066. However it happens on the current latest version r2.9.2 + Android 7.0 (original found it using r2.8.2 + Android 7.0).

In our use case, we want to playback a set of small local mp4 files in a loop. During our test, we noticed that after around 1 day's running, either the playback freezes or the app crashes and restarts. From the log, we can see the Too many open files error.

We then confirmed that it also happens on the ExoPlayer demo app + local contents. To reproduce it:

  1. push any mp4 or ismv content into the sdcard
  2. add the file URI into media.exolist.json
  3. start the playback of this local content and stop it after a few seconds of playback
  4. repeat step 3 and monitor the number of the opened files of the process using lsof -p <pid> | sort | wc -l during playback.

You should be able to see the number of opened files increases overtime. It decreases from time to time, but the lowest number and highest number both increase overtime. You should be able to notice it within several playbacks.

Don't think it's content specific, but we confirmed that using the following ismv file reproduces this issue (download it and play from local): http://playready.directtaps.net/smoothstreaming/SSWSS720H264/SuperSpeedway_720_2056.ismv

We confirmed that it doesn't seem to reproduce on streaming contents. Tried the Google Glass (MP4,H264) in the demo app and didn't reproduce it, i.e. the number of opened files stays around a fixed number across different playbacks.

By checking the output from lsof -p <pid> | sort, we suspect it's the following two items that were leaked since their number were keeping increasing:

exoplayer 27085     u0_a49  185r     0000                0,8       0t0       5736 anon_inode:sync_fence
exoplayer 27085     u0_a49  188u     0000                0,8       0t0       5736 anon_inode:dmabuf

This could be an issue of the Android framework. But if that's the case, I don't know how to explain why it doesn't reproduce on streaming contents.

Don't think it's useful but this is one of logs when the playback freezes.

12-21 06:42:22.520  1846 27180 I ACodec  : dequeued unrecognized (stale) buffer 0x95378728. discarding
12-21 06:42:22.521  1846 27180 I ACodec  : dequeued unrecognized (stale) buffer 0x953787c8. discarding
12-21 06:42:22.522  1846 27180 E HAL     : load: module=/vendor/lib/hw/gralloc.bigfish.so
12-21 06:42:22.522  1846 27180 E HAL     : dlopen failed: library "/vendor/lib/hw/gralloc.bigfish.so" not found
12-21 06:42:22.522  1846 27180 E [Gralloc-ERROR]: int gralloc_backend_register(private_handle_t *):56 Could not get gralloc module for handle: 0xa5159f80
12-21 06:42:22.522  1846 27180 W GraphicBufferMapper: registerBuffer(0xa5159f80) failed -24 (Too many open files)
12-21 06:42:22.522  1846 27180 E GraphicBuffer: unflatten: registerBuffer failed: Too many open files (-24)
12-21 06:42:22.522  1846 27180 E Surface : dequeueBuffer: IGraphicBufferProducer::requestBuffer failed: -24
12-21 06:42:22.522  1846 27180 E ACodec  : dequeueBuffer failed: ??(-24).

Please share if there is anything helpful to identify the cause.

erdemguven commented 5 years ago

I found someone reported a similar issue for Exoplayer 2.8.1.

unext-wendong commented 5 years ago

I think I got closer to the root cause. It seems to be related to the timing when the surface of the previous window (i.e. the window before the player window) is destroyed.

By comparing the log of local playback and streaming playback, one difference is that the destroy of the preview window surface (in case of the demo app, it's the SampleChooserActivity's surface) happens after the initialization of the native decoder in case of local playback, while before it in case of streaming playback.

Local playback:

12-27 16:24:43.823  1792  4156 I OMXMaster: makeComponentInstance(OMX.hisi.video.decoder.avc) in mediacodec process
...
12-27 16:24:43.869  2625  2647 I ActivityManager: Displayed com.google.android.exoplayer2.demo/.PlayerActivity: +151ms
...
12-27 16:24:43.893 10624 10929 I ACodec  : [OMX.hisi.video.decoder.avc] will output to SurfaceView
12-27 16:24:43.893 10624 10929 I ACodec  : Output will queue to overlay.
...
12-27 16:24:43.934  1792 10930 D HISI_OMXVDEC: [OmxState] Execute_Pending --> Executing
...
12-27 16:24:43.980  2625  2862 I WindowManager: Destroying surface Surface(name=com.google.android.exoplayer2.demo/com.google.android.exoplayer2.demo.SampleChooserActivity) called by com.android.server.wm.WindowStateAnimator.destroySurface:2016 com.android.server.wm.WindowStateAnimator.destroySurfaceLocked:882 com.android.server.wm.WindowState.destroyOrSaveSurface:2100 com.android.server.wm.WindowManagerService.tryStartExitingAnimation:3063 com.android.server.wm.WindowManagerService.relayoutWindow:2943 com.android.server.wm.Session.relayout:224 android.view.IWindowSession$Stub.onTransact:286 com.android.server.wm.Session.onTransact:145 

Streaming case:

12-27 16:26:03.566  2625  2647 I ActivityManager: Displayed com.google.android.exoplayer2.demo/.PlayerActivity: +123ms
12-27 16:26:03.631  2625  2959 I WindowManager: Destroying surface Surface(name=com.google.android.exoplayer2.demo/com.google.android.exoplayer2.demo.SampleChooserActivity) called by com.android.server.wm.WindowStateAnimator.destroySurface:2016 com.android.server.wm.WindowStateAnimator.destroySurfaceLocked:882 com.android.server.wm.WindowState.destroyOrSaveSurface:2100 com.android.server.wm.WindowManagerService.tryStartExitingAnimation:3063 com.android.server.wm.WindowManagerService.relayoutWindow:2943 com.android.server.wm.Session.relayout:224 android.view.IWindowSession$Stub.onTransact:286 com.android.server.wm.Session.onTransact:145 
...
12-27 16:26:03.823  1792  1934 I OMXMaster: makeComponentInstance(OMX.hisi.video.decoder.avc) in mediacodec process
...
12-27 16:26:03.892 10624 10998 I ACodec  : [OMX.hisi.video.decoder.avc] will output to SurfaceView
12-27 16:26:03.892 10624 10998 I ACodec  : Output will queue to overlay.
...
12-27 16:26:03.912  1792 10999 D HISI_OMXVDEC: [OmxState] Execute_Pending --> Executing

This is understandable since the loading of the media source in local case is faster.

I then tried adding a delay of 500ms in the DefaultDataSource.open method when FileDataSource is used in file DefaultDataSource.java to try to delay the initialization of the decoder to be after the previous surface's destroy.

With this change, the issue seems to be gone.

It now looks more like an issue of the Android media framework, e.g. the ownership of some resources is somehow mishandled in this scenario.

unext-wendong commented 5 years ago

FYI: the change I tried:

diff --git a/library/core/src/main/java/com/google/android/exoplayer2/upstream/DefaultDataSource.java b/library/core/src/main/java/com/google/android/exoplayer2/upstream/DefaultDataSource.java
index 6504562..cce878d 100644
--- a/library/core/src/main/java/com/google/android/exoplayer2/upstream/DefaultDataSource.java
+++ b/library/core/src/main/java/com/google/android/exoplayer2/upstream/DefaultDataSource.java
@@ -246,6 +246,15 @@ public final class DefaultDataSource implements DataSource {
     } else {
       dataSource = baseDataSource;
     }
+    if (dataSource instanceof FileDataSource && dataSpec.position <= 0) {
+      long delayMs = 500;
+      android.util.Log.i("DefaultDataSource", "delay initial open for " + delayMs + " ms");
+      try {
+        Thread.sleep(delayMs);
+      } catch (Throwable th) {
+        // Do nothing
+      }
+    }
     // Open the source and return.
     return dataSource.open(dataSpec);
   }
tonihei commented 5 years ago

@erdemguven What's the latest conclusion on this issue?

ojw28 commented 5 years ago

@unext-wendong - Is this issue device specific, or is it happening on all devices? If you could try and reproduce on a couple of recent flagship devices, that would be very helpful.

unext-wendong commented 5 years ago

@ojw28 I was using a Huawei STB of a model similar as the following one. https://consumer.huawei.com/jp/support/set-top-box/tt01/ https://www.androidtv-guide.com/pay-tv-provider/ntt-docomo-tv-terminal

Don't remember if I tried it on any other devices. Now you mention it, it could be related to the media framework of that device, if no one else sees the similar behavior on other devices.

Will try it when I get some time. Since the reproduce procedure is quite simple, you can try it on your side if you don't want to wait.

ojw28 commented 5 years ago

Marking as "need more info" as I don't think there's anything actionable for us to do here at this point.

google-oss-bot commented 5 years ago

Hey @unext-wendong. We need more information to resolve this issue but there hasn't been an update in 14 days. I'm marking the issue as stale and if there are no new updates in the next 7 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

unext-wendong commented 5 years ago

@ojw28 Seems to be a problem of the framework in this device. After a firmware update, I couldn't reproduce the issue any more.

Also, I realized that the command I mentioned in my issue description lsof -p <pid> won't work on non-rooted devices. The device I was using to reproduce the issue was with an eng system image.