moonlight-stream / moonlight-embedded

Gamestream client for embedded systems
https://github.com/moonlight-stream/moonlight-embedded/wiki
GNU General Public License v3.0

Moonlight occasionally stuck in video decoder clean-up stage on disconnection #763

Closed aandaluz closed 1 year ago

aandaluz commented 4 years ago

NVidia Geforce Experience version: 3.20.0.118

Moonlight Embedded version: 2.4.10+master-243ef8a

Moonlight Embedded source: https://github.com/irtimmer/moonlight-embedded

Moonlight Embedded running on: Raspberry Pi 3 Model A+

Moonlight Embedded running on distribution: Raspbian Stretch (Debian 9) (w/ XFCE desktop environment, no OpenGL acceleration)

Verbose output -verbose of Moonlight Embedded: N/A (see attachments and linked youtube videos)

What is the expected result? Moonlight should disconnect gracefully from the remote host, return mouse and keyboard input to the host and kill the moonlight process.

What happens instead of that? Keyboard and mouse input is blocked and the desktop does not show back.

I've been trying to debug a weird issue that happens randomly in my Raspberry Pi config. Sometimes, when I disconnect from the remote host (Windows 10, GFE 3.20) via keyboard (CTRL+ALT+SHIFT+Q) or joystick (Play+Back+LeftShoulder+RightShoulder), the screen freezes and I cannot use the keyboard or mouse to interact with my XFCE desktop. Adding the -quitappafter flag does not fix this issue.

To show this bug I have prepared a simple environment. I have two Full HD monitors, one connected to my Windows 10 PC (the left one) and another connected to the Raspberry Pi (the right one).

The following videos (recorded with my smartphone) show how I reproduced the bug

Initial Freeze

In video "1-freeze", I connect to the remote host, which is running a Steam game (Wonderboy in this example). I interact with the game via an Xbox gamepad for a few seconds. At 00:15 I manually disconnect from the remote host and immediately connect back to the remote Windows host at 00:18. Then, at about 00:22, I disconnect again via keyboard. The result is that the screen is frozen on the monitor connected to the Raspberry Pi (right monitor). The left monitor (Windows 10 PC) is still running the game normally.

Connecting via VNC to diagnose the problem

I have installed a VNC server on the Raspberry Pi. In the second video, "2-screen_vnc_status", I have connected to the Raspberry Pi via VNC using the Windows 10 PC as a client. In the VNC client window I see that moonlight has received a termination request, but it has not shut down. Next, at 00:06, I move the smartphone camera towards the right screen to show that the monitor is still frozen on the last video frame from the game.

Killing moonlight manually

Then, in video "3-status_and_kill" I close the terminal from which I had launched moonlight via VNC. The right screen is still frozen.

Next, at 00:20, I open another terminal on the Raspberry Pi and list the running moonlight processes with ps -ux | grep moonlight.

At about 00:50 I run strace on the process, and it shows that it has spent most of the time idle, waiting for an external status update (99 nanosleep + 4 memprotect syscalls). Googling around, it looks like the moonlight process is in interruptible sleep (waiting for an event to complete).

Finally, I kill the moonlight process with kill -9 <moonlightpid>

(sample strace from another run) strace.txt
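As a complement to ps and strace, the scheduler state mentioned above can also be read directly from /proc. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function names parse_proc_state and read_proc_state are hypothetical and not part of moonlight-embedded. A result of 'S' means interruptible sleep, 'D' uninterruptible sleep, 'R' running, 'T' stopped and 'Z' zombie.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the state character from the contents of /proc/<pid>/stat.
 * The line has the form "pid (comm) state ...". The command name may
 * itself contain spaces or ')', so we search for the LAST ')'.
 * Returns '\0' if the line cannot be parsed. */
char parse_proc_state(const char *stat_line) {
  assert(stat_line != NULL);
  const char *close = strrchr(stat_line, ')');
  if (close == NULL || close[1] != ' ' || close[2] == '\0')
    return '\0';
  return close[2];
}

/* Read /proc/<pid>/stat and return the state character,
 * or '\0' if the file cannot be opened or parsed. */
char read_proc_state(long pid) {
  char path[64];
  char line[512];
  char state = '\0';

  snprintf(path, sizeof path, "/proc/%ld/stat", pid);
  FILE *f = fopen(path, "r");
  if (f == NULL)
    return '\0';
  if (fgets(line, sizeof line, f) != NULL)
    state = parse_proc_state(line);
  fclose(f);
  return state;
}
```

Calling read_proc_state with the PID reported by ps for the hung moonlight process would be expected to return 'S' if the process is indeed in interruptible sleep, as the strace output suggests.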

Next run does not crash

Finally, in video "4-next_launch", I have closed the VNC session on my Windows 10 PC and I have recovered mouse and keyboard input after killing the moonlight process. The left monitor shows Wonderboy still running in Windows 10. Then I launch the remote stream via moonlight and I see that the video is running again. At 00:19 I disconnect again and everything stops properly.

Patch proposal

I have debugged the issue over the last few days, putting logs everywhere, and it seems that the cleanup does not finish because the Pi OMX video decoder is stuck in decoder_renderer_cleanup (/src/video/pi.c), specifically after the disable_port_buffer command:

(sample output from another run) console_output.txt
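The "logs everywhere" approach can be made less error-prone with a small tracing macro that prints a line before and after each teardown call, so the last "begin" without a matching "end" identifies the call that blocks. This is only a sketch of the idea: TRACE_STEP, completed_steps and the fake_* functions are hypothetical names used for illustration, and the fake_* functions merely stand in for the ilclient_* calls.

```c
#include <stdio.h>

/* Number of traced steps that have returned so far. */
int completed_steps = 0;

/* Log before and after a call. If the call never returns, the log
 * ends with a "begin" line that has no matching "end" line. */
#define TRACE_STEP(call)                               \
  do {                                                 \
    fprintf(stderr, "cleanup: begin %s\n", #call);     \
    fflush(stderr);                                    \
    call;                                              \
    completed_steps++;                                 \
    fprintf(stderr, "cleanup: end   %s\n", #call);     \
    fflush(stderr);                                    \
  } while (0)

/* Hypothetical stand-ins for the real teardown calls. */
void fake_flush_tunnels(void) {}
void fake_disable_port_buffers(void) {}

/* Example of wrapping each teardown step with the macro. */
void traced_cleanup_demo(void) {
  TRACE_STEP(fake_flush_tunnels());
  TRACE_STEP(fake_disable_port_buffers());
}
```

The explicit fflush calls are there so that the "begin" line reaches the terminal or log file even when stderr has been redirected and the process hangs immediately afterwards.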

Looking at the code, if I disable this section by placing the critical code under an #ifdef ENABLE_MOONLIGHT_ORIGINAL_PI_ADDITIONAL_CLEANUP_CODE ... #endif directive, then the random freezing issue is gone.

static void decoder_renderer_cleanup() {
  int status = 0;

  OMX_BUFFERHEADERTYPE *buf;
  if((buf = ilclient_get_input_buffer(video_decode, 130, 1)) == NULL){
    fprintf(stderr, "Can't get video buffer\n");
    exit(EXIT_FAILURE);
  }

  buf->nFilledLen = 0;
  buf->nFlags = OMX_BUFFERFLAG_TIME_UNKNOWN | OMX_BUFFERFLAG_EOS;

  if(OMX_EmptyThisBuffer(ILC_GET_HANDLE(list[0]), buf) != OMX_ErrorNone){
    fprintf(stderr, "Can't empty video buffer\n");
    return;
  }

//this code is not built by default
//we have to skip these calls to avoid a bug where the
//moonlight client might occasionally not disconnect properly
//and leave both the host and the remote server in an undefined state.
//in this state the raspberry does not react to mouse or keyboard press events
//but the OS is still running. Moonlight can be killed via ssh using kill -9 (pid)
//note that upgrading firmware to the latest debian kernel (4.19.79 at this time)
//does not fix the issue.

#ifdef ENABLE_MOONLIGHT_ORIGINAL_PI_ADDITIONAL_CLEANUP_CODE
  fprintf(stderr, "decoder_renderer_cleanup (pi) flush renderer \n");
  // need to flush the renderer to allow video_decode to disable its input port
  ilclient_flush_tunnels(tunnel, 0);
  fprintf(stderr, "decoder_renderer_cleanup (pi) disable port buffer \n");
  ilclient_disable_port_buffers(list[0], 130, NULL, NULL, NULL);
  fprintf(stderr, "decoder_renderer_cleanup (pi) disable tunnel \n");
  ilclient_disable_tunnel(tunnel);
  fprintf(stderr, "decoder_renderer_cleanup (pi) teardown tunnel \n");
  ilclient_teardown_tunnels(tunnel);
  fprintf(stderr, "decoder_renderer_cleanup (pi) state transition (idle)\n");
  ilclient_state_transition(list, OMX_StateIdle);
  fprintf(stderr, "decoder_renderer_cleanup (pi) state transition (loaded)\n");
  ilclient_state_transition(list, OMX_StateLoaded);
  fprintf(stderr, "decoder_renderer_cleanup (pi) cleanup components\n");
  ilclient_cleanup_components(list);
  fprintf(stderr, "decoder_renderer_cleanup (pi) OMX deinit\n");
#endif
  OMX_Deinit();

  ilclient_destroy(client);

}

I can do a pull request placing this code under this #ifdef so that it is not built by default. However, since OMX is the GPU video decoder backend, I am not sure whether this "brute approach" of skipping decoder clean-up stages has an impact on the system or on moonlight. So far I have forked the code in a private repository, but it would be great to contribute back if it's a valid fix for the community.
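For comparison, one alternative to compiling the teardown out entirely would be to keep it but bound the time it may take, so that a hang ends the process instead of freezing the desktop. The sketch below is not the approach taken in the patch above and has not been tried against moonlight. It assumes that SIGALRM is otherwise unused and unblocked in the process. All names (cleanup_with_watchdog, quick_cleanup, stuck_cleanup, demo_in_child) are hypothetical, and the two *_cleanup functions only stand in for the real OMX teardown.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*cleanup_fn)(void);

/* Run fn under a watchdog. If fn has not returned after timeout_sec
 * seconds, the default SIGALRM action terminates the whole process. */
void cleanup_with_watchdog(cleanup_fn fn, unsigned timeout_sec) {
  assert(fn != NULL);
  signal(SIGALRM, SIG_DFL); /* make sure the default action applies */
  alarm(timeout_sec);       /* arm the watchdog */
  fn();
  alarm(0);                 /* cleanup returned in time: disarm */
}

/* Hypothetical stand-ins for a teardown that returns and one that hangs. */
void quick_cleanup(void) {}
void stuck_cleanup(void) { for (;;) pause(); }

/* Demo helper: run the guarded cleanup in a forked child and report how
 * the child ended: 0 for a normal exit, the signal number if it was
 * killed by a signal, or -1 on error. */
int demo_in_child(cleanup_fn fn, unsigned timeout_sec) {
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    cleanup_with_watchdog(fn, timeout_sec);
    _exit(0);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  if (WIFSIGNALED(status))
    return WTERMSIG(status);
  return 0;
}
```

With this sketch, demo_in_child(quick_cleanup, 1) reports a normal exit, while demo_in_child(stuck_cleanup, 1) reports that the child was terminated by SIGALRM after about one second. Whether terminating the process mid-teardown leaves the GPU decoder in a better state than skipping the teardown is the same open question raised above.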

aandaluz commented 1 year ago

Since we did not receive feedback on the PR, we have decided to close this ticket. We will keep updating our own fork in the meantime.