mpv-player / mpv

🎥 Command line video player
https://mpv.io

libmpv: Severe screen corruption when rendering video via mpv_render_context_render to virtual x-server #14577

Open deeptho opened 1 month ago

deeptho commented 1 month ago

mpv Information

This is the first version in which the problem can be reproduced.
Found by bisecting. The problem also happens in master (3ab989e554)
mpv --version
mpv bad1-dirty Copyright © 2000-2023 mpv/MPlayer/mplayer2 projects
 built on Jul 18 2024 13:05:23
libplacebo version: v6.338.2
FFmpeg version: 6.1.1
FFmpeg library versions:
   libavutil       58.29.100
   libavcodec      60.31.102
   libavformat     60.16.100
   libswscale      7.5.100
   libavfilter     9.12.100
   libswresample   4.12.100

Other Information

Reproduction Steps

The problem can be reproduced using xpra:

  • Start xpra in seamless mode, e.g., starting a terminal which runs on a remote computer. Under the hood this starts an X server with Xdummy or Xvfb. The problem occurs with both.
  • In this terminal, start a program that uses libmpv, for instance https://github.com/v0idv0id/MPVideoCube.git or (more difficult to compile) https://github.com/deeptho/neumodvb

Sometimes the video displayed in the programs looks OK, but sometimes it is heavily corrupted. Investigation shows:

  1. If the programs are run under virtualgl on the remote computer, all is fine
  2. If the programs are run directly, they sometimes show the expected output: for video cube this means (mostly) artefact-free video displayed on a cube. For neumodvb, this means a live TV channel showing artefact-free video. However, sometimes the video is completely black, or heavily corrupted: the video contains vertical/horizontal lines, or only small parts of it appear on screen, or it looks heavily pixellated. See also https://github.com/Xpra-org/xpra/issues/4300 for examples
  3. If screen corruption occurs, it continues to occur until a new video is displayed. If no corruption occurs at the start, then the video remains good for ever.

Expected Behavior

Non-corrupted video

Actual Behavior

Corrupted video.

Additional info:

  1. If an overlay is drawn on top of the video (neumodvb), after mpv renders it, that overlay looks fine. Both programs also run fine natively, not under xpra
  2. If in neumodvb I save the video rendered by mpv-lib, that video is also corrupted, but the overlay is not, suggesting strongly that mpv is causing the corruption
  3. The mpv command line client does not show corruption when playing videos
  4. The ONLY difference between the last good and first bad mpv version seems to be a difference in default interpolation code, but that may just "trigger" the problem, rather than being the cause.
  5. Once the video displayed is corrupted, the corruption stays of the same type, although resizing the window has some effect on the details of the corruption. Because corruption does not occur on every run, multiple trials may be needed to reproduce the problem.

Please see the screenshots here: https://github.com/Xpra-org/xpra/issues/4300

I cannot attach log files, as there are none in this use case. Or is it possible to start one in libmpv?

Log File

xxx.log

Sample Files

withosd aaa


sfan5 commented 1 month ago

I cannot attach log files, as there are none in this use case. Or is it possible to start one in libmpv?

There absolutely is. Set the "log-file" option via libmpv.

deeptho commented 1 month ago

I have added mpv-log-file=/tmp/mpv/log to the mpv.conf that is being loaded by libmpv in neumodvb, but it has no effect, whereas other options in that file, e.g. screenshot-directory=/tmp/screenshots, work as expected.

Akemi commented 1 month ago

I have added mpv-log-file=/tmp/mpv/log to the mpv.conf that is being loaded by libmpv in neumodvb, but it has no effect, whereas other options in that file, e.g. screenshot-directory=/tmp/screenshots, work as expected.

it's not mpv-log-file=, it's log-file=.
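
For reference, the corrected mpv.conf lines would look roughly like the fragment below. The paths are the ones quoted in this thread; whether the host application reads mpv.conf at all depends on how it configures libmpv, so treat this as a sketch rather than a guaranteed setup.

```
# mpv.conf as loaded by the libmpv host application
# Option name is "log-file", not "mpv-log-file"
log-file=/tmp/mpv/log
screenshot-directory=/tmp/screenshots
```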

deeptho commented 1 month ago

Here is an mpv log file while the problem occurs.

  1. I start neumodvb
  2. I start displaying channel 4. There is audio but nothing is displayed
  3. I stop playback
  4. I start it again. This time there is video, but it is corrupted by black horizontal and vertical lines. This is with git version c172a650c4, which is the first version in which I can reproduce the corruption. libplacebo is at version 64c19545

mpv.log

This screenshot shows the corruption problem

kasper93 commented 1 month ago

Duplicate of #13998

sfan5 commented 1 month ago

Duplicate of #13998

Are you sure? I don't see gpu-next being used here.

deeptho commented 1 month ago

I have tried adding correct-downscaling=no to the mpv configuration. With only 5 trials I notice that

  1. The vertical/horizontal lines on 16:9 content do not seem to appear
  2. The problem that the screen remains black (no video) at the first trial is still there
  3. Other forms of corruption are also still there. See picture below. The strange thing is that these corruptions do not occur at each trial, so it must have something to do with initialisation. Note that I did not resize the window manually, so the scaling is always the same.

Adding profile=fast produced no pictures at all

bad1

kasper93 commented 1 month ago

Duplicate of #13998

Are you sure? I don't see gpu-next being used here.

I'm not sure what's going on here. I think we are looking at multiple different issues. For example, the screenshot from https://github.com/mpv-player/mpv/issues/14577#issuecomment-2251558243 shows corruption that happens with Intel when using gather. But indeed, the previous report was about Windows and gpu-next, though the symptoms are the same. The first broken commit https://github.com/mpv-player/mpv/commit/c172a650c41a28d77d14de4af398cfd90caaa805 makes it clear we have some issue when downscaling, which is the same case as in the other issue.

The vertical/horizontal lines on 16:9 content do not seem to appear

Ok, so it seems to confirm that at least part of the problem is the same as the other one.

Adding profile=fast produced no pictures at all

That's worrying, because in this mode, we really don't do much work.

[   0.014][v][libmpv_render] GL_VERSION='4.5 (Compatibility Profile) Mesa 24.1.2'
[   0.014][v][libmpv_render] Detected desktop OpenGL 4.5.
[   0.014][v][libmpv_render] GL_VENDOR='Mesa'
[   0.014][v][libmpv_render] GL_RENDERER='llvmpipe (LLVM 18.1.6, 256 bits)'
[   0.014][v][libmpv_render] GL_SHADING_LANGUAGE_VERSION='4.50'

Are you able to test with an older Mesa build? I'm curious if those issues are new or were there before.

sfan5 commented 1 month ago

It would also be helpful to link the code in the application where mpv is integrated. The GL rendering has some constraints and there's a lot that can go wrong.

deeptho commented 1 month ago

In neumoDVB, this is the source file that handles libmpv callbacks https://github.com/deeptho/neumodvb/blob/master/src/viewer/neumompv.cc Note that depending on the choices of the user, this code also draws an overlay on top of mpv, but the issue of this ticket happens also without that overlay drawing.

Re the constraints: I am aware of those, although it is not always easy to understand them correctly: a long time ago, I also had to make some changes to prevent the whole program from crashing when more than 2 mpv playbacks were used simultaneously. This happened after some silent change in GL (but I found some comment in a GL source file).

The culprit then turned out to be illegal access from multiple threads to the same GL context. This used to work fine (of course user code has to guard with locks to prevent concurrent access), but I think now the context can only be used by the thread that created it.

If you are wondering about the convoluted construct with the thread_local variable to store the context: it is needed to solve this problem. One of the problems was that libmpv uses different threads for the callbacks made by different video playbacks and the user code has to detect when it is called from two different threads.

Regarding the issue of this ticket, this is not relevant, as only one playback is running in the tests.

I found the libmpv documentation you link to a bit misleading: "This assumes the OpenGL context lives on a certain thread controlled by the

sfan5 commented 1 month ago
  • it is libmpv that creates and controls the thread, not the api user. The api user indeed controls the context but not the thread and should be prepared for suddenly being called from a different thread.

This is incorrect. mpv will call the update callback on any thread it wants, but you must consistently use mpv_render_context_render on the thread that has the OpenGL context. You can see in this example how it's done with an event and on_mpv_render_update does not itself call any render functions.

Looking at neumompv.cc you seem to be doing this correctly.

sfan5 commented 1 month ago

In any case it should be easy to reproduce this bug with one of the mpv examples.

deeptho commented 1 month ago
  • it is libmpv that creates and controls the thread, not the api user. The api user indeed controls the context but not the thread and should be prepared for suddenly being called from a different thread.

This is incorrect. mpv will call the update callback on any thread it wants, but you must

Yes, that is what I wrote: "it is libmpv that creates and controls the thread, not the api user."

consistently use mpv_render_context_render on the thread that has the OpenGL context.

That is the thread calling the user callback, so an mpv thread and not controlled by the user. Is there any guarantee that for the same video playback, mpv calls the callback always on the same thread to render successive frames?

Otherwise it will get really complicated, as the user code callback can not draw but instead would have to delegate this task to some other thread, which would create needless context switches.

You can see in this example how it's done with an event and on_mpv_render_update does not itself call any render functions.

No, I did not claim that mpv draws. The user callback draws, but it does that in a thread created by mpv. The surprising bit was that mpv calls from multiple threads for multiple simultaneous video playbacks and that it is then not possible to use the same GL context even when locking to prevent simultaneous access.

I can understand why libmpv would call from a separate thread for each video playback, but it would be helpful to mention this in the documentation, along with a warning that OpenGL then requires using a separate GL context per thread (it did not require that in older versions).

Looking at neumompv.cc you seem to be doing this correctly.

Thanks for that confirmation.

sfan5 commented 1 month ago

That is the thread calling the user callback, so an mpv thread and not controlled by the user. Is there any guarantee that for the same video playback, mpv calls the callback always on the same thread to render successive frames?

No. Why would you need that?

Otherwise it will get really complicated, as the user code callback can not draw but instead would have to delegate this task to some other thread

Yes. This is what you have to do and just how the sdl example I linked works.

The user callback draws, but it does that in a thread created by mpv.

No, this is the exact opposite of what I said. You create the OpenGL context and control the draw thread. mpv calls the callback to tell you that you should draw. Do not draw inside the mpv callback, that's broken.

but it would be helpful to mention this in the documentation, along with a warning that openGL then requires using a separate GL context per thread

https://github.com/mpv-player/mpv/blob/acc69e082fff67398834de3045ef48d33d2f4d54/libmpv/render_gl.h#L31-L40
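
The pattern described above (the update callback only signals, and a thread owned by the application does the drawing) can be sketched without any libmpv or OpenGL dependency. The following is a minimal simulation, not the SDL example and not neumodvb's code: `RenderSignal`, `on_update`, `render_loop`, `DemoResult`, and `run_demo` are all hypothetical names, and "rendering" is reduced to recording the id of the thread that would have called mpv_render_context_render.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state between the update callback and the render thread.
// Nothing in this sketch is libmpv API.
struct RenderSignal {
    std::mutex mtx;
    std::condition_variable cv;
    bool frame_pending = false;  // set by the callback, cleared by the renderer
    bool quit = false;
};

// Stand-in for the update callback. It may run on any thread, so it only
// records that a redraw is wanted and wakes the render thread. It never draws.
void on_update(RenderSignal& sig) {
    {
        std::lock_guard<std::mutex> lock(sig.mtx);
        sig.frame_pending = true;
    }
    sig.cv.notify_one();
}

// Runs on the single thread that would own the GL context. In a real program
// the push_back below is where mpv_render_context_render() would be called.
void render_loop(RenderSignal& sig, std::vector<std::thread::id>& rendered_on) {
    std::unique_lock<std::mutex> lock(sig.mtx);
    while (true) {
        sig.cv.wait(lock, [&] { return sig.frame_pending || sig.quit; });
        if (sig.frame_pending) {
            sig.frame_pending = false;
            lock.unlock();  // do not hold the lock while "drawing"
            rendered_on.push_back(std::this_thread::get_id());
            lock.lock();
        } else {
            break;  // quit requested and nothing left to draw
        }
    }
}

struct DemoResult {
    std::thread::id render_thread;
    std::vector<std::thread::id> rendered_on;
};

// Fires `callbacks` update notifications, each from a different short-lived
// thread, and reports which thread handled the resulting redraws.
DemoResult run_demo(int callbacks) {
    RenderSignal sig;
    DemoResult result;
    std::thread renderer([&] { render_loop(sig, result.rendered_on); });
    result.render_thread = renderer.get_id();
    for (int i = 0; i < callbacks; ++i) {
        std::thread([&] { on_update(sig); }).join();
    }
    {
        std::lock_guard<std::mutex> lock(sig.mtx);
        sig.quit = true;
    }
    sig.cv.notify_one();
    renderer.join();
    return result;
}
```

Calling `run_demo(5)` from `main` should show every recorded redraw on the render thread, regardless of which thread delivered the notification. Several notifications may collapse into one redraw, which is acceptable for an "update needed" signal.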

deeptho commented 1 month ago

It seems I was confused by some older, dead code in neumodvb. The rendering indeed takes place on a thread created by neumodvb, not by libmpv.