sherpya / MPlayer

MPlayer SVN mirror including my patches - ALWAYS REBASED (so pull --rebase --ff-only)
http://oss.netfarm.it/mplayer-win32.php
GNU General Public License v2.0
22 stars 6 forks source link

sub/osd rendering causes crashes after r38135 #10

Open Fedyon opened 4 years ago

Fedyon commented 4 years ago

Reproducer: mplayer -noconfig all -nofontconfig -font mplayer\subfont.ttf -osdlevel 3 sample.avi

Last-known-good:

CPLAYER: MPlayer sherpya-r38135+gb272d5b9b6-8.3-win32 (C) 2000-2019 MPlayer Team OSD: Using MMX (with tiny bit MMX2) Optimized OnScreenDisplay

Anything since r38152 (including) seems broken:

CPLAYER: MPlayer sherpya-r38184+g13171ad2e3-9.3-win32 (C) 2000-2020 MPlayer Team OSD: Using SSE2 Optimized OnScreenDisplay ---snip--- CPLAYER: VO: Description: Directx DDraw YUV/RGB/BGR renderer CPLAYER: VO: Author: Sascha Sommer saschasommer@freenet.de OSD: Unicode font: 2993 glyphs. OSD: Unicode font: 2993 glyphs. OSD: OSD chg: 6 V: no pb:-1
OSD: OSD chg: 5 V: no pb:-1
OSD: OSD chg: 3 V: no pb:-1
OSD: OSD chg: 2 V: no pb:-1
OSD: OSD update: 20;10 368x60
OSD: OSD chg: 1 V: yes pb:-1
CPLAYER:

MPlayer interrupted by signal 11 in module: filter video

Systems:

CPUDETECT: CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz (Family: 6, Model: 60, Stepping: 3)

  • poor XP machine with brain-damaged igfx. (direct3d tears a lot, opengl keeps dying...)
  • All 32bit builds crashed in the same way.
  • k8-sse3 build running on AMD E1-1200 also crashed. (edited: I mistook, it crashed too)

Suspects:

Fedyon commented 4 years ago

The crash reproduced with -vf expand=::::1 -vo md5sum too. Perhaps -vo directx was not at fault, editing title.

hnsteyding commented 4 years ago

I could reproduce your observations too. Until r38142 everything is fine - from r38143 to the current release it does not work anymore. Maybe an additional message to http://lists.mplayerhq.hu/pipermail/mplayer-users/2020-May/thread.html could help ?

hnsteyding commented 4 years ago

Meanwhile, i've posted this bug to http://lists.mplayerhq.hu/pipermail/mplayer-users/2020-May/088907.html

hnsteyding commented 4 years ago

I've got feeback from mplayer developers. From MPlayer SVN-r38188 this problem should be fixed i could confirm this and have no problems anymore.

Could anyone clarify this ?

sherpya commented 4 years ago

please try r38188+g6e1903938b

Fedyon commented 4 years ago

Thanks everyone. Unfortunately, r38188 still crashes with the same reproducer.

In attempt to reduce factors, I found that the same crash can happen with -demuxer rawvideo -rawvideo w=640:h=480:format=yuy2 -vf expand=::::1 -vo null /some/lengthy/zero, but it seems less reliable (to crash). It sometimes just run, it sometimes just crashes.

Given that adding a few chars to the command line occasionally makes difference, just a wild guess: some more alignment failure on local stack/heap...?

sherpya commented 4 years ago

no unfotunately I need a debug build and gdb attached, did you managed to reproduce consistently with that command?

Fedyon commented 4 years ago

Well, consistency... perhaps yes. Yes, consistent, but not reliable. I mean...

Full command line: T:\TEMP\MPlayer-corei7-r38188+g6e1903938b>mplayer -noconfig all -nofontconfig -font mplayer\subfont.ttf -demuxer rawvideo -rawvideo w=640:h=480:format=yuy2 -vf expand=::::1 -vo null -osdlevel 3 ..\zero

By changing only the last argument (note the number of last dots), crash/try: ..\zero -> 10/10 ..\zero. -> 0/10 ..\zero.. -> 0/10 ..\zero... -> 10/10 ..\zero.... -> 0/10 ..\zero..... -> 10/10 ..\zero...... -> 10/10 ..\zero....... -> 10/10 ..\zero........ -> 10/10 ..\zero......... -> 0/10 ..\zero.......... -> 0/10 ..\zero........... -> 0/10 ..\zero............ -> 10/10

I bet it must also be dependent on cd/env/etc. and the crash pattern itself may not reproduce between machines.

hnsteyding commented 4 years ago

I've compiled mplayer r38188 with some patches from sherpya and create a gdb log file with option used by Fedyon mplayer.zip Maybe it helps.

sherpya commented 4 years ago

can someone ping reimar? I'm not subscribed anymore

hnsteyding commented 4 years ago

I've informed him.

rdoeffinger commented 4 years ago

I'm not sure these issues are related. That backtrace points to ff_deblock_h_chroma422_8_avx Which means it's not directly OSD related, and it's due to AVX code. Could be that the expand filter provides a destination buffer that isn't sufficiently aligned or something like that. But it could be something completely different as well, as the backtrace seems not really correct: 0x00fe1045 in ff_deblock_h_chroma422_8_avx () at mplayer.c:2390 Which is inconsistent, as the function ff_deblock_h_chroma422_8_avx is clearly not in the file mplayer.c

rdoeffinger commented 4 years ago

Also this happens only for 32-bit builds? Could try below patch, to see if it's really the new SSE2 OSD code. If so disabling it seems the easiest solution...

--- a/sub/osd.c
+++ b/sub/osd.c
@@ -63,8 +63,11 @@ static const unsigned long long mask24hl  __attribute__((aligned(8))) = 0x0000FF
 #endif

 #if HAVE_SSE2 || CONFIG_RUNTIME_CPUDETECT
+// crashes on win32 due to alignment issues of unclear cause
+#if !defined(_WIN32) || !ARCH_X86_32
 #define COMPILE_SSE2
 #endif
+#endif

 #endif /* ARCH_X86 */
hnsteyding commented 4 years ago

I aaply the patch. Now i've get compile errors at end of compiling. See attached file compile-errors.txt

rdoeffinger commented 4 years ago

Sorry, here's an actually tested patch. (txt extension because diff is not a supported file type for github!!) win32_sse2_osd.txt

hnsteyding commented 4 years ago

No problem. After apply your last patch mplayer don't crash anymore. I tested with the following options: 1) mplayer -noconfig all -nofontconfig -font mplayer\subfont.ttf -osdlevel 3 sample.mp4 2) mplayer -noconfig all -nofontconfig -font mplayer\subfont.ttf -osdlevel 3 -vf expand=::::1 -vo md5sum sample.mp4 3) mplayer -demuxer rawvideo -rawvideo w=640:h=480:format=yuy2 -vf expand=::::1 -vo null sample.mp4 4) mplayer -v dvd:///E: -menu -osdlevel 3 Could this be confirmed ?

hnsteyding commented 4 years ago

By the way: If i apply the last patch to r38186 mplayer also don't crash anymore.

rdoeffinger commented 4 years ago

That patch just disables the SSE2 optimization on 32-bit Windows MPlayer. If I could reproduce it and get a proper backtrace I might be able to come up with a better fix, but no luck so far. Possibly adding attribute((force_align_arg_pointer)) to the main function or similar would work even, but not sure. Specifically I mean the below instead of the last patch. But it might also not help at all.

--- a/mplayer.c
+++ b/mplayer.c
@@ -2789,7 +2789,7 @@ static int seek(MPContext *mpctx, double amount, int style)
 /* This preprocessor directive is a hack to generate a mplayer-nomain.o object
  * file for some tools to link against. */
 #ifndef DISABLE_MAIN
-int main(int argc, char *argv[])
+int main(int argc, char *argv[]) __attribute__((force_align_arg_pointer))
 {
     int opt_exit = 0; // Flag indicating whether MPlayer should exit without playing anything.
     int profile_config_loaded;
Fedyon commented 4 years ago

I believe the crash point is: https://github.com/sherpya/MPlayer/blob/fee36a534aee8fce8311c281ef049c3f51e9a789/sub/osd_template.c#L262 in particular the argument mmsrc, which is in turn https://github.com/sherpya/MPlayer/blob/fee36a534aee8fce8311c281ef049c3f51e9a789/sub/osd_template.c#L258

Disasm on MPlayer-corei7-r38188+g6e1903938b gave this:

66 0f d7 c8             pmovmskb ecx,xmm0
81 f9 ff ff 00 00       cmp    ecx,0xffff
0f 84 a6 00 00 00       je     0xb6
f3 0f 6f eb             movdqu xmm5,xmm3
f3 0f 6f 0c 42          movdqu xmm1,XMMWORD PTR [edx+eax*2]
66 0f 68 da             punpckhbw xmm3,xmm2
66 0f 60 ea             punpcklbw xmm5,xmm2
f3 0f 6f 44 42 10       movdqu xmm0,XMMWORD PTR [edx+eax*2+0x10]
f3 0f 6f 15 20 d0 9f 01 movdqu xmm2,XMMWORD PTR ds:0x19fd020
66 0f ef cf             pxor   xmm1,xmm7
f3 0f 6f 25 20 d0 9f 01 movdqu xmm4,XMMWORD PTR ds:0x19fd020
66 0f ef c7             pxor   xmm0,xmm7
f3 0f 6f f0             movdqu xmm6,xmm0
66 0f db d1             pand   xmm2,xmm1
66 0f e4 d5             pmulhuw xmm2,xmm5
66 0f db e0             pand   xmm4,xmm0
66 0f e4 e3             pmulhuw xmm4,xmm3
66 0f 71 e6 08          psraw  xmm6,0x8
66 0f e5 f3             pmulhw xmm6,xmm3
66 0f 67 d4             packuswb xmm2,xmm4
f3 0f 6f e1             movdqu xmm4,xmm1
66 0f fc 14 03          paddb  xmm2,XMMWORD PTR [ebx+eax*1]     ; <- FAULT
; eax=00000000 ebx=0B7ECD98 ecx=000083FF edx=0B943C68
; esi=00000170 edi=0B7F23F0 ebp=0022EC58 esp=0022EC20
66 0f 71 e4 08          psraw  xmm4,0x8
66 0f e5 e5             pmulhw xmm4,xmm5
66 0f 63 e6             packsswb xmm4,xmm6
f3 0f 6f f2             movdqu xmm6,xmm2
66 0f 68 d4             punpckhbw xmm2,xmm4
66 0f 60 f4             punpcklbw xmm6,xmm4

But I currently don't have proper development environment on this machine, and as such I looked up just by guess without gdb/symtbl, I might got wrong in the way...

Fedyon commented 4 years ago

Result of guess-stack-rewinding.

vo_draw_alpha_yuy2_sse( 00000170, 0000003C, 0B7ECAB8, 0B7F2110, 00000170, 0B943268, 00000500 )
vo_draw_text_ext( 00000280, 000001E0, 00000000, 00000000, 00000000, 00000000, 00000280, 000001E0, 00484010 )

That problematic src should have come from obj there, so I tracked:

*vo_osd_list [020E0440] = 0B7BB440
mp_osd_obj_t {
  +00000000 .next
  +00001984 .stride
  +00001988 .allocated
  +0000198C .alpha_buffer
  +00001990 .bitmap_buffer
}
(mp_osd_obj_t)[0B7BB440] = { .stride=00000000, .allocated=FFFFFFFF, .alpha_buffer=00000000, .bitmap_buffer=00000000 }
(mp_osd_obj_t)[0B7B9A90] = { .stride=00000000, .allocated=00000000, .alpha_buffer=0BB198B0, .bitmap_buffer=0BB19890 }
(mp_osd_obj_t)[0B7B80E0] = { .stride=00000000, .allocated=FFFFFFFF, .alpha_buffer=00000000, .bitmap_buffer=00000000 }
(mp_osd_obj_t)[0B7B6730] = { .stride=00000000, .allocated=FFFFFFFF, .alpha_buffer=00000000, .bitmap_buffer=00000000 }
(mp_osd_obj_t)[0B7B4D80] = { .stride=00000000, .allocated=FFFFFFFF, .alpha_buffer=00000000, .bitmap_buffer=00000000 }
(mp_osd_obj_t)[0B7B33D0] = { .stride=00000170, .allocated=00005640, .alpha_buffer=0B7F2110, .bitmap_buffer=0B7ECAB8 }

... and so now I'm very confused. If I read correctly those buffers are only allocated via https://github.com/sherpya/MPlayer/blob/fee36a534aee8fce8311c281ef049c3f51e9a789/sub/sub.c#L161 and there memalign is supposed to provide align, am I right?????? What's happening???????

Fedyon commented 4 years ago

Wait, something is wrong. I think I found the compiled alloc_buf, but in there supposed-to-be-memalign is actually just plain MSVCRT.malloc!

00522750 899388190000   mov    [ebx+00001988],edx
00522756 893C24         mov    [esp],edi
00522759 E88ABE2B01     call   MSVCRT.free
0052275E 8B838C190000   mov    eax,[ebx+0000198C]
00522764 890424         mov    [esp],eax
00522767 E87CBE2B01     call   MSVCRT.free
0052276C 893424         mov    [esp],esi
0052276F E80CBE2B01     call   MSVCRT.malloc
00522774 893424         mov    [esp],esi
00522777 898390190000   mov    [ebx+00001990],eax
0052277D 89C7           mov    edi,eax
0052277F E8FCBD2B01     call   MSVCRT.malloc
00522784 89838C190000   mov    [ebx+0000198C],eax
0052278A E960FFFFFF     jmp    MPLAYER.005226EF

edit: what is this https://github.com/sherpya/MPlayer/blob/fee36a534aee8fce8311c281ef049c3f51e9a789/configure#L4011

sherpya commented 4 years ago

@Fedyon

internal aligned malloc functions are implemented using malloc (with a trick) but it works

Fedyon commented 4 years ago

@sherpya I'm afraid it looks like the trick is not here, for whatever reason. I believe this is the whole of alloc_buf https://github.com/sherpya/MPlayer/blob/fee36a534aee8fce8311c281ef049c3f51e9a789/sub/sub.c#L150

005226B0 57             push   edi
005226B1 56             push   esi
005226B2 53             push   ebx
005226B3 89C3           mov    ebx,eax
005226B5 83EC10         sub    esp,00000010
005226B8 8B4020         mov    eax,[eax+20]
005226BB 8B5318         mov    edx,[ebx+18]
005226BE 39D0           cmp    eax,edx
005226C0 7C6E           jl     MPLAYER.00522730 (00522730)
005226C2 29D0           sub    eax,edx
005226C4 8B4B1C         mov    ecx,[ebx+1C]
005226C7 8B5324         mov    edx,[ebx+24]
005226CA 83C00F         add    eax,0000000F
005226CD 83E0F0         and    eax,FFFFFFF0
005226D0 39CA           cmp    edx,ecx
005226D2 7C6B           jl     MPLAYER.0052273F (0052273F)
005226D4 29CA           sub    edx,ecx
005226D6 0FAFD0         imul   edx,eax
005226D9 89D6           mov    esi,edx
005226DB 8BBB90190000   mov    edi,[ebx+00001990]
005226E1 898384190000   mov    [ebx+00001984],eax
005226E7 399388190000   cmp    [ebx+00001988],edx
005226ED 7C61           jl     MPLAYER.00522750 (00522750)
005226EF A158040E02     mov    eax,[020E0458]
005226F4 89742408       mov    [esp+08],esi
005226F8 893C24         mov    [esp],edi
005226FB 89442404       mov    [esp+04],eax
005226FF E84CBE2B01     call   MSVCRT.memset
00522704 A154040E02     mov    eax,[020E0454]
00522709 89742408       mov    [esp+08],esi
0052270D 89442404       mov    [esp+04],eax
00522711 8B838C190000   mov    eax,[ebx+0000198C]
00522717 890424         mov    [esp],eax
0052271A E831BE2B01     call   MSVCRT.memset
0052271F 83C410         add    esp,00000010
00522722 5B             pop    ebx
00522723 5E             pop    esi
00522724 5F             pop    edi
00522725 C3             ret
00522726 8DB42600000000 lea    esi,[esi]
0052272D 8D7600         lea    esi,[esi]
00522730 895320         mov    [ebx+20],edx
00522733 8B4B1C         mov    ecx,[ebx+1C]
00522736 31C0           xor    eax,eax
00522738 8B5324         mov    edx,[ebx+24]
0052273B 39CA           cmp    edx,ecx
0052273D 7D95           jnl    MPLAYER.005226D4 (005226D4)
0052273F 894B24         mov    [ebx+24],ecx
00522742 31D2           xor    edx,edx
00522744 31F6           xor    esi,esi
00522746 EB93           jmp    MPLAYER.005226DB (005226DB)
00522748 8DB42600000000 lea    esi,[esi]
0052274F 90             nop
00522750 899388190000   mov    [ebx+00001988],edx
00522756 893C24         mov    [esp],edi
00522759 E88ABE2B01     call   MSVCRT.free
0052275E 8B838C190000   mov    eax,[ebx+0000198C]
00522764 890424         mov    [esp],eax
00522767 E87CBE2B01     call   MSVCRT.free
0052276C 893424         mov    [esp],esi
0052276F E80CBE2B01     call   MSVCRT.malloc
00522774 893424         mov    [esp],esi
00522777 898390190000   mov    [ebx+00001990],eax
0052277D 89C7           mov    edi,eax
0052277F E8FCBD2B01     call   MSVCRT.malloc
00522784 89838C190000   mov    [ebx+0000198C],eax
0052278A E960FFFFFF     jmp    MPLAYER.005226EF

I see the align size value (=16) only in esp adjustment. No one is passing value 16 to anywhere, nothing is around calls to MSVCRT.malloc, those calls actually reaches to MSVCRT... or this MSVCRT is patched on memory or something like that? Note that the only rounding operation at 005226CA~CD corresponds to https://github.com/sherpya/MPlayer/blob/fee36a534aee8fce8311c281ef049c3f51e9a789/sub/sub.c#L155 and I'm pretty sure this is not related to *_buffer align.

edit: after some failures I managed to see breakpoint on 00522777 caught storing 0B7ECAB8, this is definitly not aligned.

rdoeffinger commented 4 years ago

The fallback in case of MPlayer's memalign use is actually plain malloc, so it does not actually work if the libc does not provide properly aligned memory. I attached the patch I just sent to the MPlayer mailing list which hopefully will fix this. 0001-Remove-all-usage-of-memalign.txt If you can confirm, I can re-enabled the optimization on Win32 again.

hnsteyding commented 4 years ago

I make a clean svn checkout from current release (r38191) and compile it. In this release mplayer didn't crashed anymore. The above patch does not make a difference in r38191 about crash problems - no mplayer crash.

If i compile r38190 (before the SSE2 optimization on 32-bit Windows MPlayer was disabled) and use the above patch in some cases mplayer crashes.

Fedyon commented 4 years ago

I just realized that I overlooked this, it certainly can break src alignment: https://github.com/sherpya/MPlayer/blob/fee36a534aee8fce8311c281ef049c3f51e9a789/libmpcodecs/vf_expand.c#L130 but I'm not sure what this path is doing at the first place...

(rewrote; dst didn't matter, we use unaligned stores)

rdoeffinger commented 4 years ago

While that expand code looks problematic, it should only trigger when you actually add borders with the expand filters, which none of the crashing example commands mentioned do... Also it would already break 8-byte alignment, so is problematic even with the MMX code I believe. So this doesn't really look like it should be the cause? As to what that path does: When the OSD did not change, there is no point in first clearing and then drawing it again onto the expanded black borders, instead the borders can simply be left as-is.

Fedyon commented 4 years ago

Ah I see, so adding side borders is breaking them all. Thanks for the explanation.

With MPlayer-corei7-r38188+g6e1903938b, I simulated the fix by setting breakpoints on 00522777 and 00522784 to make them retry malloc until it get an aligned memory (just by chance). I confirm the trick makes all the reproducers not to crash, and by disabling breakpoints they crash again. So I believe it is solving a large part of problems, at least.

(I'm severely running out of local disk space on the machine and can't have the build environment extracted. Yuck.)

rdoeffinger commented 4 years ago

I'm a bit unsure about the status, if there are still issues or not (when re-enabling the optimization). As to the vf_expand code that is just an optimization, so at the cost of slightly higher CPU usage when adding black borders and the OSD being drawn on those borders and not changing, the below should avoid any issues due to that specific code by disabling it:

--- a/libmpcodecs/vf_expand.c
+++ b/libmpcodecs/vf_expand.c
@@ -84,7 +84,7 @@ static void remove_func_2(int x0,int y0, int w,int h){
 }

 static void remove_func(int x0,int y0, int w,int h){
-    if(!vo_osd_changed_flag) return;
+    if(0 && !vo_osd_changed_flag) return;
     // split it to 4 parts:
     if(y0<vf->priv->exp_y){
        // it has parts above the image:
@@ -122,7 +122,7 @@ static void remove_func(int x0,int y0, int w,int h){

 static void draw_func(int x0,int y0, int w,int h,unsigned char* src, unsigned char *srca, int stride){
     unsigned char* dst;
-    if(!vo_osd_changed_flag && vf->dmpi->planes[0]==vf->priv->fb_ptr){
+    if(0 && !vo_osd_changed_flag && vf->dmpi->planes[0]==vf->priv->fb_ptr){
        // ok, enough to update the area inside the video, leave the black bands
        // untouched!
        if(x0<vf->priv->exp_x){

Needs to be refined a bit more before merging though

hnsteyding commented 4 years ago

From my point of view i could not reproduce crashes with mplayer r38192 anymore. Also with the patch above. I tested with the following options:

mplayer -noconfig all -nofontconfig -font mplayer\subfont.ttf -osdlevel 3 test.mp4
mplayer -noconfig all -nofontconfig -font mplayer\subfont.ttf -osdlevel 3 -vf expand=::::1 -vo md5sum test.mp4
mplayer -demuxer rawvideo -rawvideo w=640:h=480:format=yuy2 -vf expand=::::1 -vo null test.mp4
mplayer -v dvd:///E: -menu -osdlevel 3
mplayer -border-pos-x 0.9 -border-pos-y 0.2 -vo direct3d test.mp4
Fedyon commented 1 year ago

After all, it's free to continue using MMX in 32bit builds. Given the performance gain between MMX and SSE2 being pretty marginal (especially in nowadays normal playback scenario), I feel okay-ish to mark this closed at the current status as-is. At the same time I'm not particularly opposing to re-enable SSE2; it should work fine either way now.