Open RageXbox opened 2 years ago
From my experience this affects all games using XMV video format. Games like GTAIII/VC which use Bink Video (.BIK) are fine.
Just adding some notes here, because I'm seeing this too.
My assumption is that XMV decoding is just a software library linked into the game, so complex/higher-res videos are CPU-bound and thus at the mercy of qemu's efficiency.
(unless, I suppose, the texture upload performance can be improved, but I haven't dug into that at all.)
As a test, I ran xemu under Linux's "perf record" tool. I used "Castlevania - Curse of Darkness", since it has a really long XMV near startup that doesn't keep up, and just raced through the menus to start the game without a save created...then I just let perf collect samples while the video didn't keep up for a minute or two.
These are the biggest CPU hotspots:
+ 11.99% 0.00% xemu [unknown] [.] 0x0000000183e5f000
+ 10.34% 0.22% xemu [unknown] [.] 0000000000000000
+ 7.58% 0.47% xemu libc.so.6 [.] clock_gettime@@GLIBC_2.17
+ 4.43% 0.00% xemu [vdso] [.] 0x00007ffd3c1ab6e8
+ 4.40% 4.40% xemu [vdso] [.] 0x00000000000006e5
+ 3.66% 3.50% xemu xemu [.] helper_cvtps2pi
3.51% 3.38% xemu xemu [.] helper_psrad_mmx
2.92% 2.79% xemu xemu [.] float32_add
+ 2.89% 2.71% xemu xemu [.] float32_mul
2.83% 2.81% xemu xemu [.] helper_packuswb_mmx
+ 2.79% 1.43% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0ee01
2.41% 2.28% xemu xemu [.] helper_punpcklwd_mmx
+ 2.38% 0.01% xemu [unknown] [k] 0xffffffffb160007c
+ 2.34% 0.00% xemu [unknown] [.] 0x0000000000000190
+ 2.33% 0.00% xemu [unknown] [.] 0x70632d363833692d
+ 2.33% 0.00% xemu [unknown] [.] 0x00007ff0f40c7ab0
+ 2.29% 2.25% xemu xemu [.] cpu_exec
+ 2.27% 0.01% xemu [unknown] [k] 0xffffffffb142c331
2.11% 2.03% xemu xemu [.] helper_psllq_mmx
+ 1.75% 1.62% xemu xemu [.] helper_lookup_tb_ptr
+ 1.74% 0.00% xemu xemu [.] 0x00005618e0bf77f0
+ 1.73% 1.65% xemu xemu [.] soft_f32_mul
+ 1.72% 1.48% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0edf3
+ 1.55% 0.07% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0edf6
1.50% 1.37% xemu xemu [.] helper_packssdw_mmx
1.47% 1.42% xemu xemu [.] helper_pslld_mmx
+ 1.45% 0.01% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0ee12
+ 1.44% 0.00% xemu [unknown] [.] 0x00000000000000e8
+ 1.44% 1.44% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0ee0e
+ 1.44% 0.00% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0ee05
+ 1.44% 0.00% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0edd0
+ 1.43% 1.43% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0edce
+ 1.43% 1.42% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0ede7
+ 1.43% 0.00% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0eded
+ 1.43% 0.00% xemu [unknown] [.] 0x00007ff0f40c7a00
+ 1.43% 1.40% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0edda
+ 1.43% 0.01% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0edc0
+ 1.42% 1.42% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0ee3b
+ 1.41% 0.00% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0eddd
+ 1.37% 1.36% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0edfd
+ 1.12% 0.00% xemu [unknown] [.] 0x4781270047812700
+ 1.12% 0.00% xemu [unknown] [.] 0xc6483000c6483000
+ 1.06% 0.00% xemu libc.so.6 [.] __GI___ioctl_time64
1.04% 0.93% xemu xemu [.] helper_mulps
1.02% 0.96% xemu xemu [.] helper_paddw_mmx
+ 1.01% 0.91% xemu xemu [.] parts64_round_to_int_normal.constprop.0
+ 1.00% 0.00% xemu [unknown] [.] 0xffffffffb0b3e981
(everything below this is < 1 percent of the CPU time.)
The things without symbols are probably JIT'd code from qemu, which would make sense if this is just chewing through video data on the CPU.
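To make the listing easier to reason about, the per-symbol numbers can be tallied by category. The following is only a rough sketch: it assumes the two percentage columns are perf's "Children" and "Self" (the header is not shown above), and the grouping rules (`helper_` prefix, `float32_`/`soft_f32_`/`parts64_` prefixes, `[JIT]` object name) are my own guesses at useful buckets, not anything defined by perf or qemu. `SAMPLE` holds a handful of lines copied from the listing; the real input would be the full report.

```python
import re
from collections import defaultdict

# A few lines copied from the listing above, for illustration only.
SAMPLE = """\
+ 7.58% 0.47% xemu libc.so.6 [.] clock_gettime@@GLIBC_2.17
+ 3.66% 3.50% xemu xemu [.] helper_cvtps2pi
 3.51% 3.38% xemu xemu [.] helper_psrad_mmx
 2.92% 2.79% xemu xemu [.] float32_add
+ 2.79% 1.43% xemu [JIT] tid 2630951 [.] 0x00007ff0b4d0ee01
"""

# Assumed layout: [+] children% self% command shared-object [.|k] symbol
LINE_RE = re.compile(
    r"^\+?\s*(?P<children>\d+\.\d+)%\s+(?P<self>\d+\.\d+)%\s+"
    r"(?P<comm>\S+)\s+(?P<dso>.+?)\s+\[(?P<level>[.k])\]\s+(?P<symbol>.+)$"
)


def classify(dso, symbol):
    """Bucket a sample by where it was taken (hypothetical categories)."""
    if dso.startswith("[JIT]"):
        return "jit"
    if symbol.startswith("helper_"):
        return "helper"
    if symbol.startswith(("float32_", "soft_f32_", "parts64_")):
        return "softfloat"
    return "other"


def self_time_by_group(report_text):
    """Sum the second ("Self") percentage column per category."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a report row
        group = classify(match["dso"], match["symbol"])
        totals[group] += float(match["self"])
    return dict(totals)
```

Calling `self_time_by_group(SAMPLE)` returns a dict keyed by `"helper"`, `"softfloat"`, `"jit"` and `"other"`. Note that the `helper_` bucket would also catch `helper_lookup_tb_ptr`, which is not a SIMD helper, so the buckets need tuning before reading much into the totals.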
The most notable thing here is the clock_gettime() call taking over seven percent of the processing time. That is probably worth exploring! If this is just the loop in sdl2_gl_refresh, though, it means we're not spending much CPU time per frame and are just spinning to keep the rendering at 60Hz. In that case the CPU emulation is not the problem, because we're clearly waiting around with nothing to do.
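To illustrate why a frame-pacing loop could show up as clock_gettime() time, here is a small sketch comparing a pure busy-wait with a sleep-then-spin wait. This is not xemu's code and I have not checked what sdl2_gl_refresh actually does; `FakeClock`, `spin_wait`, `sleep_then_spin` and the 2 ms `margin` are all made up for the example. The fake clock just makes the comparison deterministic by charging a fixed cost per clock read.

```python
import time


class FakeClock:
    """Deterministic stand-in for a monotonic clock (hypothetical helper)."""

    def __init__(self, read_cost=0.0001):
        self.t = 0.0
        self.read_cost = read_cost  # pretend each clock read costs 100 us

    def now(self):
        value = self.t
        self.t += self.read_cost
        return value

    def sleep(self, seconds):
        self.t += seconds


def spin_wait(deadline, now=time.monotonic):
    """Busy-wait until `deadline`; return how many times the clock was read."""
    reads = 0
    while True:
        reads += 1
        if now() >= deadline:
            return reads


def sleep_then_spin(deadline, now=time.monotonic, sleep=time.sleep, margin=0.002):
    """Sleep for most of the wait, then spin only for the last `margin` seconds."""
    remaining = deadline - now()
    if remaining > margin:
        sleep(remaining - margin)
    # One read above, plus whatever the final spin needs.
    return 1 + spin_wait(deadline, now)
```

With a `FakeClock`, `spin_wait(1 / 60, clock.now)` reads the clock far more often than `sleep_then_spin(1 / 60, clock.now, clock.sleep)` for the same one-frame wait. If the real loop behaves like the first function, a high clock_gettime() share would mean idle time rather than decode cost.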
After that, it's worth looking at helper_cvtps2pi, float32_add, etc., and seeing if there's something that will make them faster. Small tweaks in there will likely result in significant improvements, given the likely sheer volume of these function calls per frame. Is there some way we can inline these functions (or coerce the JIT to inline them instead of calling them as functions)? Or get them to map onto actual host SIMD instructions, if they are currently faking them without any?
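For context on what one of these helpers has to compute, here is a reference sketch of the MMX PACKUSWB operation that helper_packuswb_mmx presumably emulates: four signed 16-bit words from the destination and four from the source are each clamped to 0..255 and packed into eight bytes. This follows the instruction's documented semantics, not qemu's source, and the function names are mine. On real hardware this is a single instruction, which is why doing it element by element in a called function adds up when it runs for every block of pixels.

```python
def saturate_u8(word):
    """Clamp a signed 16-bit value into the unsigned byte range 0..255."""
    return max(0, min(255, word))


def packuswb_mmx(dst_words, src_words):
    """Reference semantics of MMX PACKUSWB on two 64-bit operands.

    Each operand is given as four signed 16-bit words, lowest element first.
    The result is eight bytes: destination words first, then source words.
    """
    if len(dst_words) != 4 or len(src_words) != 4:
        raise ValueError("each MMX operand holds exactly four 16-bit words")
    return [saturate_u8(w) for w in dst_words] + [saturate_u8(w) for w in src_words]
```

For example, `packuswb_mmx([-5, 0, 128, 300], [255, 256, -32768, 32767])` clamps negatives to 0 and anything above 255 to 255.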
Also, I'm curious whether the decoding library was intended to use the GPU to aid in decoding and is falling back to software decoding because something isn't hooked up. This era of gaming might not have been that savvy about GPU processing (or not had enough GPU to do it), but maybe there's some basic check that is unexpectedly failing and putting us on a slow path that actual consoles never used. Don't know.
Failing all else: maybe we detect this library and use HLE to handle decoding in native code without going through qemu...? Seems like a big lift and a lot of trouble, but I guess it's a possible solution.
Anyhow, I have no answers or patches at the moment, just adding information and ideas to this bug report.
This issue is horrible for unskippable cutscenes.
Bug Description
Most games with FMVs lag, and the FMV audio lags too, even on very powerful hardware. This causes the Halo 2 Epilogue to skip.
Expected Behavior
FMVs should not lag and audio should not lag at all.
xemu Version
v0.6.2-9-g69ceec4446
System Information
Windows 10 (64-bit) (Intel(R) Xeon(R) CPU E5-2678 v3 @2.50 GHz) (NVIDIA Quadro P5000) (NVIDIA 472.47)
Additional Context
https://user-images.githubusercontent.com/76413188/143661249-f11a1cb2-ee12-4170-8e87-06770a21639a.mp4