xemu-project / xemu

Original Xbox Emulator for Windows, macOS, and Linux (Active Development)
https://xemu.app

Z24S8 surface upload/download slow on mesa radeonsi #278

Open CallumDev opened 3 years ago

CallumDev commented 3 years ago

Profiling Azurik: Rise of Perathia under Linux with an AMD GPU shows a very large amount of time spent in the function _mesa_texstore_s8_z24, called when a Z24S8 surface is updated with glTexImage2D/glTexSubImage2D.

AMD GPUs don't support the Z24S8 format directly in hardware, so the OpenGL driver falls back to converting it on the CPU. This causes slowdowns to 1-2 fps on my Ryzen 7 3700U.
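Here is why this path is so slow. When there is no native packed Z24S8 layout, the driver has to rewrite every texel on the CPU on each upload. A single 1024x768 surface therefore means 786,432 per-texel conversions per transfer.

The loop below is a hypothetical sketch, not Mesa's actual `_mesa_texstore_s8_z24`. It makes two assumptions:

- The driver stores depth and stencil as separate planes.
- The input uses the `GL_UNSIGNED_INT_24_8` packing, with depth in the high 24 bits and stencil in the low 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only, NOT Mesa's implementation: assumes the
 * driver keeps depth and stencil in separate planes, so every packed texel
 * must be split on the CPU. */
void split_z24s8(const uint32_t *packed, size_t count,
                 float *depth_out, uint8_t *stencil_out)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t v = packed[i];
        /* GL_UNSIGNED_INT_24_8 layout: depth in bits 31..8, stencil in 7..0 */
        depth_out[i] = (float)(v >> 8) / 16777215.0f; /* 2^24 - 1 */
        stencil_out[i] = (uint8_t)(v & 0xFFu);
    }
}
```

The work is linear in the number of texels and sits on the CPU, which matches the profile pointing at `_mesa_texstore_s8_z24`.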

Possible solutions:

1) Be more aggressive in trying to skip depth/stencil downloads and uploads.
2) Use some form of shader to convert Z24S8 to a hardware-supported format, instead of letting the GL driver convert the format on the CPU.

OR

3) Copy a different format to RAM and hope for the best (probably won't work).
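Option 1) mostly comes down to bookkeeping. This is a minimal sketch that assumes a hypothetical pair of dirty flags per surface. The names `SurfaceSync`, `upload_needed` and `download_needed` are illustrative, not xemu code.

- Upload only when the guest has written the surface since the last upload.
- Download only when the GPU copy is newer and the guest is actually about to read it.

```c
#include <stdbool.h>

/* Hypothetical per-surface sync state; names are illustrative, not xemu's. */
typedef struct {
    bool guest_dirty; /* guest RAM changed since the last upload */
    bool gpu_dirty;   /* GPU rendered to the surface since the last download */
} SurfaceSync;

/* Returns true if an upload is required and clears the flag, assuming the
 * caller performs the upload right away. */
bool upload_needed(SurfaceSync *s)
{
    if (!s->guest_dirty)
        return false;
    s->guest_dirty = false;
    return true;
}

/* Returns true only if the GPU copy is newer AND the guest is about to read
 * the surface; otherwise the download is skipped and the flag is kept. */
bool download_needed(SurfaceSync *s, bool guest_will_read)
{
    if (!s->gpu_dirty || !guest_will_read)
        return false;
    s->gpu_dirty = false;
    return true;
}
```

Keeping `gpu_dirty` set when the guest isn't reading means a later read still triggers the download. The expensive conversion is then paid only when data actually has to cross between RAM and the GPU.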

Triticum0 commented 3 years ago

Other games affected

CallumDev commented 3 years ago

Notes on the possible shader solution:

CPU->GPU: upload as a 32-bit uint texture, then render to an FBO with a depth/stencil attachment.

Writing stencil values: https://www.khronos.org/registry/OpenGL/extensions/AMD/AMD_shader_stencil_export.txt (this extension is supported on Mesa). We would need to add a code path that disables this shader when the extension isn't supported. Depth values are written to gl_FragDepth.
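The fallback code path could be chosen once at startup. This sketch assumes the extension names are gathered with `glGetIntegerv(GL_NUM_EXTENSIONS, ...)` and `glGetStringi(GL_EXTENSIONS, i)` and then passed in as an array, so the decision can be tested without a GL context. `choose_upload_path` and the enum are hypothetical names.

The helper checks for `GL_ARB_shader_stencil_export`, because the shader posted later in this thread uses `gl_FragStencilRefARB` from that extension. A path built on the AMD extension linked above would check for `GL_AMD_shader_stencil_export` instead.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names, not xemu's. */
typedef enum { UPLOAD_VIA_DRIVER, UPLOAD_VIA_SHADER } DepthStencilUploadPath;

/* In a real context the names would come from glGetStringi(GL_EXTENSIONS, i)
 * for i in [0, GL_NUM_EXTENSIONS); here they are passed in for testability. */
bool has_extension(const char *const *exts, int count, const char *name)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(exts[i], name) == 0)
            return true;
    }
    return false;
}

DepthStencilUploadPath choose_upload_path(const char *const *exts, int count)
{
    /* The stencil-writing shader needs gl_FragStencilRefARB. */
    if (has_extension(exts, count, "GL_ARB_shader_stencil_export"))
        return UPLOAD_VIA_SHADER;
    /* Otherwise fall back to glTexSubImage2D and the driver's CPU conversion. */
    return UPLOAD_VIA_DRIVER;
}
```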

GPU->CPU: render to a 32-bit uint texture, then download it.
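For the download direction, the pass has to compute the inverse of the unpacking done by the upload shader posted later in this thread. Below is a CPU reference for that packing, which could be used to check a GPU implementation. It assumes the same `GL_UNSIGNED_INT_24_8` layout and round-to-nearest conversion of depth to 24-bit unorm. `pack_z24s8` is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical CPU reference for what a "pack" pass would write into an
 * R32UI target: depth in bits 31..8, stencil in bits 7..0. */
uint32_t pack_z24s8(float depth, uint8_t stencil)
{
    /* Clamp to the representable [0, 1] range. */
    if (depth < 0.0f) depth = 0.0f;
    if (depth > 1.0f) depth = 1.0f;
    /* Round in double: 16777215.5 is not representable in float, and rounding
     * there would yield 2^24 and overflow the 24-bit field for depth 1.0. */
    uint32_t z24 = (uint32_t)((double)depth * 16777215.0 + 0.5);
    return (z24 << 8) | stencil;
}
```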

Use a depth texture plus a stencil view texture (this seems to require OpenGL 4.4): https://stackoverflow.com/questions/27535727/opengl-create-a-depth-stencil-texture-for-reading

mborgerson commented 3 years ago

@CallumDev This isn't only the case on AMD hardware. The same storage format conversion also happens with Nvidia, etc., and is also very expensive. It definitely needs to be made faster, because in some cases synchronization cannot be avoided. It's been on my radar for a while, but if you would like to explore and work on this you are welcome to; otherwise I will get to it eventually.

CallumDev commented 3 years ago

More notes:

I've had success uploading stencil data with the following setup:

- Set the stencil op (glStencilOp) to GL_REPLACE, GL_REPLACE, GL_REPLACE.
- Disable the color mask.
- Draw a full-screen quad with the shader below, which samples from an R32UI texture. The texture's min and mag filters must be set to GL_NEAREST.

Writing gl_FragDepth is untested; that part is just a guess at this point. As for integrating this into xemu, I'm concerned about inadvertently trampling all the GL state.

GL_ARB_shader_stencil_export is also required for this, and it only seems to be supported on AMD and Intel, not Nvidia. However, given reports of Azurik running OK on Nvidia, perhaps the performance hit is much smaller there?

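```glsl
#version 440
#extension GL_ARB_shader_stencil_export : require

// Full-screen-quad texture coordinate from the vertex shader.
in vec2 uv;
// Guest Z24S8 data uploaded as-is into an R32UI texture (GL_NEAREST filtering).
uniform usampler2D depthstencil_tex;

void main(void) {
    // One packed texel: depth in bits 31..8, stencil in bits 7..0.
    uint sval = texture(depthstencil_tex, uv).r;
    // Low 8 bits become the stencil value written via GL_REPLACE.
    gl_FragStencilRefARB = int(sval & 0xFFu);
    // High 24 bits normalized to [0, 1] (16777215 = 2^24 - 1).
    gl_FragDepth = float(sval >> 8) / 16777215.0;
}
```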

Using the shader to upload avoids the huge FPS drop (down to 6 fps with a single 1024x768 surface in my test case).
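About the GL state described above: setting all three stencil ops to GL_REPLACE writes the exported value whether the stencil and depth tests pass or fail. Two conditions still apply. `GL_STENCIL_TEST` must be enabled, and only the bits allowed by `glStencilMask` are written.

Similarly, the depth buffer is only updated for fragments that pass the stencil and depth tests, and only with `GL_DEPTH_TEST` enabled. So `glStencilFunc(GL_ALWAYS, ...)`, `glDepthFunc(GL_ALWAYS)` and `glDepthMask(GL_TRUE)` would presumably be needed too; this part is untested.

A small model of the masked stencil write shows why the write mask has to be 0xFF for the upload to be exact. `stencil_replace` is a hypothetical helper.

```c
#include <stdint.h>

/* Models the stencil update for glStencilOp(..., GL_REPLACE) when the shader
 * exports gl_FragStencilRefARB: the exported value replaces the stored one,
 * but only in the bits enabled by glStencilMask. */
uint8_t stencil_replace(uint8_t stored, uint8_t exported_ref, uint8_t write_mask)
{
    return (uint8_t)((stored & (uint8_t)~write_mask) | (exported_ref & write_mask));
}
```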

HadetTheUndying commented 2 years ago

Also affects: https://xemu.app/titles/43430002/#Steel-Battalion https://xemu.app/titles/43430009/#Steel-Battalion-Line-of-Contact https://xemu.app/titles/41560009/#Rally-Fusion-Race-of-Champions

EDIT: Regarding the Nvidia comment, I can confirm there is definitely a performance hit on Nvidia too. It's not nearly as bad as on AMD, but it's still bad. Azurik, Steel Battalion, and Rally Fusion run worse on AMD than on Nvidia. On either vendor, and on both Windows and Linux, they still run nowhere near as well as on real hardware.

mborgerson commented 2 years ago

In the clear case, we can be smarter and skip the upload if we are about to do a full surface clear.
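Here is a sketch of that check, using a hypothetical description of the pending clear. `PendingClear` and `needs_upload_before_clear` are illustrative names, not xemu's. The upload can only be skipped when the clear covers the whole surface and overwrites every depth and stencil bit. A depth-only clear or a partial stencil write mask still leaves guest data visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a pending clear; not xemu's actual types. */
typedef struct {
    uint32_t x, y, width, height; /* clear rectangle in surface pixels */
    bool clear_depth;             /* depth cleared with depth writes enabled */
    bool clear_stencil;           /* stencil is being cleared */
    uint8_t stencil_write_mask;   /* 0xFF means every stencil bit is overwritten */
} PendingClear;

/* Returns true if the guest's Z24S8 data must still be uploaded before the
 * clear, i.e. if any texel or any depth/stencil bit survives it. */
bool needs_upload_before_clear(uint32_t surf_w, uint32_t surf_h,
                               const PendingClear *c)
{
    bool covers_surface = c->x == 0 && c->y == 0 &&
                          c->width >= surf_w && c->height >= surf_h;
    bool overwrites_all_bits = c->clear_depth && c->clear_stencil &&
                               c->stencil_write_mask == 0xFF;
    return !(covers_surface && overwrites_all_bits);
}
```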

ghost commented 2 years ago

Issue present in: https://xemu.app/titles/45410042/#007-Everything-or-Nothing

CallumDev commented 2 years ago

This issue may be resolved by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18484?commit_id=4da147a02b541311e8dc231b30dd36fafea820ff

TODO: Test when this makes it into a stable mesa release (22.3)

dekay commented 2 years ago

I wonder if this might also help #777?

HadetTheUndying commented 2 years ago

I suppose one of us could try building with this commit and see if it resolves the issue. I might mess with it on Monday. It would definitely be great if it does, because the number of games affected by this issue has increased over time.

CallumDev commented 2 years ago


I can confirm with 22.3-devel that the performance of at least Azurik: Rise of Perathia is much improved (previously it was <1 fps). In less complex areas it hits 30 fps.

CPU: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
OS_Version: Fedora Linux 36 (Workstation Edition)
GL_VENDOR: Intel
GL_RENDERER: Mesa Intel(R) Xe Graphics (TGL GT2)
GL_VERSION: 4.6 (Core Profile) Mesa 22.3.0-devel
GL_SHADING_LANGUAGE_VERSION: 4.60

gamrXerus commented 2 years ago

On Mesa 22.3.0-devel, Halo 2 splitscreen no longer runs at 1 fps on my machine. However, the graphics are messed up, so split screen is not yet playable. See #1237.

resadent commented 1 year ago

I built Mesa 22.0.5 with the "speed up glTexImage" patch on Ubuntu 22.04. I tried Panzer Dragoon Orta, which previously ran below 10 fps, and it is now almost locked at 60 fps on my Xeon E5-1270 v3 system. Very impressive stuff.

dekay commented 1 year ago

I just tested Panzer Dragoon Orta with Mesa 22.3.0-1 on Arch and the speedup is crazy good, just as reported by @resadent. I suggest closing this and #777 as well.

HadetTheUndying commented 1 year ago

This shouldn't be closed until mesa-22.3.1 is released, since that's considered the first stable release.

EDIT: It doesn't fix the Intel or Nvidia issues either.

HadetTheUndying commented 1 year ago

The issue is still present in Steel Battalion: Line of Contact, even with the latest mesa-22.3.2. I think this issue should remain open until all affected games are confirmed working. It's possible that the problem with LoC is no longer related to this issue, but I don't have time to profile it right now.

EDIT: As of now (Jan 21st), with the latest Mesa HEAD the issue is still present, and Panzer Dragoon still has major slowdowns, dropping as low as 7 fps.