glebm closed this 8 months ago.
I think it is already using the -O2 flag. I'm not sure where the flags are being generated from, but look at the generated Makefile:
...
CFLAGS = -g -O2 -G0 -Wall
...
CXXFLAGS = -g -O2
...
PSPSDK_CFLAGS = -g -O2 -G0 -Wall
PSPSDK_CXXFLAGS = -g -O2 -G0 -Wall -fno-exceptions -fno-rtti
...
I see, makes sense.
It'd be good to add -fipa-pta to these flags: this optimization is not enabled at any -O level, but it can have a significant (2-5%) impact on low-end systems. It costs build time, but that's fine when building the Docker image.
Cool for sure! If it improves performance, I'm always super happy.
@davidgfnet do you agree as well?
What kind of performance issues are you guys seeing? I have the feeling that the SDK might not be the bottleneck, given that most of the time it just makes OS syscalls (so there's limited code/logic). I'm not opposed to adding more flags like the one suggested. In fact, we/you might wanna try LTO for some extra performance, particularly since binaries are more or less static (hence a lot of room for link-time optimization).
Yeah, it's definitely not the bottleneck. We just had a similar issue with nxdk, where the lack of SDK optimization was indeed the problem, but since this SDK is already built with -O2, it's not the bottleneck here.
We're seeing 7 FPS in the WIP DevilutionX (Diablo 1) PSP port: https://github.com/diasurgical/devilutionX/pull/5869
That's 7 FPS for the full, adapted view, and at most 15 FPS for a re-cropped version with a custom UI.
We'd like to hit 20+ FPS before calling it a working port. The hunch is that most of the time is spent in the software 2D renderer, but we don't have developers with the hardware, so it's hard to gauge accurately where exactly the time is being spent.
Writing a custom PSP renderer for it might help, but that is a lot of work as well, so magically finding a 33-300% performance boost would be preferred...
I'm pretty sure the performance bottleneck is in SDL2. The way SDL2's renderer handles textures that are filled by writing each pixel to the texture individually is not optimized.
For this to work better, the texture needs to be stored in VRAM, and I think it should not be swizzled. Do keep in mind that the PSP only has 2 MB of VRAM. Let me know if you need more info. There is another port available from @fjtrujy which might perform a bit better.
DevilutionX internally renders at 640x480x8 (8-bit indexed) to a software surface. We then render that software surface to textures, and then to the screen (hoping that downscaling is done in hardware at that point).
640x480 is above the PSP's maximum texture size (512x512), so we render it to two 512x480x16 textures (ABGR1555, left half and right half) and then blit those to the screen.
That comes down to < 1 MiB of textures (2 textures x 2 bytes per pixel x 512 x 480):
(2 * 2 * 512 * 480) / 1024.0 / 1024.0 => 0.9375
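As a quick check of the arithmetic above, here is a minimal sketch that computes texture sizes and whether they fit in the PSP's 2 MiB of VRAM. The helper names and the `reserved` parameter (for framebuffers or anything else already in VRAM) are hypothetical; how much VRAM SDL2's PSP backend actually reserves isn't stated in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* PSP VRAM size: 2 MiB. */
#define PSP_VRAM_BYTES ((size_t)2 * 1024 * 1024)

/* Bytes used by one texture: width * height * bytes per pixel. */
static size_t texture_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* True if `texture_total` bytes of textures fit in VRAM next to `reserved`
   bytes that are already in use (hypothetical: e.g. framebuffers). */
static bool fits_in_vram(size_t texture_total, size_t reserved)
{
    return reserved <= PSP_VRAM_BYTES &&
           texture_total <= PSP_VRAM_BYTES - reserved;
}
```

For example, `2 * texture_bytes(512, 480, 2)` is 983040 bytes (0.9375 MiB). If you additionally assume double-buffered 32-bit framebuffers with a 512-pixel buffer stride (`2 * texture_bytes(512, 272, 4)`, i.e. 1114112 bytes), the two textures would fill the remaining VRAM exactly, leaving no room for anything else.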
We create the textures like this:
SDL_CreateTexture(renderer, SDL_PIXELFORMAT_ABGR1555, SDL_TEXTUREACCESS_STREAMING, 512, 480);
Are we creating them wrong, i.e. does this not result in textures stored in VRAM?
Regarding swizzled textures, we don't do anything special for that. Are these textures swizzled currently?
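The conversion step described above (8-bit indexed surface into ABGR1555 textures) can be sketched in plain C. This is a hypothetical helper, not DevilutionX's or SDL2's actual code; it assumes the destination pointer and pitch come from something like `SDL_LockTexture` on a streaming texture, and that each half of the 640-wide frame is 320 columns.

```c
#include <stddef.h>
#include <stdint.h>

/* One palette entry as 8-bit RGB (hypothetical layout for this sketch). */
typedef struct { uint8_t r, g, b; } rgb8;

/* Pack 8-bit RGB into SDL_PIXELFORMAT_ABGR1555:
   bit 15 = A (set to opaque), bits 14-10 = B, bits 9-5 = G, bits 4-0 = R. */
static uint16_t pack_abgr1555(rgb8 c)
{
    return (uint16_t)(0x8000u |
                      ((unsigned)(c.b >> 3) << 10) |
                      ((unsigned)(c.g >> 3) << 5) |
                      (unsigned)(c.r >> 3));
}

/* Build the 256-entry lookup table once per palette change, not per pixel. */
static void build_lut(const rgb8 palette[256], uint16_t lut[256])
{
    for (int i = 0; i < 256; ++i)
        lut[i] = pack_abgr1555(palette[i]);
}

/* Convert columns [x0, x0 + w) of an 8-bit indexed frame into a 16-bit
   texture buffer. Both pitches are in bytes; dst_pitch is assumed to be
   what SDL_LockTexture reported for the locked texture. */
static void indexed_to_abgr1555(const uint8_t *src, size_t src_pitch,
                                size_t x0, size_t w, size_t h,
                                const uint16_t lut[256],
                                void *dst, size_t dst_pitch)
{
    for (size_t y = 0; y < h; ++y) {
        const uint8_t *in = src + y * src_pitch + x0;
        uint16_t *out = (uint16_t *)((uint8_t *)dst + y * dst_pitch);
        for (size_t x = 0; x < w; ++x)
            out[x] = lut[in[x]];
    }
}
```

For the left half you would call `indexed_to_abgr1555(frame, 640, 0, 320, 480, lut, pixels, pitch)` and for the right half use `x0 = 320`. Precomputing the lookup table keeps the inner loop to one load and one store per pixel, which matters given how tight the per-pixel budget is on this CPU.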
The issue is that streaming textures are currently really slow on the PSP with SDL2. Streaming textures are not swizzled and are stored in RAM; after that, they are copied into the screen buffer in VRAM and then rendered. This means that a GBC emulator … There might also be some artifacts, since there are some bugs in the streaming texture implementation too.
I'm really sorry it is like that. Nobody with the knowledge to address this has taken a go at it in a while. I do think it's fixable, though.
If you are internally rendering to a software buffer (a relatively big one! 640x480), it is likely that the CPU just can't keep up. Back-of-the-envelope math: that's 9M pixels per second (at 30 FPS), which means around ~37 instructions per pixel, best case, on the 333 MHz Allegrex. That is quite tight. Just for reference, the GBA emulator (gpsp) renders around 2.3M pixels per second (240x160 @ 60 FPS) and spends around 50% of its run time just on pixel rendering (i.e. blitting tiles and sprites), for a platform that only has 3-4 rendering layers. Any change to the rendering code that adds a couple more instructions easily drops performance by 5-10%. I think you are just giving the poor device too much to chew on :P
I would start by severely cutting down on pixels. Given that half of the bottom screen is just some "UI widget" that barely changes, I assume that using 3 or 4 textures (and letting the GPU do the overlay for you) could greatly help. You would only render 50% of the area and save on data moves (the PSP bus is quite slow, even with the fastest, most optimized copying techniques).
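The back-of-the-envelope budget above can be reproduced with two small helpers. This sketch assumes the Allegrex at its maximum 333 MHz clock, roughly one instruction per cycle, and that rendering gets the entire CPU, so the result is an optimistic upper bound.

```c
#include <stdint.h>

/* Pixels that must be produced per second at a given resolution and rate. */
static uint64_t pixels_per_second(uint64_t width, uint64_t height, uint64_t fps)
{
    return width * height * fps;
}

/* Upper bound on CPU cycles available per pixel if rendering took 100% of
   the CPU (no game logic, audio, or bus stalls). */
static double cycles_per_pixel(double cpu_hz, uint64_t pixels_per_sec)
{
    return cpu_hz / (double)pixels_per_sec;
}
```

`cycles_per_pixel(333e6, pixels_per_second(640, 480, 30))` comes out to about 36, in line with the ~37 estimate (which rounds 9.2M pixels down to 9M), and `pixels_per_second(240, 160, 60)` gives the 2304000 pixels per second quoted for gpsp.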
Going back to the issue itself, I do not see any problem with the SDK build; you won't squeeze any perf out of it for your project. Feel free to chime in on the PSP Discord, you might find some knowledgeable people who can steer you in the right direction for your port :)
Thanks, that's a good way to think about it. Testing seems to indicate that the PSP can only push about 3 MP (megapixels) per second the way we do things.
The game runs at 480p and the target is 20 FPS, so in full screen that would be 853x480x20 = ~8 MP per second, not that it's a lot better.
With a custom UI we could cut the game down to 640x360 (640x360x20 = ~4.6 MP per second).
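Turning the measured throughput around gives the frame rate each resolution can reach. This sketch treats the ~3 MP/s figure from testing as a fixed cost per pixel, which is a simplification: game logic and other per-frame work do not scale strictly with pixel count.

```c
/* Frame rate sustainable at a given throughput (pixels per second). */
static double max_fps(double pixels_per_sec, unsigned width, unsigned height)
{
    return pixels_per_sec / ((double)width * (double)height);
}

/* Throughput needed to reach a target frame rate. */
static double required_pixels_per_sec(unsigned width, unsigned height, double fps)
{
    return (double)width * (double)height * fps;
}
```

At 3e6 pixels per second, `max_fps(3e6, 853, 480)` is about 7.3 and `max_fps(3e6, 640, 360)` is about 13.0, close to the observed 7 FPS and "at most 15 FPS"; `required_pixels_per_sec(853, 480, 20)` is 8188800.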
At widths <= 640 the UI is not re-rendered, so most of the savings are already there (except for the copying).
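For the copying that remains, one option is to compare the UI region with what was last uploaded and skip the texture upload when nothing changed. The `region_cache` type and `region_changed` function below are hypothetical. Note that the comparison itself reads the whole region, so this only helps if an upload costs more than a compare, which would need measuring on hardware; if the game already knows when it redraws the UI, passing that flag through is cheaper still.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remembers the last uploaded contents of one region
   (e.g. the bottom UI panel) so unchanged frames can skip the upload. */
typedef struct {
    unsigned char *last;  /* previous contents, width_bytes * rows bytes */
    size_t width_bytes;   /* bytes per row inside the region */
    size_t rows;          /* number of rows in the region */
    bool valid;           /* false until the first call */
} region_cache;

/* Returns true if the region differs from the previous call (and records
   the new contents); false if identical, so the upload can be skipped. */
static bool region_changed(region_cache *rc, const unsigned char *src,
                           size_t src_pitch)
{
    bool changed = !rc->valid;
    for (size_t y = 0; y < rc->rows; ++y) {
        const unsigned char *row = src + y * src_pitch;
        unsigned char *prev = rc->last + y * rc->width_bytes;
        /* Once a difference is found, copy the remaining rows unconditionally. */
        if (changed || memcmp(prev, row, rc->width_bytes) != 0) {
            memcpy(prev, row, rc->width_bytes);
            changed = true;
        }
    }
    rc->valid = true;
    return changed;
}
```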
Hello guys, I will now convert this into a discussion; feel free to continue!
We've noticed rather slow performance in the WIP DevilutionX PSP port, and I wondered whether the SDK is actually built in optimized mode (e.g. -O2) in the Docker image. Looking at the scripts, configure.ac defaults to unoptimized mode: https://github.com/pspdev/pspsdk/blob/713aef7c52e1835b79f4d5c9e23a1ab470092dcc/configure.ac#L72-L74
When we fixed a similar issue for another homebrew SDK (nxdk for the original Xbox), we saw DevilutionX FPS almost double. If this is indeed an issue, it'd be great if you could fix it! Thanks!
/cc @AJenbo @fjtrujy