hrydgard / ppsspp

A PSP emulator for Android, Windows, Mac and Linux, written in C++. Want to contribute? Join us on Discord at https://discord.gg/5NJB6dD or just send pull requests / issues. For discussion use the forums at forums.ppsspp.org.
https://www.ppsspp.org

Fushigi no Dungeon tile rendering is slow #15251

Open Nabokov86 opened 2 years ago

Nabokov86 commented 2 years ago

Game or games this happens in

NPJH50698 - "Fushigi no Dungeon - Fuurai no Shiren 4 Plus"

What area of the game

From the very beginning of the game.

Speed seen in PPSSPP

39.5% (3/60)

On v1.11.3 speed is above 200%

GE frame capture and debug statistics

[Screenshot: Screenshot_20211225-213008]

Platform

Android

Mobile phone model or graphics card

Tested on a TV box with an Arm Mali-G31 GPU, running Android 9.

PPSSPP version affected

v1.12.3 from Google Play, and dev builds from v1.11.2-85-g19bd943ad up to v1.12.3-491-gcc767622d

Last working version

v1.11.3 from Google Play; dev build v1.11.2-80-g2de6b359c

Graphics backend (3D API)

Vulkan and OpenGL


anr2me commented 2 years ago

Try tracking down the last working dev build from https://buildbot.orphis.net/ppsspp/index.php?m=fulllist so the devs can find out which commit is causing it.

Does an early version of 1.12 also have this issue?

hrydgard commented 2 years ago

[image]

That is, uh, somewhat unexpected! Weird.

unknownbrackets commented 2 years ago

Hm. What that means is that it called sceKernelMemcpy a lot of times; no single call took longer than 14.25 ms.

I wonder if this is somehow caused by the memory tagging... that would be unfortunate. If so, hopefully we can find a way to fix it. Given the version range, that's the first suspect that pops out at me... though it also invalidates the icache.

-[Unknown]

anr2me commented 2 years ago

Memory tagging for the Memory Viewer? Maybe we should have an option to enable it for debugging purposes only, so it won't affect regular players, since they won't be using debugging features anyway.

unknownbrackets commented 2 years ago

Well, I really want it in save states even from someone who wasn't debugging. It's also much easier for someone to get involved with debugging if there aren't secret hidden settings they have to enable to make things work well. Already hard to get into it, don't want it harder.

In this case, my suspicion is there's just some unfriendly use case (if it's that) which can be optimized. The storing of tags is pretty quick, and it already skips fine-grained info by default.

Anyway, it's just a guess, so we shouldn't dig too far into it without better evidence.

-[Unknown]

Nabokov86 commented 2 years ago

UPDATE

The latest working version from Google Play is 1.11.3. ~~The issue occurs with v1.11.2-85-g19bd943ad and above. v1.11.2-80-g2de6b359c works fine.~~

Nabokov86 commented 2 years ago

But there may be several issues between these versions, because the speed on v1.11.2-85-g19bd943ad is about 65%, while on the v1.12.3 release it's about 40%.

It feels like some other change after v1.11.2-85 also affects the speed.

unknownbrackets commented 2 years ago

https://github.com/hrydgard/ppsspp/compare/v1.11.2-80-g2de6b359c...v1.11.2-85-g19bd943ad

This changed the unthrottling mode (for fast forward), which shouldn't matter, and also the audio locking behavior... but only in menus. Neither should specifically be able to affect sceKernelMemcpy. As long as you're not fast forwarding, neither should even affect gameplay speed at all?

Just to be sure: if you manually edit UnthrottlingMode in ppsspp.ini and set it to CONTINUOUS, for example, does that change the speed at all? (Testing in the latest version is fine.)

-[Unknown]

Nabokov86 commented 2 years ago

@unknownbrackets Just checked, no difference.

Nabokov86 commented 2 years ago

I don't know if it's important, but now sceKernelMemcpy no longer shows 253 ms. The speed is the same.

It has absolutely nothing to do with UnthrottlingMode. Just noticed. [Screenshot: Screenshot_20211226-175826_1]

anr2me commented 2 years ago

Between v1.11.2-80-g2de6b359c and v1.11.2-85-g19bd943ad there's a commit related to Memcpy, isn't there? "Debugger: Notate Memcpys directly as well."

unknownbrackets commented 2 years ago

Huh? The commit with that message is e7b968be7, from #14056. It was v1.11.2-80-ge7b968be7, but it's from a different history - it was merged into master in 2f3bc2d37, or v1.11.2-180-g2f3bc2d37. It's just that the initial commit was on top of an older master.

v1.11.2-80-g2de6b359c and v1.11.2-85-g19bd943ad both do not contain e7b968be7.
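The mix-up comes from the `git describe` naming scheme: `v1.11.2-80-g2de6b359c` means "80 commits after tag v1.11.2, at commit 2de6b359c", so two builds on different branches can share the same `-80` distance. A small C++ sketch that splits such a string; `DescribeInfo` and `ParseDescribe` are illustrative names, not part of PPSSPP:

```cpp
#include <optional>
#include <string>

// Illustrative helper: the parts of a `git describe` string of the form
// <tag>-<distance>-g<abbreviated hash>.
struct DescribeInfo {
    std::string tag;   // nearest reachable tag, e.g. "v1.11.2"
    int distance = 0;  // commits between that tag and the build commit
    std::string hash;  // abbreviated commit hash, without the 'g' prefix
};

std::optional<DescribeInfo> ParseDescribe(const std::string &s) {
    // A tag may itself contain '-', so split from the right.
    size_t hashDash = s.rfind('-');
    if (hashDash == std::string::npos || hashDash == 0)
        return std::nullopt;
    size_t distDash = s.rfind('-', hashDash - 1);
    if (distDash == std::string::npos || distDash == 0)
        return std::nullopt;

    std::string hashPart = s.substr(hashDash + 1);
    std::string distPart = s.substr(distDash + 1, hashDash - distDash - 1);
    if (hashPart.size() < 2 || hashPart[0] != 'g' || distPart.empty())
        return std::nullopt;
    for (char c : distPart)
        if (c < '0' || c > '9')
            return std::nullopt;

    DescribeInfo info;
    info.tag = s.substr(0, distDash);
    info.distance = std::stoi(distPart);
    info.hash = hashPart.substr(1);
    return info;
}
```

Equal distances say nothing about ancestry: `v1.11.2-80-g2de6b359c` and `v1.11.2-80-ge7b968be7` parse to the same tag and distance with different hashes. Checking whether one build contains a commit needs something like `git merge-base --is-ancestor`.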

-[Unknown]

unknownbrackets commented 2 years ago

Has anything improved in the latest git builds?

-[Unknown]

Nabokov86 commented 2 years ago

@unknownbrackets No, I don't see any difference.

hrydgard commented 2 years ago

This is kinda suspicious:

[image]

Huge number of draw calls. We do manage to squeeze them down to 500-ish, but I suspect this is one of those tilemap engines that use the PSP GPU in some wacky, unfriendly way. It did run faster before, which means something related to texture flushing might have changed, for example?

Although that sceKernelMemcpy thing is weird too, of course. But since it was before the new tracking... or is it now invalidating some texture it didn't before?

A GE frame dump could help.

hrydgard commented 2 years ago

I don't know. I think it's likely that the reported commit range isn't quite right and that this does have something to do with the memory tracking, for example if the game, for whatever reason, is doing huge numbers of very small sceKernelMemcpy calls. We don't manage to avoid allocations on that path.

I'm thinking of doing a specialized version of NotifyMemInfo for copies that only does string operations once the size and the detail flag have been checked.

unknownbrackets commented 2 years ago

Well, we could just wrap the last 3 lines in `if (size >= 0x100 || MemBlockInfoDetailed()) {`, I suppose. I'll add something.
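The proposed guard boils down to an early-out before any allocating string work. A minimal sketch with hypothetical stand-ins: `NotifyCopySketch`, `g_detailedTracking`, the tag log and the tag format are illustrative and are not PPSSPP's actual `NotifyMemInfo` code; only the `0x100` threshold comes from the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the emulator's memory-tag bookkeeping.
struct TagRecord {
    uint32_t address;
    uint32_t size;
    std::string tag;
};

bool g_detailedTracking = false;  // plays the role of MemBlockInfoDetailed()
std::vector<TagRecord> g_tagLog;  // where tags end up
int g_tagStringsBuilt = 0;        // how many (allocating) tag strings were built

constexpr uint32_t kMinTaggedCopySize = 0x100;  // threshold suggested in the thread

// Tags the destination of a copy, but skips string building and storage for
// small copies unless detailed tracking is on. Equivalent to wrapping the work
// in `if (size >= 0x100 || detailed)`.
void NotifyCopySketch(uint32_t dest, uint32_t src, uint32_t size, const char *prefix) {
    assert(prefix != nullptr);
    if (size < kMinTaggedCopySize && !g_detailedTracking)
        return;  // cheap path for the many tiny memcpys

    ++g_tagStringsBuilt;
    std::string tag = std::string(prefix) + "/" + std::to_string(src);
    g_tagLog.push_back({dest, size, std::move(tag)});
}
```

The trade-off is that small copies go untagged unless detailed tracking is enabled, in exchange for no allocations on the hot path a game like this one hammers.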

-[Unknown]

unknownbrackets commented 2 years ago

In addition to a frame dump, it would also help to see a debug stats screenshot from before the problem - when it's fast.

-[Unknown]

hrydgard commented 2 years ago

@Nabokov86 Please try build 1.12.3-883 (or later) from https://buildbot.orphis.net/ppsspp/index.php?m=fulllist and check if it helps.

Nabokov86 commented 2 years ago

I apologize, I didn't test well enough before reporting. The issue did not start with v1.11.2-85-g19bd943ad. This is my mistake.

Also, I hadn't noticed at all that the game runs slowly in general. @hrydgard mentioned the number of draw calls; maybe that's the cause.

Nabokov86 commented 2 years ago

> 1.12.3-883 (or later)

Yes, the issue with sceKernelMemcpy is fixed! Now the game is just running as slow as before. Much better :rocket:

Nabokov86 commented 2 years ago

It looks like this now: [Screenshot: NPJH50698_00008]

GE frame dump: NPJH50698_0001.zip

hrydgard commented 2 years ago

Cool, thanks for testing and for the frame dump! Will be very useful for looking into the remaining performance issues.

unknownbrackets commented 2 years ago

This draws the screen via 8x8 rectangles in a grid. It changes the CLUT offset relatively often between these squares, and the CLUT is clearly organized as a series of about 12 16-color palettes, each including a transparent entry. So it's using the offset to palette swap and alternate between them.
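To make the palette trick concrete: with 4-bit texels each tile only stores indices 0-15, and the CLUT offset picks which 16-entry block of the CLUT those indices refer to. A simplified C++ model, assuming the offset is counted in 16-entry units and ignoring other GE CLUT-format details; `Clut` and `LookupClut4` are illustrative names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simplified model: a CLUT holding several 16-colour palettes back to back.
using Clut = std::array<uint32_t, 256>;  // one packed colour per entry

// Looks up a 4-bit texel, with the palette chosen by a per-draw offset
// measured in 16-entry blocks (an assumption of this sketch).
uint32_t LookupClut4(const Clut &clut, uint8_t texel, uint32_t paletteOffset) {
    uint32_t index = (texel & 0xF) | (paletteOffset << 4);
    assert(index < clut.size());
    return clut[index];
}
```

The same tile bytes decode to different colours for each offset, so a renderer that expands palettes up front has to treat every offset as a separate texture, which is why frequent offset changes are expensive.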

It does this as layers. The first 2135 prims (until 2136, prim 1 is a clear) are spent drawing the first layer, and then it continues drawing layer 2, which is another 2135 prims. Then some sprites, which seems to include the sail and waves at the bottom of the ship. At 4464, we're drawing another layer of 8x8s, which is the rain. Then another layer of sprites (raindrop splashes), and the dialog box.

Especially unfortunate is that for layer 2, most of the squares are texturing from an entirely transparent square and actually drawing nothing (it's clearly simulating tile-based drawing; tile 0 is transparent). Layer 1 is also drawn with blending, but doesn't need to be.
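The wasted work in layer 2 can be stated precisely: a tile changes nothing when every texel it samples maps to a fully transparent palette entry. A hypothetical check for one 8x8 tile of packed 4-bit texels (low nibble first is an assumption here), assuming ordinary source-alpha blending and no depth or stencil side effects. It illustrates the condition, not anything PPSSPP currently does:

```cpp
#include <array>
#include <cstdint>

// One 8x8 tile of 4-bit indices, two texels per byte (low nibble first, assumed).
using Tile4 = std::array<uint8_t, 32>;
// Alpha of each of the 16 colours in the currently selected palette.
using PaletteAlpha = std::array<uint8_t, 16>;

// Returns true if every texel resolves to alpha 0, i.e. drawing the tile
// with source-alpha blending leaves the framebuffer unchanged.
bool TileIsFullyTransparent(const Tile4 &tile, const PaletteAlpha &alpha) {
    for (uint8_t packed : tile) {
        if (alpha[packed & 0xF] != 0 || alpha[packed >> 4] != 0)
            return false;
    }
    return true;
}
```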

It's possible that "retain changed textures" may help this game's performance.

-[Unknown]

hrydgard commented 2 years ago

The most memory-efficient way to render this would probably be to do depal-in-shader for these tilemap textures (we currently only support that for framebuffer textures)... If we passed some extra data per draw, it might be possible to reduce the flushes a lot, though that would require quite a bit of plumbing. As you say, though, retaining changed textures might let us keep around copies of the texture depal'd with each of the palettes.
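The "keep a copy per palette" idea can be sketched as a map keyed on both the texture and the palette applied to it, so switching back to a palette seen earlier reuses the earlier expansion instead of decoding again. Everything here (`DepalKey`, `DepalCache`, its `Get` method) is hypothetical, and the sketch omits invalidation when the game rewrites texture or CLUT memory:

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical cache key: which texture, expanded with which palette block.
struct DepalKey {
    uint32_t texAddr;        // where the indexed texture lives in emulated memory
    uint32_t paletteOffset;  // which 16-entry palette was applied
    bool operator<(const DepalKey &o) const {
        return std::tie(texAddr, paletteOffset) < std::tie(o.texAddr, o.paletteOffset);
    }
};

class DepalCache {
public:
    // Returns colours for `indices` decoded with palette block `paletteOffset`,
    // decoding only on the first request for this (texture, palette) pair.
    const std::vector<uint32_t> &Get(uint32_t texAddr, const std::vector<uint8_t> &indices,
                                     const std::array<uint32_t, 256> &clut,
                                     uint32_t paletteOffset) {
        DepalKey key{texAddr, paletteOffset};
        auto it = cache_.find(key);
        if (it != cache_.end())
            return it->second;
        ++decodes;
        std::vector<uint32_t> rgba;
        rgba.reserve(indices.size());
        for (uint8_t idx : indices)
            rgba.push_back(clut[((idx & 0xF) | (paletteOffset << 4)) & 0xFF]);
        return cache_.emplace(key, std::move(rgba)).first->second;
    }

    int decodes = 0;  // number of palette expansions actually performed

private:
    std::map<DepalKey, std::vector<uint32_t>> cache_;
};
```

The cost is one expanded copy per palette in use, which is exactly the memory that depal-in-shader would avoid.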

ghost commented 2 years ago

Running smoothly on my phone with a Mali-G52 GPU, using the Vulkan backend.

https://user-images.githubusercontent.com/37603562/163662770-48aeff86-d1ad-4bb8-abb5-bb58d7be8a3f.mp4

unknownbrackets commented 2 years ago

Updated title since it's no longer any slower than it's ever been, really. It's just still not terribly fast.

I wonder if the software renderer (with JIT) might actually perform better here, currently. Obviously, adding the CLUT param as a vertex attribute on hardware ought to be even better. I wonder if there are many other games it would help...
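Why a per-vertex CLUT parameter would help can be seen by counting batches: when the palette offset is draw state, every change forces a flush, while as a vertex attribute it travels with each tile's vertices. A hypothetical model (`TileDraw`, `CountBatches`) that tracks only the palette offset and assumes all other state stays constant; in practice texture and blend changes also break batches, so real gains would be smaller:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct TileDraw {
    uint32_t paletteOffset;  // CLUT offset the game sets before this 8x8 tile
};

// Counts the draw calls needed to submit `draws` in order.
// As per-draw state, a new batch starts whenever the offset changes.
// As a vertex attribute, nothing modelled here forces a break.
size_t CountBatches(const std::vector<TileDraw> &draws, bool offsetIsVertexAttribute) {
    if (draws.empty())
        return 0;
    if (offsetIsVertexAttribute)
        return 1;
    size_t batches = 1;
    for (size_t i = 1; i < draws.size(); ++i) {
        if (draws[i].paletteOffset != draws[i - 1].paletteOffset)
            ++batches;
    }
    return batches;
}
```

For a row of tiles with offsets 0, 0, 3, 3, 0, 5, this gives four batches with per-draw state and one with a per-vertex attribute.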

-[Unknown]