Nabokov86 opened 2 years ago
Try tracking down the last working dev build from https://buildbot.orphis.net/ppsspp/index.php?m=fulllist, so the devs can find out which commit is causing it.
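The build list is ordered, so this can be a bisection instead of a linear walk. The sketch below assumes several things. Builds are sorted from oldest to newest. The first build is known good and the last is known bad. Once the slowdown appears, it stays. `findFirstBadBuild` and `isGood` are hypothetical names, and `isGood` stands in for installing a build and checking its speed by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first bad build.
// Assumes builds are ordered oldest to newest, builds[0] is good,
// builds.back() is bad, and every build after the first bad one is also bad.
// isGood() stands in for installing one build and checking its speed by hand.
std::size_t findFirstBadBuild(const std::vector<std::string>& builds,
                              const std::function<bool(const std::string&)>& isGood) {
    assert(builds.size() >= 2);
    std::size_t good = 0;                 // index of a known good build
    std::size_t bad = builds.size() - 1;  // index of a known bad build
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;  // test the build halfway between
        if (isGood(builds[mid]))
            good = mid;
        else
            bad = mid;
    }
    return bad;
}
```

Each test halves the remaining range, so roughly log2(N) builds need checking. If a single build is judged wrongly, the search ends up in the wrong range.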
Does an early version of 1.12 also have this issue?
That is, uh, somewhat unexpected! Weird.
Hm. What that means is it called sceKernelMemcpy a lot of times, so no single call was slower than 14.25 ms.
I wonder if this is somehow from the memory tagging... that would be unfortunate. If so, hopefully we can find a way to fix it. Given the version range, that's the first suspect that pops out at me... it also invalidates icache, though.
-[Unknown]
Memory tagging for the Memory Viewer? Maybe we should have an option to enable it for debugging purposes only, so it won't affect regular players, since they won't be using debugging features anyway.
Well, I really want it in save states, even from someone who wasn't debugging. It's also much easier for someone to get involved with debugging if there aren't secret hidden settings they have to enable to make things work well. It's already hard to get into; I don't want to make it harder.
In this case, my suspicion is there's just some unfriendly use case (if it's that) which can be optimized. The storing of tags is pretty quick, and it already skips fine-grained info by default.
Anyway, it's just a guess, so we shouldn't dig too far into it without better evidence.
-[Unknown]
UPDATE
The latest working version from Google Play is 1.11.3. ~~The issue occurs with v1.11.2-85-g19bd943ad and above. v1.11.2-80-g2de6b359c works fine.~~
But there may be several issues between these versions, because the speed on v1.11.2-85-g19bd943ad is about 65%, while on the v1.12.3 release it's about 40%.
It feels like some other change after v1.11.2-85 also affects the speed.
https://github.com/hrydgard/ppsspp/compare/v1.11.2-80-g2de6b359c...v1.11.2-85-g19bd943ad
This changed the unthrottling mode (for fast forward), which shouldn't matter, and also the audio locking behavior... but only in menus. Neither should specifically be able to affect sceKernelMemcpy. As long as you're not fast forwarding, neither should even affect gameplay speed at all?
Just to be sure, if you manually edit UnthrottlingMode in ppsspp.ini, and set it to CONTINUOUS for example, does that change the speed at all (in the latest version is fine)?
-[Unknown]
@unknownbrackets Just checked, no difference.
I don't know if it's important, but sceKernelMemcpy is no longer at 253 ms. The speed is the same.
It has absolutely nothing to do with UnthrottlingMode. Just noticed.
Between v1.11.2-80-g2de6b359c and v1.11.2-85-g19bd943ad there is a commit related to Memcpy, isn't there? "Debugger: Notate Memcpys directly as well."
Huh? The commit with that message is e7b968be7, from #14056. It was v1.11.2-80-ge7b968be7, but it's from a different history - it was merged into master in 2f3bc2d37, or v1.11.2-180-g2f3bc2d37. It's just that the initial commit was on top of an older master.
v1.11.2-80-g2de6b359c and v1.11.2-85-g19bd943ad both do not contain e7b968be7.
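These version strings use the `git describe` format `<tag>-<commits since tag>-g<abbreviated hash>`. The middle number only counts commits in that commit's own history since the tag. Because of that, two builds with the same or nearby counts do not have to be on the same line of history, which is the case for v1.11.2-80-ge7b968be7 and v1.11.2-80-g2de6b359c. Below is a minimal sketch that splits such a string into its parts. `DescribeInfo` and `parseDescribe` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: splits a `git describe --long` style string such as
// "v1.11.2-85-g19bd943ad" into tag, commit count and abbreviated hash.
struct DescribeInfo {
    std::string tag;       // nearest tag, e.g. "v1.11.2"
    unsigned commits = 0;  // commits since that tag, counted on this commit's history
    std::string hash;      // abbreviated commit hash, without the leading 'g'
};

std::optional<DescribeInfo> parseDescribe(const std::string& s) {
    // Search from the right, because the tag itself may contain '-'.
    std::size_t hashDash = s.rfind('-');
    if (hashDash == std::string::npos || hashDash == 0) return std::nullopt;
    if (hashDash + 2 >= s.size() || s[hashDash + 1] != 'g') return std::nullopt;
    std::size_t countDash = s.rfind('-', hashDash - 1);
    if (countDash == std::string::npos) return std::nullopt;
    std::string count = s.substr(countDash + 1, hashDash - countDash - 1);
    if (count.empty() || count.find_first_not_of("0123456789") != std::string::npos)
        return std::nullopt;
    DescribeInfo info;
    info.tag = s.substr(0, countDash);
    info.commits = static_cast<unsigned>(std::stoul(count));
    info.hash = s.substr(hashDash + 2);
    return info;
}
```

Only the hash identifies a commit uniquely. To check whether one commit is contained in another build, ask git directly, for example with `git merge-base --is-ancestor`, instead of comparing counts.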
-[Unknown]
Has anything improved in the latest git builds?
-[Unknown]
@unknownbrackets No, I don't see any difference.
This is kinda suspicious:
A huge number of draw calls. We do manage to squeeze it down to 500-ish, but I suspect this is one of those tilemap engines that use the PSP GPU in some wacky, unfriendly way. It did run faster before, which means something changed, perhaps in texture flushing?
Although that sceKernelMemcpy thing is weird too, of course. But since it was before the new tracking... or maybe it's now invalidating some texture it didn't before?
A GE frame dump could help.
I don't know; I think it's likely that the reported commit range is not quite right, and that it does have something to do with the memory tracking. For example, the game might, for whatever reason, be doing huge numbers of very small sceKernelMemcpy calls, and we don't manage to avoid allocations on that path.
Thinking of doing a specialized version of NotifyMemInfo for copies that only does string operations once the size and the detail flag have been checked.
Well, we could just wrap the last 3 lines in `if (size >= 0x100 || MemBlockInfoDetailed()) {`, I suppose. I'll add something.
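The idea in the last two comments is to check the cheap conditions first, meaning the copy size and the detailed-tracking flag, and only then do any string formatting or allocation. Below is a minimal sketch of that pattern. `NotifyCopy`, `g_detailed`, the counters and the tag format are hypothetical stand-ins. This is not PPSSPP's actual `NotifyMemInfo` or `MemBlockInfoDetailed()` code. The `0x100` threshold comes from the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real memory-tagging state.
static bool g_detailed = false;          // plays the role of MemBlockInfoDetailed()
static std::vector<std::string> g_tags;  // recorded tags, for illustration only
static int g_stringsBuilt = 0;           // counts how often the expensive path runs

constexpr uint32_t kMinTaggedCopy = 0x100;  // threshold from the comment above

void NotifyCopy(uint32_t dst, uint32_t src, uint32_t size, const char* prefix) {
    assert(prefix != nullptr);
    // Cheap checks first: small copies are skipped unless detailed tracking is on,
    // so the common case does no allocation or string work at all.
    if (size < kMinTaggedCopy && !g_detailed)
        return;
    // Only now build the tag string (the allocating part).
    ++g_stringsBuilt;
    g_tags.push_back(std::string(prefix) + std::to_string(src) + "->" + std::to_string(dst));
}
```

If a game does thousands of tiny copies per frame, the early return reduces each call to a comparison. The trade-off is that small copies go untagged unless detailed info is enabled.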
-[Unknown]
In addition to a frame dump, it would also help to see a debug stats screenshot from before the problem - when it's fast.
-[Unknown]
@Nabokov86 Please try build 1.12.3-883 (or later) from https://buildbot.orphis.net/ppsspp/index.php?m=fulllist and check if it helps.
I apologize, I didn't test well enough before reporting. The issue did not start from v1.11.2-85-g19bd943ad. This is my mistake.
Also, I didn't notice at all that the game was running slowly in general. @hrydgard mentioned the number of draw calls; maybe that's the cause.
1.12.3-883 (or later)
Yes, the issue with sceKernelMemcpy is fixed! Now the game is just running as slow as before. Much better :rocket:
It looks like this now:
GE frame dump: NPJH50698_0001.zip
Cool, thanks for testing and for the frame dump! Will be very useful for looking into the remaining performance issues.
This draws the screen via 8x8 rectangles in a grid. It changes the CLUT offset fairly often between these squares, and the CLUT is clearly organized as a series of about twelve 16-color palettes, each including a transparent color. So it's using the offset to palette-swap, alternating between them.
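Here is the palette-swap trick in concrete terms. With 4-bit tiles, each texel is an index from 0 to 15, and the CLUT offset selects which 16-entry block of the CLUT to use. That lets the same tile texels be drawn with any of the roughly twelve palettes. This is a simplified sketch. It assumes the offset is counted in units of 16 entries and ignores the GE's shift/mask CLUT index bits. `LookupTexel` and `Color` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model: a CLUT holding several 16-color palettes back to back.
using Color = uint32_t;  // 0xAABBGGRR
constexpr int kPaletteSize = 16;

// Looks up one 4-bit texel. clutOffset selects a 16-entry block (assumption),
// which is what lets one tile bitmap be drawn with different palettes.
Color LookupTexel(const std::vector<Color>& clut, uint8_t index4, int clutOffset) {
    assert(index4 < kPaletteSize);
    std::size_t entry = static_cast<std::size_t>(clutOffset) * kPaletteSize + index4;
    assert(entry < clut.size());
    return clut[entry];
}
```

The palette is chosen by draw state, not stored in the texture. Each time the offset changes between 8x8 squares, the emulator has to handle a state change.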
It does this as layers. The first 2135 prims (until 2136, prim 1 is a clear) are spent drawing the first layer, and then it continues drawing layer 2, which is another 2135 prims. Then some sprites, which seems to include the sail and waves at the bottom of the ship. At 4464, we're drawing another layer of 8x8s, which is the rain. Then another layer of sprites (raindrop splashes), and the dialog box.
Especially unfortunate is that for layer 2, most of the squares are texturing from an entirely transparent square and actually drawing nothing (it's clearly simulating tile-based drawing; tile 0 is transparent). Layer 1 is also drawn with blending but doesn't need to be.
It's possible that the "Retain changed textures" setting may help this game's performance.
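This rough illustration shows why those layer 2 draws are wasted. If every texel of an 8x8 tile maps to a CLUT entry with alpha 0, then under ordinary alpha blending the draw leaves the color buffer unchanged. This ignores any depth or stencil writes. The check below is hypothetical. It assumes 4-bit indices stored one per byte, 16-entry palettes and ABGR8888 entries. Nothing in the thread says PPSSPP does this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Color = uint32_t;  // 0xAABBGGRR, alpha in the top byte

// Hypothetical check: true if every texel of the tile resolves to alpha 0 in the
// selected 16-entry palette, i.e. with ordinary alpha blending (and ignoring any
// depth/stencil writes) drawing the tile leaves the color buffer unchanged.
bool TileIsFullyTransparent(const std::vector<uint8_t>& tileIndices,
                            const std::vector<Color>& clut, int clutOffset) {
    for (uint8_t idx : tileIndices) {
        std::size_t entry = static_cast<std::size_t>(clutOffset) * 16 + idx;
        assert(entry < clut.size());
        if ((clut[entry] >> 24) != 0)
            return false;  // at least one visible texel
    }
    return true;
}
```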
-[Unknown]
The most memory-efficient way to render this would probably be depal-in-shader for these tilemap textures (we currently only support that for framebuffer textures)... If we passed some extra data per draw, it might be possible to reduce the flushes a lot, though that would require quite a bit of plumbing. As you say, retaining changed textures might let us keep around a copy of the texture depal'd with each of the palettes, though.
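This is a sketch of the "keep a depal'd copy per palette" idea. The cache key combines the texture address with the CLUT offset, so switching palettes between squares becomes a lookup and does not trigger a re-decode. `DepalCache`, the key layout and the decode callback are all hypothetical. A real cache would also need invalidation when texture or CLUT memory changes, and this sketch leaves that out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Color = uint32_t;

// Hypothetical cache of decoded (depalettized) textures, one entry per
// (texture address, CLUT offset) pair. Invalidation is deliberately omitted.
class DepalCache {
public:
    using Decoder = std::function<std::vector<Color>(uint32_t texAddr, uint32_t clutOffset)>;
    explicit DepalCache(Decoder decode) : decode_(std::move(decode)) {}

    const std::vector<Color>& Get(uint32_t texAddr, uint32_t clutOffset) {
        uint64_t key = (static_cast<uint64_t>(texAddr) << 32) | clutOffset;
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++decodes_;  // miss: decode once for this palette, then reuse
            it = cache_.emplace(key, decode_(texAddr, clutOffset)).first;
        }
        return it->second;
    }
    int decodes() const { return decodes_; }

private:
    Decoder decode_;
    std::unordered_map<uint64_t, std::vector<Color>> cache_;
    int decodes_ = 0;
};
```

With about twelve palettes, this costs at most twelve decodes of the tile sheet, not one decode per palette switch. The price is keeping twelve copies in memory, and that memory cost is the reason depal-in-shader is the more memory-efficient option.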
Running smoothly on my phone with a Mali-G52 GPU, Vulkan backend.
Updated title since it's no longer any slower than it's ever been, really. It's just still not terribly fast.
I wonder if the software renderer (with JIT) might actually perform better here, currently. Obviously, adding the CLUT param as a vertex attribute on hardware ought to be even better. I wonder if it would help many other games...
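This toy model shows why a per-vertex CLUT parameter would help. When the CLUT offset is global draw state, every change forces the batched vertices to be flushed as a separate draw. When each vertex carries its own offset, consecutive squares can share one draw. The `Prim` struct and the flush rule are a simplification made up for illustration. They do not describe PPSSPP's draw engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Prim {
    int clutOffset;  // palette selected for this 8x8 square
};

// Counts draw calls when the CLUT offset is global state: a flush happens
// whenever the offset differs from the one the current batch was built with.
std::size_t DrawsWithClutAsState(const std::vector<Prim>& prims) {
    std::size_t draws = 0;
    bool haveBatch = false;
    int current = 0;
    for (const Prim& p : prims) {
        if (!haveBatch || p.clutOffset != current) {
            ++draws;  // flush the previous batch (if any) and start a new one
            current = p.clutOffset;
            haveBatch = true;
        }
    }
    return draws;
}

// With the offset passed per vertex, the palette no longer breaks batches,
// so (in this toy model) the whole run is a single draw.
std::size_t DrawsWithClutAsVertexAttribute(const std::vector<Prim>& prims) {
    return prims.empty() ? 0 : 1;
}
```

Alternating between two palettes on every square is the worst case for the state-based path. The actual draw count depends on how often neighboring squares switch palettes.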
-[Unknown]
Game or games this happens in
NPJH50698 - "Fushigi no Dungeon - Fuurai no Shiren 4 Plus"
What area of the game
From the very beginning of the game.
Speed seen in PPSSPP
39.5% (3/60)
On v1.11.3 the speed is above 200%.
GE frame capture and debug statistics
Platform
Android
Mobile phone model or graphics card
Tested on a TV box with an Arm Mali-G31 GPU, running Android 9.
PPSSPP version affected
v1.12.3 from Google Play, v1.12.3-491-gcc767622d ....
v1.11.2-85-g19bd943ad
Last working version
v1.11.3 from Google Play
v1.11.2-80-g2de6b359c
Graphics backend (3D API)
Vulkan and OpenGL