hrydgard / ppsspp

A PSP emulator for Android, Windows, Mac and Linux, written in C++. Want to contribute? Join us on Discord at https://discord.gg/5NJB6dD or just send pull requests / issues. For discussion use the forums at forums.ppsspp.org.
https://www.ppsspp.org

Impossible Mission: Game lags badly when using in-game pause menu or computer terminals (Windows 32-bit) #9448

Closed Rekrullurker closed 7 years ago

Rekrullurker commented 7 years ago

If you press the Start button to open the in-game pause menu, the game lags badly. The lag is even worse if you access any of the in-game computer terminals (if you're not familiar with the game: stand in front of any computer terminal in any room and push Up). This happens regardless of whether you choose the New, Classic or Merged game style, or which character you select in New mode.

The problem has gotten worse with newer builds. v0.9.8 drops to about 17 FPS, v1.3 drops to around 11 FPS and v1.3-972 drops to 6 FPS.

I wasn't able to find any graphics or audio settings that would fix this; even frameskipping doesn't seem to help. It also happens when using the Interpreter rather than JIT.

I also tried using the Direct3D 9 backend, but it didn't render the in-game menus properly in any version:

https://s17.postimg.org/k8ghd6nwf/ULES00764_00000.jpg

It was faster in v1.3 with no drop in speed, but in v1.3-972, it lags just as badly as OpenGL.

unknownbrackets commented 7 years ago

Have you tried "Disable slower effects" in the settings? Does this make it faster?

If D3D9 became slower at some point, it'd be great if you could check the versions on the buildbot to determine when.

This wouldn't take that many tries - just go by halves. Try a build with a number around ~480 (there may be multiple with the same number, just try any with a download). If it is fast, try a higher build (e.g. ~720), and if it's slow, go lower (~240).

If you could try around 8-9 builds, you'd narrow it down to almost exactly what change caused D3D9 to also become slower. Then we could figure out why that change made it slower. It's probably the same reason OpenGL is slower, because D3D9 has slowly been getting closer to the features of our OpenGL backend. We probably just fixed something in the D3D9 backend to match OpenGL.

Unfortunately, I don't have the game, so I can't "bisect" as described above.

-[Unknown]
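The "go by halves" search described above can be sketched roughly as follows. This is only an illustration, not buildbot tooling: `FindFirstSlowBuild` and the `isSlow` callback are hypothetical names, and the sketch assumes every build number has a download and that builds go from fast to slow exactly once.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: returns the first build number in (knownGood, knownBad]
// for which isSlow(build) is true. Assumes knownGood is fast, knownBad is slow,
// and the regression never "un-happens" in between.
int FindFirstSlowBuild(int knownGood, int knownBad,
                       const std::function<bool(int)> &isSlow,
                       int *testsRun = nullptr) {
    assert(knownGood < knownBad);
    int tests = 0;
    while (knownBad - knownGood > 1) {
        // Probe the build halfway between the last fast and the first slow one.
        int mid = knownGood + (knownBad - knownGood) / 2;
        ++tests;
        if (isSlow(mid)) {
            knownBad = mid;   // regression is at mid or earlier
        } else {
            knownGood = mid;  // regression is after mid
        }
    }
    if (testsRun) {
        *testsRun = tests;
    }
    return knownBad;
}
```

With v1.3 as build 0 and v1.3-972 as build 972, the first probe lands on build 486, and the search needs at most 10 manual tests, which is in line with the "around 8-9 builds" estimate for getting almost exactly to the change.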

LunaMoo commented 7 years ago

I think those regressions could be caused by any of the texture cache changes, as this is a very hardcore case: the game reloads a 512x512 texture for every letter that's written in that menu. What's stupid is that the 512x512 texture only has data at the very top, where there are about 2 lines of font; everything below is just random/unrelated/trash data. One uncommon thing I noticed is the address of the texture itself: 088d1a40 seems pretty low for a texture in RAM, but there's probably nothing wrong with that.

Very shortened log:

19:27:605 user_main    D[G3D]: GPUCommon.cpp:1118 Starting DL execution at 0891afac - stall = 0891b014
19:27:605 user_main    D[SCESCEGE]: HLE\sceGe.cpp:399 sceGeListUpdateStallAddr(dlid=889192483, stalladdr=4891b07c)
19:27:605 user_main    D[G3D]: GPUCommon.cpp:1118 Starting DL execution at 0891b014 - stall = 0891b07c
19:27:605 user_main    D[SCESCEGE]: HLE\sceGe.cpp:399 sceGeListUpdateStallAddr(dlid=889192483, stalladdr=4891b0e4)
19:27:605 user_main    D[G3D]: GPUCommon.cpp:1118 Starting DL execution at 0891b07c - stall = 0891b0e4
19:27:605 user_main    D[SCESCEGE]: HLE\sceGe.cpp:399 sceGeListUpdateStallAddr(dlid=889192483, stalladdr=4891b14c)
19:27:605 user_main    D[G3D]: GPUCommon.cpp:1118 Starting DL execution at 0891b0e4 - stall = 0891b14c
19:27:605 user_main    D[G3D]: Common\TextureCacheCommon.cpp:477 Texture different or overwritten, reloading at 088d1a40: hash fail
19:27:607 user_main    D[SCESCEGE]: HLE\sceGe.cpp:399 sceGeListUpdateStallAddr(dlid=889192483, stalladdr=4891b1b4)
19:27:607 user_main    D[G3D]: GPUCommon.cpp:1118 Starting DL execution at 0891b14c - stall = 0891b1b4
19:27:607 user_main    D[G3D]: Common\TextureCacheCommon.cpp:477 Texture different or overwritten, reloading at 088d1a40: hash fail
19:27:611 user_main    D[SCESCEGE]: HLE\sceGe.cpp:399 sceGeListUpdateStallAddr(dlid=889192483, stalladdr=4891b21c)
19:27:611 user_main    D[G3D]: GPUCommon.cpp:1118 Starting DL execution at 0891b1b4 - stall = 0891b21c
19:27:611 user_main    D[G3D]: Common\TextureCacheCommon.cpp:477 Texture different or overwritten, reloading at 088d1a40: hash fail
19:27:613 user_main    D[SCESCEGE]: HLE\sceGe.cpp:399 sceGeListUpdateStallAddr(dlid=889192483, stalladdr=4891b284)
19:27:613 user_main    D[G3D]: GPUCommon.cpp:1118 Starting DL execution at 0891b21c - stall = 0891b284
19:27:613 user_main    D[G3D]: Common\TextureCacheCommon.cpp:477 Texture different or overwritten, reloading at 088d1a40: hash fail
19:27:614 user_main    D[SCESCEGE]: HLE\sceGe.cpp:399 sceGeListUpdateStallAddr(dlid=889192483, stalladdr=4891b2ec)

Another thing visible in the log is how spammy sceGeListUpdateStallAddr is. This actually makes for a really easy workaround, as the game works fine and becomes very light if you nop that one spammy call. However, there's no need for such nasty hacks here, as we now have texture replacement, and hashranges deals with that font texture in a much more proper way:

textures.ini inside memstick\PSP\TEXTURES\ULES00764

[options]
version = 1
hash = quick

[hashes]

[hashranges]
0x088d1a40,512,512 = 512,40

This cuts that font texture down to the size of the actual font, making the game just as light in that menu as anywhere else.
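As a rough back-of-the-envelope check of why the cut helps, the arithmetic below compares the bytes covered by the full texture with the bytes covered by the 512x40 range. It assumes the cost of each rehash scales with the bytes covered and that the 8888 format uses 4 bytes per pixel; `CoveredBytes` and `kBytesPerPixel8888` are names made up for this sketch, not PPSSPP identifiers.

```cpp
#include <cassert>
#include <cstddef>

// Assumption for this sketch: 8888 textures use 4 bytes per pixel.
constexpr std::size_t kBytesPerPixel8888 = 4;

// Bytes covered by a width x height region of an 8888 texture.
constexpr std::size_t CoveredBytes(std::size_t width, std::size_t height) {
    return width * height * kBytesPerPixel8888;
}

// How many times more data the full region covers than the cut region.
double CoverageRatio(std::size_t fullW, std::size_t fullH,
                     std::size_t cutW, std::size_t cutH) {
    assert(cutW > 0 && cutH > 0);
    return static_cast<double>(CoveredBytes(fullW, fullH)) /
           static_cast<double>(CoveredBytes(cutW, cutH));
}

// Full 512x512 texture: 1 MiB. Font-only 512x40 range: 80 KiB.
static_assert(CoveredBytes(512, 512) == 1048576, "full texture size");
static_assert(CoveredBytes(512, 40) == 81920, "cut texture size");
```

Under those assumptions the 512x40 range covers about 12.8 times less data per reload than the full 512x512 texture.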

hrydgard commented 7 years ago

Oh. Try master, I wonder.... https://github.com/hrydgard/ppsspp/pull/9449

The low address is probably just that the texture is included as constant data in the binary, like const uint8_t fonttexture[] = { .... } in the game...

LunaMoo commented 7 years ago

Oh yeah, it's faster now even without replacement (although not as fast). Guessing it's https://github.com/hrydgard/ppsspp/commit/fbee32829336385cea8f62e5c4b2063498c60f34 as I was just a few versions off.

unknownbrackets commented 7 years ago

How is it accessing the texture? What sort of texture coords? Is it in throughmode?

We should be detecting that it's not accessing as high and hashing a smaller region.

-[Unknown]

LunaMoo commented 7 years ago

Small correction: it is now as fast as with the texture cut to font size. I didn't notice I had something heavy running when testing the latest build ;c. So it's running 60/60 starting from v1.3-981-gf6463ae. The texture is still 512x512 while the actual sprite is 512x40, meaning, if I understand correctly, it should be detected as 512x64?

Texture:

Tex U scale 1.000000
Tex V scale 1.000000
Tex U offset    0.000000
Tex V offset    0.000000
Tex mapping mode    gen: tex coords, proj: pos
Tex shade srcs  s: 0, t: 0
Tex mode    swizzled, 1 levels
Tex format  8888
Tex filtering   min: linear, mipmap linear, mag: nearest
Tex wrapping    clamp s, clamp t
Tex level/bias  auto
Tex lod slope   0.000000
Tex func    replace, RGBA
Tex env color   000000
CLUT    089d1b80, w=0
CLUT format ABGR 8888 ind & ff
Texture L0 addr 088d1a40, w=512
Texture L1 addr 00000000, w=0
(...)
Texture L0 size 512x512
Texture L1 size 1x1
(...)

Settings:

Name    Value
Clear mode  0
Framebuffer 00044000, w=512
Framebuffer format  5551
Depthbuffer 00088000, w=512
Vertex type through, u16 texcoords, s16 positions
Offset addr 08000000
Vertex addr 0891b048
Index addr  08000000
Region  0,0 - 479,271
Scissor 0,0 - 479,271
Min Z   002710
Max Z   00c350
Viewport Scale  240.000000, -136.000000, -20000.000000
Viewport Offset 2048.000000, 2048.000000, 30000.000000
Offset  1808.000000x1912.000000
Cull mode   back (CCW) (disabled)
Color test  pass if (c & ffffff)  !=  (ff00ff & ffffff)
Alpha test  pass if (a & ff) > (00 & ff)
Stencil test    pass if (00 & 00) NEVER (a & 00) (disabled)
Stencil test op fail=KEEP, pass/depthfail=KEEP, pass=KEEP (disabled)
Depth test  pass if src >= dst
Alpha blend mode    add: src.a, 1.0 - src.a
Blend color A   000000
Blend color B   000000
Logic Op    clear (disabled)
Fog 1   0.000000 (disabled)
Fog 2   0.000000 (disabled)
Fog color   000000 (disabled)
RGB mask    000000
Stencil/alpha mask  000000
Morph Weight 0  0.000000
(...)
Patch division  001010
Patch primitive triangles
Patch facing    000000 (disabled)
Dither 0    001d0c (disabled)
Dither 1    00f3e2 (disabled)
Dither 2    000c1d (disabled)
Dither 3    00e2f3 (disabled)
Transfer src    00000000, w=0
Transfer src pos    0,0
Transfer dst    00000000, w=0
Transfer dst pos    0,0
Transfer size   0,0

So the vertex type is "through, u16 texcoords, s16 positions". I guess that's what you're asking?

Edit: Checking the log, it still has 1 hash fail per frame, but that's definitely better than 1xx times per frame. Guess the issue can be closed?

unknownbrackets commented 7 years ago

Well, I'm talking about this area:

        if (throughMode) {
            if (entry->maxSeenV == 0 && gstate_c.vertBounds.maxV > 0) {
                // Let's not hash less than 272, we might use more later and have to rehash.  272 is very common.
                entry->maxSeenV = std::max((u16)272, gstate_c.vertBounds.maxV);
            } else if (gstate_c.vertBounds.maxV > entry->maxSeenV) {
                // The max height changed, so we're better off hashing the entire thing.
                entry->maxSeenV = 512;
                entry->status |= TexCacheEntry::STATUS_FREE_CHANGE;
            }
        } else {

But I forgot that because of sprite sheets, we made this minimum 272 (because otherwise it would grow multiple times per frame). So I guess the replacement thing would still help by killing some hashing. Maybe this logic can be made smarter.
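To make the effect of that 272 minimum concrete, here is a small standalone sketch that mirrors the through-mode branch quoted above. `EntrySketch` and `UpdateMaxSeenV` are simplified stand-ins invented for illustration (the real entry and `gstate_c` carry far more state), and the 512 upper bound on `maxV` is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified stand-in for the texture cache entry: only what the branch touches.
struct EntrySketch {
    uint16_t maxSeenV = 0;
    bool freeChange = false;  // stands in for STATUS_FREE_CHANGE
};

// Mirrors the quoted through-mode logic for one draw whose highest V is maxV.
void UpdateMaxSeenV(EntrySketch &entry, uint16_t maxV) {
    assert(maxV <= 512);  // sketch assumption: textures are at most 512 tall
    if (entry.maxSeenV == 0 && maxV > 0) {
        // First use: never hash fewer than 272 rows.
        entry.maxSeenV = std::max<uint16_t>(272, maxV);
    } else if (maxV > entry.maxSeenV) {
        // Height grew past what was seen before: fall back to the whole texture.
        entry.maxSeenV = 512;
        entry.freeChange = true;
    }
}
```

Feeding it the 512x40 font sprite (`maxV` of 40) leaves `maxSeenV` at 272, so under this logic the hashed region would be 512x272 rather than 512x64 or 512x40, which is why the hashranges cut can still save some hashing.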

I suppose multithreading (if it works) would probably help too, since usually it helps games that update the stall a ton of times in short bursts.

-[Unknown]

LunaMoo commented 7 years ago

Where I previously saw single-digit FPS, I now had to underclock an older CPU to 400 MHz (lol) to even see the FPS drop below 60, and even then it was running at 50+... MT does indeed speed that up, but I think with those recent changes it probably wouldn't matter even on mobiles.

That said, I'll close this; hopefully @Rekrullurker's experience doesn't differ.

hrydgard commented 7 years ago

@LunaMoo A small hint when benchmarking changes like this - instead of underclocking your CPU, unthrottle the emulator by holding TAB :)

LunaMoo commented 7 years ago

Yeah, about that - in my case it's about the same, since it only takes a click/hotkey to switch to a different profile and set one of the lower clocks I saved earlier. Many people are into overclocking & overvolting, I have the exact opposite fetish. :X

unknownbrackets commented 7 years ago

Also turn off the unthrottle frameskip setting in the ini, or tab won't measure some GPU related things.

But underclocking is probably a good way too. Probably doesn't underclock your GPU though? And, also note that older CPUs have different performance characteristics (higher latency instructions etc.)

-[Unknown]