hrydgard / ppsspp

A PSP emulator for Android, Windows, Mac and Linux, written in C++. Want to contribute? Join us on Discord at https://discord.gg/5NJB6dD or just send pull requests / issues. For discussion use the forums at forums.ppsspp.org.
https://www.ppsspp.org

GPU timing estimation inaccurate #2010

Closed unknownbrackets closed 11 years ago

unknownbrackets commented 11 years ago

We know it's wrong and we're not sure how to fix it, but let's at least gather some data. #2009 adds statistics. Some games I know about:

Sword Art Online (runs slow even at 60 vps / fast forwarded):

Fat Princess (runs fine but can't fast forward well even on a powerful box):

God of War: Ghost of Sparta (#862) ???

God of War: Chains of Olympus (#862) ???

Grand Theft Auto (not sure which one?) ???

Nayuta no Kiseki (#1246) ???

Metal Gear Solid Portable Ops / Ops Plus (runs slow even at 60 vps / fast forwarded): ???

-[Unknown]

solarmystic commented 11 years ago

If I may add another two games to your list:-

Metal Gear Solid Portable Ops and Portable Ops Plus

(runs slow even at 60 VPS displayed, but feels perfect when turbo is fixed to 80 - 90 VPS)

Tekken 6

4.3 - 5 million cycles per frame, 114.6 per vert. The game is rendering every single frame (numbers stay consistent).

Picture:-

capture

(requires frameskip to hit consistent 60 VPS even on the PC, granted I'm referring to my older Core 2 2.8 GHz box. Don't even start with mobile devices, none of them have the grunt to attain playable speeds for this game as of the current optimization)

thedax commented 11 years ago

Edit: Nevermind, I see how it works..

unknownbrackets commented 11 years ago

You need to go to Options -> Show Debug Statistics. It'll show on one of those lines... I recommend making sure you're at 2x or higher to be able to read it well.

-[Unknown]

nekotipcat commented 11 years ago

Dissidia Final Fantasy (runs fine at 60 vps and 60 fps, tested with Fraps, but it sometimes slows down when displaying special attacks/special effects, and sometimes without anything happening at all, with no change in fps or vps). Without buffered rendering: cycles around 1222340-1640342, cycles per vert 59.9 - 63.1.

Wow, it goes crazy when turning buffering off and on in Dissidia; it's required to avoid mirroring in battle. Picture:

ppssppwindows 2013-05-31 22-17-42-28

vps is between 59.2 and 62.1; it only drops to 59.2 for about 0.10 sec

with buffered rendering off it's always 30 fps and no slowdowns so it seems to be fixed if it isn't something else

thedax commented 11 years ago

Pangya has similar "slow" issues (in quotes because it runs fine on my PC, but fast forward isn't that much of a boost), though I don't know if that's related to all of the VPL calls or the GPU:

-Fast forwarding only brings it up to 100 VPS on my 3.8 GHz i7

Screen of the stats: http://i.imgur.com/HD4zoeu.jpg

To sum it up though:

unknownbrackets commented 11 years ago

If you see the stats jumping between 0 and a number over and over again, the game is internally frameskipping. Many games do this intentionally and render only 30 fps. If the numbers stay mostly constant, then the game is rendering something every single frame.
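As a rough illustration of that rule of thumb, here is a minimal sketch that classifies a series of per-frame "cycles executed" readings. The helper names and the 0.4 threshold are assumptions made up for this example; nothing like this exists in PPSSPP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of PPSSPP): fraction of sampled frames in
// which the game submitted no GPU work, i.e. the stat read 0.
double idleFrameFraction(const std::vector<uint64_t>& cyclesPerFrame) {
    if (cyclesPerFrame.empty())
        return 0.0;
    std::size_t idle = 0;
    for (uint64_t cycles : cyclesPerFrame) {
        if (cycles == 0)
            ++idle;
    }
    return static_cast<double>(idle) / static_cast<double>(cyclesPerFrame.size());
}

// Assumed threshold: if roughly every other frame (or more) is idle, the
// game is probably frameskipping internally, e.g. rendering at 30 fps.
bool looksLikeInternalFrameskip(const std::vector<uint64_t>& cyclesPerFrame) {
    return idleFrameFraction(cyclesPerFrame) >= 0.4;
}
```

Readings such as 0, 4300000, 0, 4300000 would be flagged, while a steady run of non-zero readings would not.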

To see the every third frame thing in Fat Princess, I made it log the cycle count in addition to displaying the stats on screen. Otherwise it's hard to tell...

To test if this is affecting the game, the easiest way is to change this line in DisplayListInterpreter (it appears twice, only second one matters if you don't use frameskip): cyclesExecuted += vertexCost * count;

If you make it 0 and it goes faster, it means it's estimating too high (like Sword Art Online.) If you make it higher (* 2, * 10, * 100) and the game is faster without blinking black or bugs, then it's estimating too low (like Fat Princess.)
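The experiment can be pictured with a small stand-in for that accumulation. This is only a sketch: `CycleEstimator` and `experimentScale` are hypothetical names for illustration, and only the `cyclesExecuted += vertexCost * count` shape comes from the line quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the cycle accumulation in DisplayListInterpreter.
struct CycleEstimator {
    uint64_t cyclesExecuted = 0;
    // Assumed knob for the experiment: 0 removes the vertex cost entirely,
    // 1 keeps the current estimate, 2 / 10 / 100 inflate it.
    int experimentScale = 1;

    // Adds the estimated cost of drawing `count` vertices.
    void addDraw(int vertexCost, int count) {
        assert(vertexCost >= 0 && count >= 0 && experimentScale >= 0);
        cyclesExecuted += static_cast<uint64_t>(vertexCost) *
                          static_cast<uint64_t>(count) *
                          static_cast<uint64_t>(experimentScale);
    }
};
```

With `experimentScale = 0` the estimate stops growing, which corresponds to the "make it 0" test; larger values correspond to the `* 2`, `* 10`, `* 100` tests.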

-[Unknown]

thedax commented 11 years ago

Changing Pangya to cyclesExecuted = 0 slows it down massively, by about 50% VPS. * 2 and * 10 slow it down by about 50% VPS as well. * 100 makes the game skip multiple frames, with 60 VPS.

So I guess its slowness issue is found elsewhere.

Ritori commented 11 years ago

I don't know my computer's specs right now, so I'll just post a screenshot, sorry about this :( These are the stats for Valkyria Chronicles 2. I think this game is pretty slow.

opkiefksad

sometime change into this

asowafp

unknownbrackets commented 11 years ago

Valkyria Chronicles 2 runs fine for me, I can get 800 vps on turbo, and its speed seems to match the PSP perfectly. Voices sound smooth too.

Keep in mind that the game uses the digital (arrow keys) for adjustments, and the analog stick (ijkl, gamepad analog) for actual movement. Everything will be really slow if you try to use the digital, which is the same on the actual PSP.

-[Unknown]

Ritori commented 11 years ago

I see, thanks @unknownbrackets :) (maybe my laptop is just pretty slow)

unknownbrackets commented 11 years ago

Yeah, just to be clear, this isn't about games being generally slow, and most games don't have this problem. It's only specific ones in a pretty specific way.

The most important of these are ones that run at 60 vps, but the game "feels like" 30 or 45 fps. If using 90 or 120 as the fixed vps limit both a) works and b) makes the game appear to run at normal speed, then it most likely has this bug (but it could always be something else.)

There's also a case where some games render the same graphics more than once each frame. That's also this bug, and causes games to reach a much lower vps than they ought to. However, there are a lot of more likely performance issues, and the only way to verify this one is to:

This second one means we're estimating too low. It's not very common at all as far as I know, though. I think we are in general estimating too high except in a few circumstances.

-[Unknown]

daniel229 commented 11 years ago

with buffered rendering off

God of War: Chains of Olympus: 6,797,426 cycles, 68.916229 per vertex. Not slow, just frame skips.

God of War: Ghost of Sparta: 6,929,632 cycles, 72.052452 per vertex. Not slow, just frame skips.

Gundam vs Gundam Next Plus: 6,636,438 cycles, 168.805679 per vertex. Not slow, just frame skips.

Kamen Rider Climax Heroes: 2,503,414 cycles, 131.773041 per vertex. Slow, feels like it needs 90 vps.

Nayuta no Kiseki: 6,188,464 cycles, 132.027039 per vertex. Not slow, just frame skips and stuttering.

daniel229 commented 11 years ago

God Eater Burst: 11,958,242 cycles, 209.052994 per vertex. Not slow, just frame skips.

hrydgard commented 11 years ago

gta

solarmystic commented 11 years ago

Another game to add would be Ys Seven:-

https://github.com/hrydgard/ppsspp/issues/2156

Feels sluggish at times in certain open areas (particularly certain segments of the capital city) even though the VPS counter is showing 60 VPS all the time. Requires manual turbo set to higher than 60 VPS to play smoothly in those areas.

daniel229 commented 11 years ago

.hack//Link: 761,802 cycles, 38.115143 per vertex. Slow, seems to need 90 VPS.

unknownbrackets commented 11 years ago

Legend of Heroes: Trails in the Sky has this issue in some places: 3666294 / 50 per vertex

Normally it's < 3 million which is fine. The game already runs at 30 fps internally.

Savedata here: http://forums.ppsspp.org/showthread.php?tid=1346&pid=27810#pid27810

-[Unknown]

unknownbrackets commented 11 years ago

I did a quick change to measure flips per second. I'm not sure it's right:

https://github.com/unknownbrackets/ppsspp/compare/flips
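For readers who don't want to open the branch, the idea of a flips-per-second figure can be sketched roughly as follows. This is not the code from that branch; `FlipCounter` and its methods are hypothetical, and it assumes something calls `onFlip()` each time the game switches the displayed framebuffer and `onSecondBoundary()` once per second.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter: tallies framebuffer flips and publishes the total
// once per second, separately from the vblank-based VPS figure.
class FlipCounter {
public:
    // Call whenever the game switches the displayed framebuffer.
    void onFlip() { ++pending_; }

    // Call once per second; publishes the tally and starts a new one.
    void onSecondBoundary() {
        lastSecond_ = pending_;
        pending_ = 0;
    }

    // Flips counted during the most recently completed second.
    uint32_t flipsPerSecond() const { return lastSecond_; }

private:
    uint32_t pending_ = 0;
    uint32_t lastSecond_ = 0;
};
```

A game running at 60 VPS that only flips every third vblank would read 20 here, which VPS alone cannot show.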

A few examples:

I wonder if I'm doing it horribly wrong, but if I'm not, the Fat Princess and God of War demo numbers imply that we must be missing a wait or some sort of GE signal handling - there's no way it's all the cycle estimate.

-[Unknown]

hrydgard commented 11 years ago

Your change looks good to me, I'll merge it if you send a pull.

I agree that it's highly suspicious and we probably are missing something.

unknownbrackets commented 11 years ago

For Fat Princess, here's the log in the first prompt that gets 885 fps:

user_main:
1=sceCtrlPeekBufferPositive(09fff5d0, 1)  <--- peek never waits
7791850=sceKernelGetSystemTimeWide()
1=sceKernelSuspendDispatchThread()
sceKernelResumeDispatchThread(1)
7791853=sceKernelGetSystemTimeWide()

ge interrupt:
sceKernelSetEventFlag(289, 00000001)

user_main:
7791942=sceKernelGetSystemTimeWide()
7791956=sceKernelGetSystemTimeWide()
sceKernelWaitEventFlag(289, 00000022, 1, 00000000, 00000000) <-- OR
sceKernelClearEventFlag(289, ffffffdf)
sceKernelSetEventFlag(289, 00000010)

swap_thread:
sceKernelWaitEventFlag(289, 00000003, 33, 00000000, 00000000) <-- 0x21, OR/CLEAR
04000000 = sceGeEdramGetAddr
sceDisplaySetFramebuf(topaddr=04000000,linesize=512,pixelsize=3,sync=0)
sceGeListEnQueue(addr=08b23080, stall=00000000, cbid=00000000, param=09fdba90) <--- valid cbid, renders to 00088000
sceKernelSetEventFlag(289, 00000020)
sceKernelWaitEventFlag(289, 00000010, 32, 00000000, 00000000) <-- 0x20, AND/CLEAR

user_main:
<rinse and repeat, toggling framebufs>

So, most likely thing is that strange event flag thing (or maybe I'm just tired.) But God of War doesn't seem to call it:

user_main:
sceDisplaySetFramebuf(topaddr=0408c000,linesize=512,pixelsize=3,sync=0)
sceGeListEnQueue(addr=084a0000, stall=00000000, cbid=00000000, param=00000000) <-- valid cbid, renders to 04000000
1=sceCtrlPeekBufferPositive(09ffefb0, 1)
564078=sceKernelGetSystemTimeWide()
564079=sceKernelGetSystemTimeWide()
564080=sceKernelGetSystemTimeWide()
564081=sceKernelGetSystemTimeWide()
564082=sceKernelGetSystemTimeWide()
564086=sceKernelGetSystemTimeWide()
564381=sceKernelGetSystemTimeWide()
sceGeListSync(dlid=00000000, mode=00000000)

ge interrupt:
Ignoring interrupt for display list 0, already been released. <-- suspicious, maybe it should still work until the end of the list sync...

user_main:
<rinse and repeat, toggling framebufs>

So, there's a separate suspicious thing there too. What they have in common is they both check the time a lot, but they should know that very little time has passed...

Edit: no, it's logging wait as an int not %x, that's why. So Fat Princess doesn't really have anything suspicious... or, actually, how does sceKernelWaitEventFlag(289, 00000003, 33, 00000000, 00000000) not wait? Hmm... no no, clear is inverted...
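To make the flag arithmetic in the Fat Princess log easier to follow, here is a minimal model of the wait modes as they are annotated there (0x01 = OR, 0x20 = CLEAR, otherwise AND), with "clear" keeping only the bits in its mask. The type and function names are hypothetical, and this models the semantics only, not PPSSPP's implementation or thread scheduling.

```cpp
#include <cassert>
#include <cstdint>

// Mode bits as annotated in the log above; the constant names are made up.
constexpr uint32_t kWaitOr = 0x01;     // wake if ANY requested bit is set (else ALL)
constexpr uint32_t kWaitClear = 0x20;  // clear the requested bits on wake

struct EventFlag {
    uint32_t pattern = 0;
};

void setBits(EventFlag& flag, uint32_t bits) { flag.pattern |= bits; }

// "Clear" is inverted: the argument is the mask of bits to KEEP.
void clearBits(EventFlag& flag, uint32_t keepMask) { flag.pattern &= keepMask; }

// Returns true if the wait is satisfied right away (the thread would not block).
bool tryWait(EventFlag& flag, uint32_t bits, uint32_t mode) {
    assert(bits != 0);  // assumption: waiting on no bits is not meaningful
    const uint32_t hit = flag.pattern & bits;
    const bool matched = (mode & kWaitOr) ? hit != 0 : hit == bits;
    if (matched && (mode & kWaitClear))
        flag.pattern &= ~bits;
    return matched;
}
```

Replaying the log with this model: after the GE interrupt sets 0x01, the wait on 0x03 with mode 33 (0x21) succeeds immediately and clears the bit, while the wait on 0x22 with mode 1 blocks until 0x20 is set.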

-[Unknown]

solarmystic commented 11 years ago

This flips per second thingy you've devised seems to tell the true story previously obscured by the VPS measurements.

God of War demo: 1553 - 2223 in logos, 45 in menu, 47 - 150 in game.

The above in particular could explain why the logos are so damned slow, even at 1x RR for many devices and lower end computers like mine. The emulator is rendering the logos at such an insane number of flips per second.

Will definitely be useful to account for any other games with internal slowdown.

unknownbrackets commented 11 years ago

Totally insane idea, and it seems impossible based on 3rd Birthday / Type-0 and so on: what if GE interrupts only hit once per vblank? There's no way, right?

-[Unknown]

hrydgard commented 11 years ago

Doesn't seem likely at all, but hey, you never know...

unknownbrackets commented 11 years ago

Well, doing that (and reducing the estimation significantly), the God of War demo runs at a perfect 60 fps. It also feels much smoother. Same with Fat Princess.

But the God Eater demo did not like it. Started hitting all sorts of BREAKs. Still runs if they're skipped, though... and runs okay.

Trails in the Sky gets hurt significantly by it (maybe my timing is off), now at ~20 fps normally instead of 30.

For reference, what I did was this as a quick hack:

diff --git a/Core/HLE/sceDisplay.cpp b/Core/HLE/sceDisplay.cpp
index 7f7688d..685937f 100644
--- a/Core/HLE/sceDisplay.cpp
+++ b/Core/HLE/sceDisplay.cpp
@@ -474,11 +474,14 @@ void hleAfterFlip(u64 userdata, int cyclesLate)
    gpu->BeginFrame();  // doesn't really matter if begin or end of frame.
 }

+u64 nextVblankTicks = 0;
+
 void hleLeaveVblank(u64 userdata, int cyclesLate) {
    isVblank = 0;
    DEBUG_LOG(HLE,"Leave VBlank %i", (int)userdata - 1);
    frameStartTicks = CoreTiming::GetTicks();
    CoreTiming::ScheduleEvent(msToCycles(frameMs - vblankMs) - cyclesLate, enterVblankEvent, userdata);
+   nextVblankTicks = frameStartTicks + msToCycles(frameMs) - cyclesLate;
 }

 u32 sceDisplayIsVblank() {
diff --git a/Core/HLE/sceGe.cpp b/Core/HLE/sceGe.cpp
index d41ecfa..50eb66e 100644
--- a/Core/HLE/sceGe.cpp
+++ b/Core/HLE/sceGe.cpp
@@ -207,10 +207,14 @@ bool __GeTriggerSync(WaitType waitType, int id, u64 atTicks)
    return true;
 }

+extern u64 nextVblankTicks;
+
 // Warning: may be called from the GPU thread.
 bool __GeTriggerInterrupt(int listid, u32 pc, u64 atTicks)
 {
    u64 userdata = (u64)listid << 32 | (u64) pc;
+   if (atTicks < nextVblankTicks)
+       atTicks = nextVblankTicks;
    CoreTiming::ScheduleEvent(atTicks - CoreTiming::GetTicks(), geInterruptEvent, userdata);
    return true;
 }
diff --git a/GPU/GLES/TransformPipeline.cpp b/GPU/GLES/TransformPipeline.cpp
index 9c20120..45a2027 100644
--- a/GPU/GLES/TransformPipeline.cpp
+++ b/GPU/GLES/TransformPipeline.cpp
@@ -915,10 +915,10 @@ int TransformDrawEngine::EstimatePerVertexCost() {

    for (int i = 0; i < 4; i++) {
        if (gstate.lightEnable[i] & 1)
-           cost += 20;
+           cost += 10;
    }
    if (gstate.getUVGenMode() != 0) {
-       cost += 20;
+       cost += 10;
    }
    if (dec_ && dec_->morphcount > 1) {
        cost += 5 * dec_->morphcount;
@@ -927,7 +927,7 @@ int TransformDrawEngine::EstimatePerVertexCost() {
    if (CoreTiming::GetClockFrequencyMHz() == 333) {
        // Just brutally double to make God of War happier.
        // FUDGE FACTORS! Delicious fudge factors!
-       cost *= 2;
+       //cost *= 2;
    }
    return cost;
 }

Hmm. Maybe there's certain GE interrupts that behave that way? Or something... strange...
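The scheduling part of the hack boils down to one clamp. A standalone sketch of that arithmetic, assuming all three values are in the same tick units; the function name is hypothetical, and the guard against a negative delay is an addition for this example that is not in the diff:

```cpp
#include <cassert>
#include <cstdint>

// Delay, in ticks from `nowTicks`, before the GE interrupt fires when the
// interrupt is never allowed to land before the next vblank.
uint64_t geInterruptDelay(uint64_t nowTicks, uint64_t atTicks, uint64_t nextVblankTicks) {
    // Same clamp as in the diff: push early interrupts out to the vblank.
    if (atTicks < nextVblankTicks)
        atTicks = nextVblankTicks;
    // Added guard (not in the diff): never return a wrapped-around delay.
    return atTicks > nowTicks ? atTicks - nowTicks : 0;
}
```

An interrupt that would have fired 50 ticks from now is delayed to the vblank, while one already due after the vblank keeps its original time.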

-[Unknown]

unknownbrackets commented 11 years ago

I wonder if there's some sync or something GE instruction / signal that causes this behavior...

-[Unknown]

brujo5 commented 11 years ago

Dragon Ball Z: Tenkaichi Tag Team.

cycles executed 1724672 [40.050243 per vertex] only 30fps

screenshot_2013-06-19-21-07-16

unknownbrackets commented 11 years ago

If it sticks at 30.0 fps like that, most likely the game is intentionally rendering at that speed. If you look at my comment above, you'll see many games run at 30.0 fps:

https://github.com/hrydgard/ppsspp/issues/2010#issuecomment-19529324

There might be a way to trick some games into running 60 fps, but I think that's probably more a place for game specific stuff done by talented modders / rom hackers.

Does it run at a different speed in JPCSP or do you have any reason to believe it runs at 60.0 on a real PSP?

-[Unknown]

brujo5 commented 11 years ago

I was thinking most games run at 60 fps, but maybe a CPU speed hack would help.

Tomorrow I will try it in JPCSP.

good night dev.

unknownbrackets commented 11 years ago

So, I poked around in JpcspTrace and made it log before/after and buffer a bit more. From the God of War demo:

44.252794 user_main - sceDisplaySetFramebuf 0x4000000, 0x200, 0x3, 0x0, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF
44.252843 user_main - sceDisplaySetFramebuf = 0x0
44.252935 user_main - sceGeListEnQueue 0x8480000, 0x0, 0x0, 0x0
44.252982 user_main - sceGeListEnQueue = 0x35984144
44.253016 user_main - sceCtrlPeekBufferPositive 0x9FFE7B0, 0x1, 0x8B76878, 0x1, 0xFFFFFFFF, 0x1C, 0x1, 0xDEADBEEF
44.253045 user_main - sceCtrlPeekBufferPositive = 0x1
44.276451 user_main - sceGeListSync 0x35984144, 0x0, 0x8AA0000, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF
44.276502 user_main - sceGeListSync = 0x0
44.276533 user_main - sceDisplaySetFramebuf 0x408C000, 0x200, 0x3, 0x0, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF
44.276564 user_main - sceDisplaySetFramebuf = 0x0
44.276597 user_main - sceGeListEnQueue 0x84A0000, 0x0, 0x0, 0x0
44.276637 user_main - sceGeListEnQueue = 0x35984184
44.306655 user_main - sceCtrlPeekBufferPositive 0x9FFE7B0, 0x1, 0x8B76878, 0x1, 0xFFFFFFFF, 0x1C, 0x1, 0xDEADBEEF
44.306698 user_main - sceCtrlPeekBufferPositive = 0x1
44.308128 user_main - sceGeListSync 0x35984184, 0x0, 0x8AA0000, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF
44.308175 user_main - sceGeListSync = 0x0

The tracing does slow the game down, though. But it seems like each syscall takes well under 200 us.

There seem to be big jumps sometimes, but they're not consistent. Hmm. It might be the I/O from JpcspTrace, though...
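The "well under 200 us" figure can be checked directly from the timestamps in the trace. A small sketch, assuming the first column is seconds; the helper name is hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Microseconds elapsed between two trace timestamps given in seconds.
long long elapsedMicros(double enterSeconds, double leaveSeconds) {
    assert(leaveSeconds >= enterSeconds);
    return std::llround((leaveSeconds - enterSeconds) * 1e6);
}
```

For the first sceDisplaySetFramebuf call, `elapsedMicros(44.252794, 44.252843)` gives 49 us. The two sceGeListSync returns (44.276502 and 44.308175) are 31673 us apart, so one iteration of the loop takes about 31.7 ms with tracing active.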

-[Unknown]

solarmystic commented 11 years ago

Mortal Kombat Unchained

VPS:60 , FPS: 50 (I have a feeling it's meant to be 60 as it is a fighting game, and 50 flips per second seems irregular (should be 30 or 60)):-

screen00001

It does hit 60 FPS in the main menu and the title screens:-

screen00000

solarmystic commented 11 years ago

With the flips per second measurement included, we can finally see why MGS PO feels sluggish even at 60 VPS.

The game is internally rendering at only 20 FPS, when it should be 30 FPS on a real PSP:-

screen00000

unknownbrackets commented 11 years ago

Can you replace that with a shot that doesn't show 0s? That's one of the skipped frames, unfortunately.

-[Unknown]

solarmystic commented 11 years ago

Here you go @unknownbrackets

screen00002

unknownbrackets commented 11 years ago

The same thing happens in Tales of Phantasia X, when entering a battle. It suddenly spikes to 1000 fps, making it stutter and not actually render properly.

When this happens it seems fine but I saw some sceDisplayWaitVblankStartCB()s inside an interrupt handler which seems dangerous... need to check that more.

-[Unknown]

unknownbrackets commented 11 years ago

After doing more testing with God of War, cranking up buffering as much as possible, etc. it seems like the intro screen is doing 180 fps on an actual PSP. Arg. At least, I logged that many sceDisplaySetFramebuf() calls in a single second.

Suspiciously, it's doing 180 fps every full second I check...

-[Unknown]

unknownbrackets commented 11 years ago

I've separated an optional hack for God of War. That pull also includes a fix for these slowdowns, by removing the previous God of War hack (for the games I tested, the slowdown was gone, except Mana Khemia which has a different issue.)

-[Unknown]

unknownbrackets commented 11 years ago

I don't have all of the games that were getting bad FPS in some areas, so if it's still happening, please just comment here and I'll reopen.

-[Unknown]

solarmystic commented 11 years ago

Oh yeah. The God of War game now runs at 60 VPS/60 FPS on my ancient laptop. Got through the title screen in seconds instead of minutes at the usual 3000 FPS lol

Well, at least Chains of Olympus does. Ghost of Sparta is actually a bit more intensive and requires frameskipping to hit 60 VPS for me lol.

Thanks so much for the hack @unknownbrackets

solarmystic commented 11 years ago

Hmm some games don't like the hack at all.

Gundam v Gundam Next Plus has stuttery audio and video when the hack is forced on, even though ironically it already runs at 60 FPS even without it