Ridge Racers (USJS00001) - CPU autodrive was Algorithm buggy

triglav1024 commented 11 years ago

Options -> AV player mode https://www.youtube.com/watch?v=hRrVBM2-OWc https://www.youtube.com/watch?v=XJkM729PeeE https://www.youtube.com/watch?v=3eQ7BlocmUo

Ridge Racers - JP 1.01 / USA 1.00 / EUR 1.00 / HK 1.00 / Asia 1.00

thedax commented 11 years ago

This is probably some sort of CPU or VFPU bug, I'd guess. A similar behaviour occurs in Dolphin with replays in Mario Kart Wii.

One question though: does the bug occur immediately in every replay (say you start the game and then launch a replay), or does it take 10-20 minutes to appear? If it takes a while, try the Unlock CPU Speed option/hack as a workaround, since some games get buggy if the emulated PSP CPU speed is changed often. I'd check the debug log to see if it's calling scePowerSetClockFrequency often (see https://github.com/hrydgard/ppsspp/issues/2104).

triglav1024 commented 11 years ago

I don't think the CPU speed hack is the cause. I had set the clock to 333 MHz, but the result was the same with the default clock.

This behavior occurs right after startup. Every time, it shows up about 40 seconds after the start. There is also no randomness: the same car is always selected, and it is always buggy.

unknownbrackets commented 10 years ago

Has this improved at all, or does it still do this? There were some timing fixes not that long ago.

-[Unknown]

ppmeis commented 10 years ago

I just tested this issue. All replays made by the CPU are buggy: the car does strange things during the race (like constantly hitting the wall). Personal replays work fine.

Tested with latest build 0.9.8-676

unknownbrackets commented 10 years ago

Could this have possibly improved with the vrot fix?

Does having jit off affect it?

-[Unknown]

ppmeis commented 10 years ago

Tested with latest build. CPU replays still buggy:

Jit off does not help:

unknownbrackets commented 10 years ago

Does this still happen in the latest git build?

Make sure you don't have that GEB save compat thing changed from the default.

-[Unknown]

ppmeis commented 10 years ago

Tested with latest build, bug is still present:

ppmeis commented 9 years ago

Tested with latest build. Same status:

unknownbrackets commented 9 years ago

I have the US version of this game, but have not really played it much.

What's the easiest and fastest way to reproduce this issue from scratch (e.g. no savedata / blank slate)? I want to try to see if I can at least cause the autodrive to be wrong in different ways.

Edit: hmm, I think I can repro without savedata actually, n/m.

-[Unknown]

unknownbrackets commented 9 years ago

Excluding ALU and LSU type instructions (lv/sv/lwc/swc/mt*/mf*), here's a list of the ones this game executes during the AV thing. The value in parentheses is the number of times each was hit. I've moved all the super unlikely ones to the bottom.

mul.s     (54993288)  // Small error = major driving glitches.
add.s     (28356670)  // Small error = major driving glitches.
c.le      (13832384)  // Change to lt = driving glitches happen differently.
sub.s     (13597462)  // Small error = major driving glitches.
vdot      (12944555)  // MAYBE: Introducing a small error makes glitches happen quicker.
vadd      (9333422)   // Small error = major driving glitches.
vscl      (8506915)   // Small error = major driving glitches.
vsub      (5637652)   // Small error = driving glitches happen differently.
trunc.w.s (5548058)   // Small error = driving glitches happen differently.
cvt.s.w   (4223431)   // Small error = major driving glitches.
vsqrt     (3891304)   // MAYBE: Introducing a small error makes glitches MUCH worse.
div.s     (3243544)   // Small error = major driving glitches.
v(h)tfm4  (2253749)   // MAYBE: Introducing a small error makes glitches MUCH worse.
vpfxt     (2184294)   // Ignore = no driving change, but major gfx glitches.  Might still be wrong prefix handling.
vrsq      (971862)    // MAYBE: Introducing a small error makes glitches MUCH worse.
v(h)tfm3  (783310)    // Small error = major driving glitches.
vdiv      (572562)    // Small error = driving glitches happen differently.
sqrt.s    (533778)    // Small error = driving glitches happen differently.
vcrsp.t   (100890)    // MAYBE: Introducing a small error makes glitches happen quicker.

c.lt      (29268876)  // Any change = breaks everything, but unlikely.
mov.s     (15284356)
vone      (9659339)
neg.s     (3638275)
c.eq      (1284924)   // Any change = breaks everything, but unlikely.
abs.s     (562281)    // Small error = crash, unplayable... unlikely.
vmov      (452786)
vneg      (276140)
vmidt     (180679)
vmmov     (62622)
vzero     (14834)

vmul      (4225115)   // Not so small error = no difference.
vi2f      (1135038)   // Makes no difference.
vi2uc     (568000)    // Makes no difference.
vabs      (567519)    // Not so small error = no difference.
cvt.w.s   (379639)    // Not so small error = no difference.
vrot      (284752)    // Not so small error = no difference.
vmmul     (164305)    // Not so small error = no difference.
vqmul.q   (100890)    // Not so small error = no difference.
vcos      (21269)     // Not so small error = no difference.
vrcp      (15796)     // Makes no difference.
vsin      (8350)      // Not so small error = no difference.
vf2iz     (481)       // Makes no difference, even if hardcoded (but graphical glitches yes.)
vrndf1    (245)       // Makes no difference.

AFAICT, it does not change the rounding mode ever from the default.

If it's not a cpu instruction, then maybe it's timing somehow. But man, almost every instruction I try has a major impact on driving, so it could be anything...
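
The thread does not say how the "small error" was injected into each instruction. One way such a perturbation could be applied is to bump a result by one ULP at the bit level. `nudge_one_ulp_bits` and `nudge_one_ulp` are hypothetical helpers for illustration, not PPSSPP code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move a float's bit pattern one ULP away from zero.
 * Zero, infinity and NaN are returned unchanged. FLT_MAX becomes infinity,
 * which is tolerable for an error-injection experiment. */
static uint32_t nudge_one_ulp_bits(uint32_t bits) {
    uint32_t exp = (bits >> 23) & 0xFFu;
    if (exp == 0xFFu || (bits & 0x7FFFFFFFu) == 0)
        return bits;
    return bits + 1;  /* growing the magnitude field grows |x| by one ULP */
}

/* Same operation applied to a float value. */
static float nudge_one_ulp(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits = nudge_one_ulp_bits(bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Applying something like this to the output of a single interpreter op at a time would give the kind of per-instruction sensitivity test the list above describes.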

-[Unknown]

ppmeis commented 9 years ago

@unknownbrackets It's as simple as navigating to Settings > AV Player and selecting Accept; autodrive will then start.

Tested with latest build. Same status.

hrydgard commented 9 years ago

Hm, vrndf1 seems like a suspicious candidate - IIRC we don't reseed the random number generator when a game would write directly to the random context registers of the VFPU. But if it doesn't make a difference if you modify it, then unless the game depends on a particular sequence (that we can't repro anyway as we don't know how the PSP's rndgen works) it's probably not it...

unknownbrackets commented 9 years ago

I thought so too, but no matter what result I make it generate (I tried statically generating 0, 0.5, and I think one other number), it is the same exact incorrect driving, so it seems like it can't be that one...

-[Unknown]

unknownbrackets commented 9 years ago

Okay, well, I've eliminated as many instructions as I could: https://github.com/hrydgard/ppsspp/issues/2990#issuecomment-76659348

Still not guaranteed to be a cpu bug...

-[Unknown]

ppmeis commented 9 years ago

Tested with latest build. Same status:

unknownbrackets commented 6 years ago

Some stats (not sure if useful) showing float usage of various instructions from game start until after the game has clearly gone wrong.

Leftmost number is total floats processed. Then Infinity, NaN, negative zero, and subnormals/denormals.

Since it really goes off a cliff at one point, I was thinking it's possible this is subnormal related... it doesn't ever set the flush to zero flag.

mul.s:      128779215, INF:0     NAN:0     NZ:2239473 SUB:11966
neg.s:      5245302,   INF:0     NAN:0     NZ:79209   SUB:130  
mov.s:      22309050,  INF:0     NAN:5392  NZ:260968  SUB:528    NAN:7fffff-7fffff
vcos:       51246,     INF:0     NAN:0     NZ:0       SUB:0    
vi2f:       3481408,   INF:0     NAN:0     NZ:0       SUB:0    
vadd:       55346715,  INF:0     NAN:0     NZ:413464  SUB:12001
cvt.s.w:    3348734,   INF:0     NAN:0     NZ:0       SUB:0    
div.s:      7354260,   INF:0     NAN:0     NZ:7887    SUB:0    
c.le:       20980612,  INF:0     NAN:0     NZ:40030   SUB:2494 
add.s:      67100736,  INF:0     NAN:0     NZ:1448224 SUB:3376 
trunc.w.s:  4127106,   INF:0     NAN:0     NZ:0       SUB:2448 
sub.s:      30248208,  INF:0     NAN:0     NZ:349366  SUB:5221 
cvt.w.s:    268353,    INF:0     NAN:0     NZ:0       SUB:0    
vf2in:      1740704,   INF:0     NAN:0     NZ:0       SUB:0    
c.eq:       1906462,   INF:0     NAN:0     NZ:425     SUB:40   
abs.s:      874098,    INF:0     NAN:0     NZ:1214    SUB:0    
c.lt:       44685816,  INF:0     NAN:0     NZ:175307  SUB:315  
vdot:       77938675,  INF:0     NAN:0     NZ:26024   SUB:0    
vneg:       759288,    INF:0     NAN:0     NZ:23272   SUB:0    
vrsq:       2117025,   INF:0     NAN:0     NZ:0       SUB:0    
vsat0:      3216,      INF:0     NAN:0     NZ:0       SUB:0    
vscl:       42447592,  INF:0     NAN:0     NZ:124420  SUB:0    
vsub:       33862734,  INF:0     NAN:0     NZ:1483129 SUB:1050 
vsqrt:      7829373,   INF:0     NAN:0     NZ:0       SUB:0    
sqrt.s:     791694,    INF:0     NAN:0     NZ:0       SUB:0    
vcrsp/vqmu: 685962,    INF:0     NAN:0     NZ:11169   SUB:0    
v(h)tfm3:   8347410,   INF:0     NAN:0     NZ:37392   SUB:80   
vrot:       876736,    INF:0     NAN:0     NZ:66106   SUB:0    
vmmul:      5170227,   INF:0     NAN:0     NZ:82346   SUB:0    
vmov:       4018263,   INF:0     NAN:0     NZ:0       SUB:199248
vmmov:      811746,    INF:0     NAN:0     NZ:0       SUB:0    
v(h)tfm4:   40416408,  INF:0     NAN:0     NZ:11657   SUB:0    
vmul:       8497689,   INF:0     NAN:0     NZ:0       SUB:0    
vabs:       2611056,   INF:0     NAN:0     NZ:0       SUB:9    
vdiv:       1316103,   INF:0     NAN:0     NZ:0       SUB:0    
vrcp:       38124,     INF:0     NAN:0     NZ:0       SUB:0    
vsin:       17454,     INF:0     NAN:0     NZ:0       SUB:0    
vrndf1:     735,       INF:0     NAN:0     NZ:0       SUB:0    
vf2iz:      1072,      INF:0     NAN:0     NZ:0       SUB:0
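
The four categories counted above can be told apart from the raw 32-bit pattern alone. A minimal sketch, assuming IEEE 754 single-precision layout; the enum and function names are made up for illustration and are not taken from PPSSPP:

```c
#include <stdint.h>

/* Hypothetical categories mirroring the INF / NAN / NZ / SUB columns. */
enum float_class {
    FC_OTHER,      /* normal values and +0 */
    FC_INF,
    FC_NAN,
    FC_NEG_ZERO,
    FC_SUBNORMAL
};

/* Classify a single-precision bit pattern by exponent and mantissa fields. */
static enum float_class classify_bits(uint32_t bits) {
    uint32_t exp = (bits >> 23) & 0xFFu;
    uint32_t mant = bits & 0x7FFFFFu;
    if (exp == 0xFFu)
        return mant ? FC_NAN : FC_INF;
    if (exp == 0) {
        if (mant)
            return FC_SUBNORMAL;
        return (bits >> 31) ? FC_NEG_ZERO : FC_OTHER;
    }
    return FC_OTHER;
}
```

Counters like the ones in the table could then be incremented per instruction for every input and output value.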

-[Unknown]

unknownbrackets commented 5 years ago

I had tried some things before, but just wanted to note that I've tried forcing subnormal results to 0 (as always seems to happen with many vfpu ops) for vmul/vadd/vsub/vtfm3/vhtfm3/etc., as well as forcing nan to 0x7f800001. There was no change in the failure.
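
A rough sketch of that experiment on a result's bit pattern. The function name is hypothetical, and keeping the sign when flushing a subnormal is an assumption (the comment only says "to 0"):

```c
#include <stdint.h>

#define FORCED_NAN_BITS 0x7F800001u  /* NaN pattern quoted in the comment above */

/* Hypothetical post-processing step for a VFPU op result:
 * subnormals are flushed to zero, any NaN becomes FORCED_NAN_BITS. */
static uint32_t flush_and_force_nan(uint32_t bits) {
    uint32_t exp = (bits >> 23) & 0xFFu;
    uint32_t mant = bits & 0x7FFFFFu;
    if (exp == 0 && mant != 0)
        return bits & 0x80000000u;  /* assumption: sign bit is preserved */
    if (exp == 0xFFu && mant != 0)
        return FORCED_NAN_BITS;
    return bits;                    /* normals, zeros and infinities pass through */
}
```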

I do think there's a good chance it's related to multiply accuracy.

-[Unknown]

unknownbrackets commented 5 years ago

Update: as a very rough measure, I tried & 0xFFFFFFFE for all the results of vtfm, vadd, vsub, vdiv, and vmul.
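
For anyone wanting to repeat this, the masking amounts to clearing the lowest mantissa bit of each result. A minimal sketch with hypothetical helper names (where exactly it gets applied inside each op is not shown in this thread):

```c
#include <stdint.h>
#include <string.h>

/* Clear the least significant mantissa bit of a float bit pattern. */
static uint32_t mask_lsb_bits(uint32_t bits) {
    return bits & 0xFFFFFFFEu;
}

/* Same masking applied to a float value. */
static float mask_lsb(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits = mask_lsb_bits(bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Note this always truncates toward zero by at most one ULP, so it is a crude stand-in for lower precision rather than a model of any specific hardware behavior.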

Normally, things go wrong right before the second tunnel. With this change, things go wrong before the third tunnel, and it looks right for longer. So this is promising.

Trying to dig into which instruction is responsible gets tougher, though. Just disabling the masking for one op at a time:

vtfm without mask: still lasts longer, but goes wrong slightly earlier than all masked.
vdiv without mask: goes wrong even earlier than normal.
vmul without mask: better than no masking, but breaks within the second tunnel.
vsub without mask: very similar to vmul disabled.
vadd without mask: goes wrong much earlier than with all masked.

A few other instructions didn't seem to matter, like vmmul or vdot. That said, obviously this doesn't necessarily implicate any one of the above instructions - it could be that rounding at vsub masks a problem that is really in vmul, or even in vdot.

The important bit is that rounding/precision is almost definitely at issue here.

For clarity, changing the rounding mode doesn't help things, so it's more complex than that.

-[Unknown]

hrydgard commented 5 years ago

I think that indeed confirms that precision/rounding is the culprit. Masking like that is not likely to accurately simulate the issues though, of course.

I believe in the FTZ thing plus probably a slightly lower-precision dot product implemented in the VFPU hardware (in addition to approximations in vrot and similar). VTFM is very likely to use that hardware dot product.

I think the dot product precision issues could be shown by trying things like dotting a = (1.0, 1.0, 1.0, 1.0) and b = (0.000001, 0.000001, 0.000001, 1.0), and the reverse of b with 1.0 first. The 0.000001 constant should be adjusted so that the sum of three of them just breaks into the precision that's still available when the exponent is set to be able to represent 1.0. That way, if the dot product summing uses collective mantissa alignment and then sums up the mantissas, we'd get the same results if the 1.0 was first or last or wherever, whereas if it's computed like we do by simply summing up the products from left to right, we should get different results.
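
A minimal host-side sketch of the left-to-right case, assuming an IEEE 754 host in the default round-to-nearest-even mode. It uses 2^-24 as the small constant, and since a is all ones the products are just the elements of b. The function and array names are made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Products of a = (1, 1, 1, 1) with b, in both orders. */
static const float BIG_FIRST[4] = { 1.0f, 0x1p-24f, 0x1p-24f, 0x1p-24f };
static const float BIG_LAST[4]  = { 0x1p-24f, 0x1p-24f, 0x1p-24f, 1.0f };

/* Sum four products strictly left to right, rounding to float after every
 * addition (volatile keeps the compiler from using wider intermediates). */
static uint32_t sum_left_to_right_bits(const float p[4]) {
    volatile float acc = p[0];
    for (int i = 1; i < 4; i++)
        acc = acc + p[i];
    float result = acc;
    uint32_t bits;
    memcpy(&bits, &result, sizeof bits);
    return bits;
}

/* sum_left_to_right_bits(BIG_FIRST) is 0x3F800000: each 1 + 2^-24 is a tie
 * that rounds back to 1.0.
 * sum_left_to_right_bits(BIG_LAST) is 0x3F800002: the three small terms add
 * exactly to 1.5 * 2^-23 first, and the final tie rounds to even. */
```

So a plain left-to-right float sum is order dependent for these inputs, which is what makes them a useful probe for how the hardware sums.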

unknownbrackets commented 5 years ago

For posterity:

{ 0x3F800000, 0x33800000, 0x33800000, 0x33800000 }
{ 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 }
= 0x3f800001

{ 0x33800000, 0x33800000, 0x33800000, 0x3F800000 }
{ 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 }
= 0x3f800001

{ 0x3F800000, 0x34000000, 0x00000000, 0x00000000 }
{ 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 }
= 0x3f800001

{ 0x100BF8FE, 0x581F4DA5, 0x00000000, 0x00000000 }
{ 0x3F800000, 0x0207F3ED, 0x00000000, 0x00000000 }
= 0x1aa9337c

Since order doesn't matter, potentially it's aligning the exponents first and then summing. It'll be interesting to find if vhdp, vfad, vavg, or other ops have similar behavior.

For clarity for anyone reading this, the first two sums above are (base 2):

1.000000000000000000000000 * 1 +
0.000000000000000000000001 * 1 +
0.000000000000000000000001 * 1 +
0.000000000000000000000001 * 1 =
--------------------------
1.000000000000000000000011 = 0x3f800001

Which becomes 1.00000000000000000000001 because of limited mantissa, therefore 0x3f800001. I also tried:

1.000000000000000000000000 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000011 * 1 =
--------------------------
1.000000000000000000000111 = 0x3f800003

1.000000000000000000000000 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000001 * 1 =
--------------------------
1.000000000000000000000101 = 0x3f800002

1.000000000000000000000000 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000010 * 1 =
--------------------------
1.000000000000000000000110 = 0x3f800003

Which all truncated as expected (was trying to verify any rounding behavior.)
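
The align-then-sum-then-truncate behavior can be modeled roughly as below. This is a simplified sketch, not the code from any PPSSPP branch: it only handles non-negative terms, treats subnormals as zero, ignores infinity/NaN and exponent overflow, and `EXTRA_BITS` (guard bits kept while aligning) is an assumed parameter.

```c
#include <stdint.h>

#define EXTRA_BITS 2  /* assumption: guard bits kept during alignment */

/* Sum four already-multiplied, non-negative floats given as bit patterns:
 * align every mantissa to the largest exponent, add as integers, then
 * truncate the guard bits. */
static uint32_t aligned_sum4(const uint32_t terms[4]) {
    int max_exp = 0;
    for (int i = 0; i < 4; i++) {
        int e = (int)((terms[i] >> 23) & 0xFFu);
        if (e > max_exp)
            max_exp = e;
    }
    if (max_exp == 0)
        return 0;  /* only zeros (or subnormals, treated as zero here) */

    uint64_t sum = 0;
    for (int i = 0; i < 4; i++) {
        int e = (int)((terms[i] >> 23) & 0xFFu);
        if (e == 0)
            continue;
        /* 24-bit mantissa with the implicit leading 1, plus guard bits. */
        uint64_t mant = (uint64_t)((terms[i] & 0x7FFFFFu) | 0x800000u) << EXTRA_BITS;
        int shift = max_exp - e;
        if (shift < 64)
            sum += mant >> shift;  /* bits shifted out are simply dropped */
    }

    /* Renormalize if the sum carried past the leading mantissa bit. */
    int exp = max_exp;
    while (sum >= ((uint64_t)1 << (24 + EXTRA_BITS))) {
        sum >>= 1;
        exp++;
    }
    sum >>= EXTRA_BITS;  /* truncate, no rounding */
    return ((uint32_t)exp << 23) | ((uint32_t)sum & 0x7FFFFFu);
}
```

With this model, {0x3F800000, 0x33800000, 0x33800000, 0x33800000} gives 0x3f800001 regardless of element order, matching the hardware values quoted above.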

Also confirmed the behavior is identical (just with a flipped sign) if I flip the sign of the first vector (meaning it doesn't truncate differently for negative.)

-[Unknown]

unknownbrackets commented 5 years ago

Okay, using this: https://gist.github.com/unknownbrackets/e5bdd06cd8d85712fc51bd7b7707cfd1

Which gets pretty good results (note: multiplying to a temporary float[4] first):

  FMA error: CORRECT 1aa9337c / 0.000000
  1.0*1.0 + 1.0*1.0^-23: CORRECT 3f800001 / 1.000000
  1.0*1.0 + 1.0*1.0^-24 + 1.0*1.0^-24 + 1.0*1.0^-24: CORRECT 3f800001 / 1.000000
  1.0*1.0^-24 + 1.0*1.0^-24 + 1.0*1.0^-24 + 1.0*1.0: CORRECT 3f800001 / 1.000000
  1.0*1.0 + 1.0*1.0^-23 + 1.0*1.0^-23 + 1.0*1.0^-24: CORRECT 3f800002 / 1.000000
  1.0*1.0 + 1.0*1.0^-23 + 1.0*1.0^-23 + 1.0*1.0^-23: CORRECT 3f800003 / 1.000000
  1.0*1.0 + 1.0*1.0^-23 + 1.0*1.0^-23 + 1.1*1.0^-23: CORRECT 3f800003 / 1.000000
  1.0*-1.0 + 1.0*-1.0^-23 + 1.0*-1.0^-23 + 1.1*-1.0^-23: CORRECT bf800003 / -1.000000
  Simulate case 1: CORRECT c75864aa / -55396.664062
  Simulate case 2: CORRECT c7fb200f / -128576.117188
  Simulate case 3: CORRECT c5972dcb / -4837.724121
  Simulate case 4: CORRECT 42222309 / 40.534214
  Simulate case 5: WRONG 3d84e134 / 0.064883  vs  3d84e130 / 0.064883
  Simulate case 5 DEBUG: beb4194f + bdbb66eb + 3f0215ab + 00000000
  Simulate case 5 DEBUG: -0.351756 + -0.091505 + 0.508143 + 0.000000
  Simulate case 6: CORRECT 4136c004 / 11.421879

FWIW case 5 is (I sampled the most different results from Ridge Racer, and used them to debug the software float add):

    ScePspIVector4 dotsim5a = { 0x3f2dc5cb, 0x3e71855a, 0x3f3206af, 0x00000000 };
    ScePspIVector4 dotsim5b = { 0xbf04a8ed, 0xbec6a2ff, 0x3f3b0f83, 0x00000000 };
    testDot("  Simulate case 5", dotsim5a, dotsim5b);

This changes the results. It goes wrong differently right before the second tunnel, but doesn't recover from there. Pretty sure we're barking up the right tree, because everything up to where it goes crazy was right and the same - and the point where it goes crazy behaved differently.

-[Unknown]

hrydgard commented 5 years ago

Cool. It's possible though that this sequence is so sensitive that it won't work all the way through until we've fixed both the FTZ issue and gotten this even more accurate...

Please as always feel free to push even very rough code to a branch or PR, would be interesting to try this on Tekken 6.

Also by the way the BSR instruction (CLZ on ARM) will let us get rid of those annoying while loops in the software add.
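
As a sketch of that idea: the normalization loops in a software add can be replaced by a single count-leading-zeros. The code below assumes GCC or Clang (`__builtin_clz`), a nonzero mantissa, and a hypothetical `struct norm` holding a mantissa whose leading 1 should end up at bit 23. It is not the branch's code.

```c
#include <stdint.h>

struct norm {
    uint32_t mant;  /* after normalization: 0x800000 <= mant < 0x1000000 */
    int exp;
};

/* Loop version: shift one bit at a time until the leading 1 sits at bit 23. */
static struct norm normalize_loop(uint32_t mant, int exp) {
    while (mant >= 0x1000000u) { mant >>= 1; exp++; }
    while (mant < 0x800000u)   { mant <<= 1; exp--; }
    struct norm r = { mant, exp };
    return r;
}

/* CLZ version: compute the whole shift at once. mant must be nonzero,
 * because __builtin_clz(0) is undefined. */
static struct norm normalize_clz(uint32_t mant, int exp) {
    int shift = __builtin_clz(mant) - 8;  /* 8 leading zeros = already normalized */
    if (shift > 0) {
        mant <<= shift;
        exp -= shift;
    } else if (shift < 0) {
        mant >>= -shift;
        exp += -shift;
    }
    struct norm r = { mant, exp };
    return r;
}
```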

Additionally, floating point multiplication in software is actually even easier than addition since there's no realignment needed, just multiply the mantissas, shift down by a fixed amount, and add the exponents (with a bias to account for the 127 base).
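
A minimal sketch of such a multiply on bit patterns. It assumes truncation of the low product bits (whether the VFPU truncates or rounds here is not established in this thread), treats zero and subnormal inputs as zero, and does not handle infinity or NaN inputs. `soft_mul` is a hypothetical name.

```c
#include <stdint.h>

/* Multiply two single-precision values given as bit patterns. */
static uint32_t soft_mul(uint32_t a, uint32_t b) {
    uint32_t sign = (a ^ b) & 0x80000000u;
    int ea = (int)((a >> 23) & 0xFFu);
    int eb = (int)((b >> 23) & 0xFFu);
    if (ea == 0 || eb == 0)
        return sign;  /* zero or subnormal operand: signed zero */

    /* 24-bit mantissas with the implicit leading 1. */
    uint64_t ma = (a & 0x7FFFFFu) | 0x800000u;
    uint64_t mb = (b & 0x7FFFFFu) | 0x800000u;
    uint64_t prod = ma * mb;   /* 47 or 48 significant bits */
    int exp = ea + eb - 127;   /* add exponents, remove one bias */

    /* Shift down by a fixed amount; one extra bit if the product is >= 2.0. */
    if (prod & ((uint64_t)1 << 47)) {
        prod >>= 24;
        exp++;
    } else {
        prod >>= 23;
    }

    if (exp <= 0)
        return sign;                 /* underflow: flush to zero */
    if (exp >= 255)
        return sign | 0x7F800000u;   /* overflow: infinity */
    return sign | ((uint32_t)exp << 23) | ((uint32_t)prod & 0x7FFFFFu);
}
```

For example, 1.5 * 1.5 (0x3FC00000 * 0x3FC00000) yields 0x40100000, i.e. 2.25.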

Also it's very likely that vhdp, vfad, vavg have similar issues since they almost certainly are reusing the vdot hardware, kind of like the prefix hack ops.

unknownbrackets commented 5 years ago

Here's the branch so far: https://github.com/hrydgard/ppsspp/compare/master...unknownbrackets:vfpu-dot

-[Unknown]

hrydgard commented 5 years ago

@unknownbrackets Thanks, I'll try it on Tekken tonight.

For now, I think it might be a good idea to add an 'n' argument to vdot so there's no requirement to make sure that unused elements are zeroed on shorter dot products - feels like there could be a couple of bugs around that, although maybe ApplySwizzle takes care of it. (Also, in case vdot somehow would mishandle zero).

hrydgard commented 5 years ago

Does EXTRA_BITS seem to be 2? It's also possible that we should apply some rounding to them before shifting them out at the end.

hrydgard commented 5 years ago

@unknownbrackets Tekken is unfortunately very broken with this, just "disabling" VTFM (allowing interpreter fallback) screws up the graphics entirely. Hm...

unknownbrackets commented 5 years ago

Sorry, I cleaned up some debug code after testing and didn't actually test it again, made a really dumb mistake. Pushed the right version.

Zeros should work fine. Note that I'm applying this to interp, which mostly has to do dots across all four to handle prefixes correctly.

Also, this is an interesting one:

  +/- INF: WRONG 7f800001 / nan  vs  00000000 / 0.000000
  +/- INF DEBUG: 7f800000 + ff800000 + 00000000 + 00000000
  +/- INF DEBUG: inf + -inf + 0.000000 + 0.000000

The correct result is 7f800001 here (which makes sense mathematically...)
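
One way a software sum could special-case this is sketched below. Only the inf + -inf case from the output above is covered; how other infinity or NaN combinations behave on hardware is not shown here, so the helper deliberately does nothing else. Names are hypothetical.

```c
#include <stdint.h>

#define PSP_INF_MINUS_INF_BITS 0x7F800001u  /* hardware result quoted above */

/* Return 1 if the premultiplied terms contain both +inf and -inf. */
static int has_opposing_infs(const uint32_t terms[4]) {
    int pos = 0, neg = 0;
    for (int i = 0; i < 4; i++) {
        if (terms[i] == 0x7F800000u)
            pos = 1;
        else if (terms[i] == 0xFF800000u)
            neg = 1;
    }
    return pos && neg;
}
```

A simulated dot product could check `has_opposing_infs` before the mantissa alignment step and return `PSP_INF_MINUS_INF_BITS` early.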

-[Unknown]

hrydgard commented 5 years ago

Ah! Well then, I'm happy to report that this seems to fix leg shaking in Tekken 6 completely!

Not quite sure I understand your debug output there, are we or the PSP computing 7f800001? (And that's the dot product of (7f800000, ff800000, 00000000, 00000000) dot (inf, -inf, 0.000000, 0.000000) despite the plus signs?)

unknownbrackets commented 5 years ago

The format is WRONG %08x[correct.u] %f[correct.f] vs %08x[simulate.u] %f[simulate.f], though I already changed it to handle that correctly.

The debug output is premultiplied, so it's just the sum of (inf, -inf, 0, 0), or in other words inf - inf. It's output twice, once in hex and then in float. In this case, the other vector is just (1, 1, 1, 1) for simplicity.

We're still getting some cases wrong, but it improves the results of cpu/vfpu/vector too. It might be in the multiply as you suggested.

-[Unknown]

hrydgard commented 5 years ago

Ah, of course. Yeah, vfpu_dot still has some edge cases left, and yup, then there's the multiply... might want to try different rounding modes enabled during simulation to check if one happens to match?

If we do the multiplies in software too, at least they won't be affected by the current local rounding mode...

unknownbrackets commented 5 years ago

With an integer multiply (branch updated), it gets much farther before going crazy. I probably have a mistake hiding in there somewhere, though. After it got much farther it actually eventually hit an invalid memory read (though maybe this is after it was "supposed" to have finished the track?)

It also does NOT match all the accuracy tests, so it's definitely not right still. But it does seem closer.

-[Unknown]

hrydgard commented 5 years ago

Very cool. Of course, it going off track can also be caused by other instructions but seems this indeed has a big influence. I see you switched to clz, nice.

unknownbrackets commented 5 years ago

We're currently using 2 extra bits of precision - I wonder if it still uses a sticky bit (seems annoying to emulate) prior to normalization, or if multiply doesn't actually truncate...

Also for division, this is interesting, though probably not how it actually calculates it: https://www.pvk.ca/Blog/LowLevel/software-reciprocal.html

-[Unknown]

hrydgard commented 5 years ago

I would expect them to use some standard blocks of gates implementing stuff like that, so it's very possible that the sticky bit is there. But of course it's also possible that they designed a very minimal implementation just to make dot products as cheap as possible .. who knows...

Yeah I highly doubt it's done that way..

unknownbrackets commented 5 years ago

Okay, the software dot now matches all our tests and other cherry picked values: https://github.com/hrydgard/ppsspp/compare/master...unknownbrackets:vfpu-dot?expand=1

Turned out to use relatively simple rounding, but I ended up running exhaustive searches on the PSP for test values (by just checking software implementation directly on the PSP, since it calculated the same there.)

The bad news is that this new implementation, despite matching fairly well, makes Ridge Racer go crazy even earlier than it does on master (before the turn it starts doing weird stuff.) It's definitely caused by the rounding.

As far as I could tell, changing the rounding mode has no effect on the vdot results.

I guess it must be other instructions. I'm replicating the list of most used instructions above here, removing ones that have no effect or use vdot internally:

vadd      (9333422)   // Small error = major driving glitches.
vscl      (8506915)   // Small error = major driving glitches.
vsub      (5637652)   // Small error = driving glitches happen differently.
vsqrt     (3891304)   // MAYBE: Introducing a small error makes glitches MUCH worse.
vrsq      (971862)    // MAYBE: Introducing a small error makes glitches MUCH worse.
vdiv      (572562)    // Small error = driving glitches happen differently.
vcrsp.t   (100890)    // MAYBE: Introducing a small error makes glitches happen quicker.

mul.s     (54993288)  // Small error = major driving glitches.
add.s     (28356670)  // Small error = major driving glitches.
c.le      (13832384)  // Change to lt = driving glitches happen differently.
sub.s     (13597462)  // Small error = major driving glitches.
trunc.w.s (5548058)   // Small error = driving glitches happen differently.
cvt.s.w   (4223431)   // Small error = major driving glitches.
div.s     (3243544)   // Small error = major driving glitches.
vpfxt     (2184294)   // Ignore = no driving change, but major gfx glitches.  Might still be wrong prefix handling.
sqrt.s    (533778)    // Small error = driving glitches happen differently.

c.lt      (29268876)  // Any change = breaks everything, but unlikely.
vone      (9659339)
neg.s     (3638275)
c.eq      (1284924)   // Any change = breaks everything, but unlikely.
abs.s     (562281)    // Small error = crash, unplayable... unlikely.
vneg      (276140)
vmidt     (180679)
vmmov     (62622)
vzero     (14834)

Hmm maybe vcrsp.t...

-[Unknown]

unknownbrackets commented 5 years ago

It definitely is more accurate applying the same dot operation in vcrsp, though there's something odd happening with inf there. It affected Ridge Racer in probably a good way, but it still goes crazy a bit earlier than before.

-[Unknown]

unknownbrackets commented 5 years ago

So, it's probably not sqrt.

I wrote a software sqrt, which matches vsqrt much better (sqrtf = exact match 3% of the time, vfpu_sqrt = exact match 84% of the time.) There was no change or improvement to the driving, though.

It could be hiding in the remaining 16% (seems to be a rounding issue, but I can't figure out the right logic for it), but I'd have expected some improvement if the accuracy mattered.

-[Unknown]

unknownbrackets commented 5 years ago

Oops, had a stupid mistake disabling the sqrt. It does improve things. But it also mysteriously makes the game crash (well, it was before if it ran far enough without winning, but now it does it earlier...)

-[Unknown]

unknownbrackets commented 5 years ago

Okay, sorry for the many comments. Found the bug (max_exp == 0 vs max_exp <= 0) causing the crash, so now this is the version that gets the farthest:

https://github.com/hrydgard/ppsspp/compare/master...unknownbrackets:vfpu-dot?expand=1

It still goes crazy eventually. Maybe it's the remaining 16% of sqrt - any ideas what might be wrong there? I tried rounding up or rounding even instead of masking, but maybe wrong...

-[Unknown]

hrydgard commented 5 years ago

Cool. But I don't think Ridge Racer is going to suddenly be fixed 100% after a single instruction is fixed - it's clear that its "physics" simulation uses a lot of different instructions and any of them can introduce a tiny error, which will get amplified over time and cause the simulation to fall out of sync with the replay data. It's not even certain that a single precision fix will cause the simulation results to be closer to the real thing (although as we fix more things, that does get more likely). And we still don't force FTZ on for VFPU instructions, which we really should if we don't just software emulate them all.

Anyway, this is very good progress already even if Ridge Racer isn't fixed. Who knows what other games might be helped. Unfortunately this stuff is not easy to enable globally, for fear of slowdowns...

unknownbrackets commented 5 years ago

Sure, of course. But there aren't that many instructions left unless it's FPU too. See the list. It's not like it uses sin/cos/etc. I assume Dissidia replays are affected by the same problem, but iirc they use a lot more VFPU instructions.

Also, there's some masking already applying FTZ in that branch. But if you look above, Ridge Racer isn't really sending any subnormals through most of these instructions anyway.

-[Unknown]

hrydgard commented 5 years ago

Well there's vrot, vrsq and vdiv, and vsin and vcos are actually in the list you posted above? (actually never mind about the latter, I see you posted a revised list further down)

ghost commented 4 years ago

The same thing happens in Ridge Racer 7 when played on RPCS3... The autodrive is also buggy.. and I also found another bug... My saved replays are starting to bug out as well...

hrydgard commented 4 years ago

Yeah, tiny, tiny math inaccuracies can result in this kind of thing, no surprise it happens on RPCS3 as well.

ghost commented 4 years ago

I noticed that when I use a cheat that alters the car's performance in Ridge Racer, the AV Player CPU car's performance also changes .. So if someone makes a cheat code that alters the cars' performance, we would probably have no Algorithm bug...

hrydgard commented 4 years ago

Nah, you can't conclude that. Your cheat will just be another input that will throw the algorithm off even more, while it's already definitely broken in other ways....

ghost commented 4 years ago

I tried replicating the replays and I broke my fingers halfway on SR765...

ghost commented 4 years ago

Actually, I managed to replicate half of the Seaside Route 765 CPU replay where you drive a Blue Raggio while racing the Angelus... I screwed up halfway, at the point where I'm supposed to trigger the 2nd NOS... The Raggio drifted on a turn where I'm not supposed to drift, then the Angelus passed me... And since the Raggio is a Dynamic car, I can't control it properly.. Also, when replicating the replays, you've got to be precise on the turns or the A.I. opponents will mess up your rhythm... Anyways, here are the 6 tracks with no CPU bugs whatsoever:

Seaside Route 765: https://www.youtube.com/watch?v=kQyHEo4S4wg
Sunset Drive: https://www.youtube.com/watch?v=LsFrQ9JJ9T4
Union Hill District: https://www.youtube.com/watch?v=CgpGzMnA_54
Crimsonrock Pass: https://www.youtube.com/watch?v=RURjK13Odgk
Midtown Expressway: https://www.youtube.com/watch?v=_iOCyYokMco
Greenpeak Highlands: https://www.youtube.com/watch?v=kydwDBr9MoA&t

ghost commented 4 years ago

I tried to run Ridge Racer 6 on the Xenia emulator to test the AV Player, while hoping that it won't crash.. But despite Ridge Racer 6 just being an upgraded version of Ridge Racer PSP, I was surprised Ridge Racer 6 AV Player never bugged whatsoever... The course I played was called "Surfside Resort"...
