sifadil / pcsx2-playground

Automatically exported from code.google.com/p/pcsx2-playground
2 stars 0 forks source link

HSync Counter Improvment (patch) #32

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
This patch appears to improve the accuracy of the hsync counters
considerably.  More testing on more games needed, because the games I have
are so dumb they'd run pretty good if I was counting hsyncs with an abacus.
 Note: This patch does *not* resolve the non-interlaced "too fast" glitch
present in the new counters logic.  We still need to figure that one out. :/

(... and i might find that the same improvement can be made to vsyncs..
will look into it next)

Original issue reported on code.google.com by Jake.Stine on 27 Oct 2008 at 4:06

GoogleCodeExporter commented 8 years ago
Scratch the vsync comment.  Vsync accuracy is tied directly to hsyncs, so 
they're
fine so long as hsyncs are fine.  I still have a hunch that I can improve 
overall
timer accuracy even further though.

Original comment by Jake.Stine on 27 Oct 2008 at 4:17

GoogleCodeExporter commented 8 years ago
Ok, just so everyone knows, I have found a serious error in the counter logic.  
Most
of the time the 'nextsCounter' value is way off the mark, because it's always 
just
kinda blindly assumed two things:

(1) that the non-hSync counters will never interfere with the hSync timings.  
Oops. 
As soon as those counters started running for any reasons, the hsync timings 
would
just drift off into the ozone.

(2) that all branches begin and end on hsyncs.  Any variation in the length of 
branch
executions would cause nextsCounter to drift further and further from the actual
hSync value.

In both cases, the value would run a cyclic course, where-by it would drift 
away from
the proper hsync and drift back toward it after reaching an apex.  So overall 
game
speed stayed pretty consistent, but hsyncs themselves were being counted almost
randomly within the scatter band.

I'm fixing.

Original comment by Jake.Stine on 27 Oct 2008 at 4:56

GoogleCodeExporter commented 8 years ago
Fixed timers glitch.  New patch attached.

This one makes the hsync timer considerably more accurate.  Standard deviation 
from
the 'real' hsync timings is now a mere fraction of what it used to be.  I'm 
curious
to see if this patch fixes games such as Dawn of Mana (which previously only 
worked
with separate vsync timers).  Ultimately I think timing vsync based on the 
hsync is
cleaner and faster -- *if* it works.  It may not. :)

(just noticed I forgot to attach the first patch... doh.. -_-)

Original comment by Jake.Stine on 27 Oct 2008 at 5:55

Attachments:

GoogleCodeExporter commented 8 years ago
Addendum: The above patch also appears to fix the "oddball" accelerated
non-interlaced timing issue.  Interlaced and non-interlaced modes no longer 
require
separate timing mechanisms.  (very promising!)

Original comment by Jake.Stine on 27 Oct 2008 at 6:12

GoogleCodeExporter commented 8 years ago
i'll have to take a look at the patch in detail tomorrow.

it does look more accurate from taking a quick-look; but ultimately, from my 
hours of
debugging earlier today, i think the best way is to calculate hblank and vSync
independently.

but i'll be sure to take a look at that patch tomorrow, as i could be wrong.

Original comment by cottonvibes on 27 Oct 2008 at 6:50

GoogleCodeExporter commented 8 years ago
There are accumulating rounding errors present in the hBlank timer in my 
previous
patch (and in all versions of PCSX2 ever for that matter), and I think that 
might
have been part of the problem when you timed vSync to hSync.  The rounding 
errors
applied to hSync timing alone don't amount to much.  Every so often an hblank 
would
fire out-of-order from the vblank and, maybe in some games, a little graphical
artifact might come of it -- but otherwise everything was ok.  But when vSync 
was
timed to hSync the errors accumulated over time and skewed the entire game's 
rate of
update.

Games sensitive to more accurate vSync timings might have been exploding for 
that reason.

This latest patch re-adjusts both the vSync and hSync timers after every 
progression
(25/30 fps).  In PAL mode the accumulative error is +240, while in NTSC it was 
a bit
smaller at -165.  It would also be easy enough to adjust the error 60 fps 
instead,
although I'm not sure it's necessary.  Those values seem well small enough as 
they are.

But as usual, only lots of testing will tell the real story.  Theory be damned. 
;)

Attached patch is against r238.

Original comment by Jake.Stine on 27 Oct 2008 at 7:05

Attachments:

GoogleCodeExporter commented 8 years ago
This patch also fixes Xenosaga videos (which had audio/video desync), so now we 
know
whats up with this game ;)

Original comment by ramapcsx2 on 27 Oct 2008 at 11:12

GoogleCodeExporter commented 8 years ago
Good!  I think I'm on to something big here.  That was a major glitch in the 
counters
I fixed, and considering how important they are, I can't imagine that it won't 
be
felt in almost all games.  I noticed it fixed a couple low-fps glitches in La 
Pucelle
Tactics too (in addition to fixing glitchy behavior in both Disgaea 1/2 menus). 
 It
used to drop to 1/2th or 1/4th update rate when I would swing changeups around 
while
setting up miracles.  Doesn't seem to happen anymore.

That's what cotton was talking about though; accurate counters can help speed 
up a
lot of games, because now the EE isn't sitting there farting around while it 
waits
for and/or misses hsyncs or vsyncs.

Original comment by Jake.Stine on 27 Oct 2008 at 11:21

GoogleCodeExporter commented 8 years ago
"because now the EE isn't sitting there farting around while it waits
for and/or misses hsyncs or vsyncs"  << made my day :D

Yes, I can't really pinpoint it exactly but games run smoother now :)

Original comment by ramapcsx2 on 27 Oct 2008 at 11:24

GoogleCodeExporter commented 8 years ago
Whops, one addition.
Xenosaga2 ingame jerkyness is still there. Its fixed by using the sync hack 
only.
Without the hack its really jumpy when you run around (yes, fps can reachwell 
over 60).

Using cottonvibes counters from rev 240/241 the game runs fine again (hack on 
or off
doesnt matter, although with it on its even smoother ><).

Original comment by ramapcsx2 on 27 Oct 2008 at 11:40

GoogleCodeExporter commented 8 years ago
Which sync hack are you referring to specifically? :/

Original comment by Jake.Stine on 27 Oct 2008 at 11:48

GoogleCodeExporter commented 8 years ago
EE and IOP * 2
The x3 version is just too game-breaking from my experience.

Original comment by ramapcsx2 on 27 Oct 2008 at 12:09

GoogleCodeExporter commented 8 years ago
Ok, just making sure.
Yeah I don't really understand that part of the emulator yet.  The recompiler 
always
multiplies the cycles counter and I don't know why.  Even if all sync hacks 
disabled
the multiplier is 9/8 (or 1.125).  There's a note in the code that says a 6/5 
ratio
fixes one game and breaks another.  So I'm thinking the whole mess is a little 
bit
"voodoo", so to speak.

In the old PCSX2 code the vblank and hblank timing was pretty lazy. It just 
split the
scans down the middle and made half redraw and half blank (normally redraw is 
about
80% and blank 20%).  Sometimes I wonder how many other hacks are in the emulator
designed to compensate for weak timing.

Original comment by Jake.Stine on 27 Oct 2008 at 12:26

GoogleCodeExporter commented 8 years ago
Well, from what I can see your patch + cottonvibes latest fixes should give the 
best
results.
Is this possible? ;)

Original comment by ramapcsx2 on 27 Oct 2008 at 12:29

GoogleCodeExporter commented 8 years ago
More or less.  r241 still has the basic flaws in the timing mechanism where-by 
it
assumes that branches are synched with the hSync (and they aren't, like, ever). 
 So
the hSync has the cyclic scatterband of inaccuracy as described above.  
Otherwise his
just times vsync as a separate entity from hsync.  But as long as hSync timing 
is
accurate, that shouldn't matter.

Cotton's latest addition does have a timer optimization, but I don't think that
accounts for it running smoother in XS2 (although it could benefit games that 
use the
high-resolution timers, if XS2 happens to use them).  It's not a synchronization
thing but rather a CPU optimization... and I don't think it'd be enough of one 
to
matter.  But then again that one high res timer does look like it could be 
pretty
demanding... hmm.

Original comment by Jake.Stine on 27 Oct 2008 at 12:59

GoogleCodeExporter commented 8 years ago
Actually XS2 *does* use that high resolution counter.
Hmm.  I wonder if it's the optimization in r240 then that's allowed it to run
smoothly without the sync hack. :)

Original comment by Jake.Stine on 27 Oct 2008 at 1:37

GoogleCodeExporter commented 8 years ago
Also, it occurred to me that more accurate timers doesn't necessarily translate 
into
faster emulation.  As I understand it, one of the driving forces behind the x2 
sync
hack's effectiveness is that it speeds up the timers so that code needn't sit 
around
and wait for them as long.  The fewer instructions run between timer updates, 
the
faster the emulation.

Thus even if timers are perfect, the x2 sync hack should still make emulation 
faster
for most games.  More importantly, it's always possible that more accurate 
timers
could slow some games down since, depending on exactly how the timer in 
question is
drifting, it could favor some games to run less code to get to where they want 
to be.
 But of course the game would still emulate better under a more accurate timer model,
if potentially slower.  And timer accuracy should affect the emulation quality 
of the
x2 hacks as well (but not x3 so much.. the skips are too big there for the extra
accuracies to matter much).

And maybe Cottonvibes can confirm or deny my theory.  He knows this stuff 
better than
me by a mile. :P

Original comment by Jake.Stine on 27 Oct 2008 at 3:02

GoogleCodeExporter commented 8 years ago
Well, let me clarify the "hickups" in xs2 a bit. Its not that the game won't 
run smooth
but rather so jumpy I'd consider it broken :p
Instead of delivering the frames consitently, it seems to stop for a while, then
continue normally.
Happens about 3 times per second (guessing).

Original comment by ramapcsx2 on 27 Oct 2008 at 3:53

GoogleCodeExporter commented 8 years ago
Oh!  That!   Well yeah now that you mention it xs2 has always done that for me 
since
I got the dump to toy with.  And sure enough an old version of pcsx2 doesn't do 
it. 
Son of a gun.  And gee, the game's almost playable too (using ZeroGS and about 
every
gfx effect turned off).  Doesn't feel like it's trying to go backward in time 
anymore.

Noooow it makes a lot more sense.  I wonder....!

And don't fret it.  I'm having a ton of fun right now.  Even if nothing I do in 
the
PCSX2 sides ends up being worth while really, it's still just crazy fun playing 
the
role of a codebreaker, so to speak -- and trying to unravel the mysteries of the
interlockings of a super-complex system. ;)

Original comment by Jake.Stine on 27 Oct 2008 at 7:23

GoogleCodeExporter commented 8 years ago
"(1) that the non-hSync counters will never interfere with the hSync timings.  
Oops. 
As soon as those counters started running for any reasons, the hsync timings 
would
just drift off into the ozone."

i need to look at the code again, but i don't think the non-hsync counters will 
interfere with hblank timings.

"(2) that all branches begin and end on hsyncs.  Any variation in the length of 
branch
executions would cause nextsCounter to drift further and further from the actual
hSync value."

yeah thats true, thats probably why my code to base vSync on hSync had alot of 
problems. (when i coded it, i didn't know the code was only being called every 
end-
block)

"That's what cotton was talking about though; accurate counters can help speed 
up a
lot of games, because now the EE isn't sitting there farting around while it 
waits
for and/or misses hsyncs or vsyncs."

Yeah, basically some games perform logic based on these counters.
if these counters arn't incremented, then the game doesn't know time has 
passed, and 
thus keeps waiting for a certain amount of time to pass.
so good counters code can speedup games if they weren't working correctly 
before.

"Ok, just making sure.
Yeah I don't really understand that part of the emulator yet.  The recompiler 
always
multiplies the cycles counter and I don't know why.  Even if all sync hacks 
disabled
the multiplier is 9/8 (or 1.125).  There's a note in the code that says a 6/5 
ratio
fixes one game and breaks another.  So I'm thinking the whole mess is a little 
bit
"voodoo", so to speak/"

well AFAIK, this is a huge-ugly-hack :p

every instruction takes a certain amount of cycles to perform; it looks like 
the EE 
recs are just counting all instructions as 1 cycle.

so then at the end of the recompiled block (the branch), the EE recs multiply 
the 
total cycles it counted by some EEmultiplier.

what *should* be done, is the correct amount of cycles should be determined by 
the 
opcodes being run.

i might look more into that later...

Original comment by cottonvibes on 28 Oct 2008 at 12:22

GoogleCodeExporter commented 8 years ago
>> i need to look at the code again, but i don't think the non-hsync counters 
will 
interfere with hblank timings.

That part refers specifically to how the "nextCounter" value in rcntSet() is
calculated.  It doesn't take into account the hsync's next timing.  It just 
assumes
the hsync is _HSCANLINE cycles away every time, but in reality it could be 50 
cycles
away.  Of course for that assumption to be fixed you have to first fix the 
second
assumption.  It's in hScanline, when you assign cpuRegs.cycle directly to the 
new
hblank counter target (which is a problem due to the block/branch resolution of 
the
timer, as you noted above).  Just need to assign it as cpuRegs.cycle - latency 
(which
is cpuRegs.cycle - counters[4].CycleT (or .Cycle or whichever holds the target 
cycle
when the hBlank was *supposed* to happen).

>> what *should* be done, is the correct amount of cycles should be determined 
by the
opcodes being run.

Ah, ok.  Everything being a single cycle did seem a little overly simplistic. 
;)  I
guess even the interpreter doesn't really have accurate timers then.  Sometimes 
it's
a wonder the emulator works at all.

And I'll try to keep myself useful by not making too many boneheaded errors in
judgment in other places (the xs2 thing was a disaster on my part.. (>_<)

Original comment by Jake.Stine on 28 Oct 2008 at 1:07

GoogleCodeExporter commented 8 years ago
"That part refers specifically to how the "nextCounter" value in rcntSet() is
calculated.  It doesn't take into account the hsync's next timing.  It just 
assumes
the hsync is _HSCANLINE cycles away every time, but in reality it could be 50 
cycles
away.  Of course for that assumption to be fixed you have to first fix the 
second
assumption.  It's in hScanline, when you assign cpuRegs.cycle directly to the 
new
hblank counter target (which is a problem due to the block/branch resolution of 
the
timer, as you noted above).  Just need to assign it as cpuRegs.cycle - latency 
(which
is cpuRegs.cycle - counters[4].CycleT (or .Cycle or whichever holds the target 
cycle
when the hBlank was *supposed* to happen)."

you're right about this, i'm going to have to fix that.

Original comment by cottonvibes on 28 Oct 2008 at 2:01

GoogleCodeExporter commented 8 years ago
I read up on the MIPS.  Turns out the R5900 actually runs 2 instructions per 
cycle
most of the time.  Load/Stores are usually a single cycle since the memory 
accesses
can stall the superscalar pipeline.  And branches are usually 2.25 cycles on 
average
(roughly speaking, 1 cycle for a prediction hit and 3 for a prediction miss).  
So in
practice most MIPs applications actually weigh in at less than a cycle per
instruction (typical CPIs are apparently between 0.79 and 0.94).

The other potential timing problem though is the VU0 unit.  The VU0 code 
modifies the
cpuRegs.cycle value, and so it could be as much to blame for skewed cycle rates 
as
the eeRecs.  As for what it should do.. eh.. well.. god knows how it actually 
monkeys
with the R5900's cycle timings on the PS2 itself.

For what it's worth the R3000a is not a superscalar pipeline and so it's 
average CPI
is usually right around 1.1 to 1.2.  I assume that's where the pcsx2 devs got 
their
magic modifier from.  Although for all I can tell, it doesn't really fit the EE.

Not that it's good news.  The EE's ability to churn through more than 1 
instruction
per cycle, if accurate, would just mean that much more work emulating it.

Original comment by Jake.Stine on 28 Oct 2008 at 2:39

GoogleCodeExporter commented 8 years ago
Correction: The R5900 is not superscalar.  Most of the 64 bit MIPS processors 
are
superscalar, but the R5900 isn't one of them.

Thus, the 1.125 CPI is basically a healthy measure of running code.  Almost 
every
instruction runs in a single cycle on a MIPS, with the sole exceptions of (rare)
load/store stalls and branch prediction misses (which end up being 3 or 4 
cycles). 
So I guess that hack isn't such a bad thing after all.  Go figure. :)

Original comment by Jake.Stine on 28 Oct 2008 at 3:29

GoogleCodeExporter commented 8 years ago
well i did some testing with your patch, and Dawn of Mana videos still had 
problems
(actually the problem got worse for some reason :/).

and Devil May Cry has been crashing from some bug i made in counters.c (not sure
where, but also crashes with 238), so i've reverted to the official SVN code 
for now...

i *did* fix the problem that you mentioned about "nextCounter" not having the 
correct
value, but until i find out what crashed Devil May Cry, i'm not going to commit 
the code.

Original comment by cottonvibes on 28 Oct 2008 at 6:30

GoogleCodeExporter commented 8 years ago
So there's devial may cry, dawn of mana, xenosaga1, xenosaga2, atelier iris 2.. 
maybe
more.
These games all do *something* with counters, and work better or worse with 
each change.

From experience the official svn code at least runs them all (no hangs), but its
still a bit jumpy in some.

Original comment by ramapcsx2 on 28 Oct 2008 at 10:51

GoogleCodeExporter commented 8 years ago
Do all of these games makes use of some of the high-resolution counters?  Maybe 
they
actually have something in common, like a specific set of counter flags.  The 
DMC
crash appears to be unrelated to counter accuracy so we can ignore it for now.  
But
it'd be nice to have a profile of what counters and flags each of those other 
games
uses during the times that they run yucky.

Original comment by Jake.Stine on 28 Oct 2008 at 2:13

GoogleCodeExporter commented 8 years ago
Well, if you tell me how to check for that, I'll tell you the results ;)

Original comment by ramapcsx2 on 28 Oct 2008 at 3:24

GoogleCodeExporter commented 8 years ago
In a debug build of PCSX2 you can enable the EE Counters log in the Debug menu. 
Debug::Logging.  The logfile should be dumped to the logs dir, although I'm not 
sure
offhand what the filename is (haven't used the logging feature yet myself, just 
know
that it should work based on the code).  Collect up a few logs, named for the 
games
that created them, and then we can pass 'em around and analyze.

Might find nothing... or.. we might find something. :)

Original comment by Jake.Stine on 28 Oct 2008 at 4:38

GoogleCodeExporter commented 8 years ago
i don't think the Devil May Cry bug is related to the hsync/vsync algorithm 
changes.

Original comment by cottonvibes on 28 Oct 2008 at 4:39

GoogleCodeExporter commented 8 years ago
I think i found out what broke Devil May Cry (i was looking through the changes 
i 
made in the last revisions, and i found some error in the way i coded something)

i'm at work now though, so i can't test out my solution till i get back home :O

Original comment by cottonvibes on 28 Oct 2008 at 8:08

GoogleCodeExporter commented 8 years ago
That'll be good news, if so.  I still would like a set of EE counter logs for 
games
with apparent timing sensitivity problems though, so that we can look for any
commonalities and/or 'rare' counter settings that other games may not normally 
use. 
And hopefully get an idea as to which counter settings are causing sensitivity 
issues.

Original comment by Jake.Stine on 28 Oct 2008 at 8:13

GoogleCodeExporter commented 8 years ago
btw Jake, i think i remember from looking at your patch, you had a "difference" 
variable as U32.
then later you were checking:
if (difference <= 0) { //systemprint }

the reason this never/rarely occured, is becuase you were using a U32 (unsigned 
int), and not a S32 (signed int).

i believe when you have something like (u32)(100 - 1000), you get negative 
overflow 
(and you'd just end up with a really-big positive number)

so what you need to do is have difference be S32 *or* type cast the u32 as 
(s32) 
whenever you use it and it might countain a negative number.

Original comment by cottonvibes on 28 Oct 2008 at 8:27

GoogleCodeExporter commented 8 years ago
Oh hehe.  Nah.  I changed it to U32 after I concluded that the conditional was 
never
true anyway (in fact it was never even close except when the emulator was first
initializing some bios code), and wanted to get rid of the warning before 
uploading
the patch.  Although now that I think about it, I neglected to consider the 
impact of
the X2 and X3 sync hacks.  X3 in particular may have increased the block size 
large
enough to cause some minor timing issues in rare cases.  Nothing too significant
though.  The hsync timer would fire a bit late, but would have compensated for 
it the
next time through.

And yes, standard behavior for unsigned math is to wrap around.  Signed and 
unsigned
math always yields the exact same bit-wise result, in fact.  So the only times
signed/unsigned matter are for conditionals and sign-extended right shifts.  
SHR will
turn a signed value unsigned by clearing the topmost bit (bad), while SAR will 
extend
the topmost bit, keeping it negative.

And yeah, you can always interchange signed/unsigned values with typecasts, 
since the
bitwise contents of the integer don't change at all.  A fun trick:

u8 this = 254;
s8 that = -2;

if( this == (u8)that ) printf( "yup!");

:)

Original comment by Jake.Stine on 28 Oct 2008 at 9:29

GoogleCodeExporter commented 8 years ago
Issue fixed by cottonvibes in r246.

Original comment by Jake.Stine on 31 Oct 2008 at 12:53