Closed GoogleCodeExporter closed 8 years ago
Scratch the vsync comment. Vsync accuracy is tied directly to hsyncs, so
they're
fine so long as hsyncs are fine. I still have a hunch that I can improve
overall
timer accuracy even further though.
Original comment by Jake.Stine
on 27 Oct 2008 at 4:17
Ok, just so everyone knows, I have found a serious error in the counter logic.
Most
of the time the 'nextsCounter' value is way off the mark, because it's always
just
kinda blindly assumed two things:
(1) that the non-hSync counters will never interfere with the hSync timings.
Oops.
As soon as those counters started running for any reasons, the hsync timings
would
just drift off into the ozone.
(2) that all branches begin and end on hsyncs. Any variation in the length of
branch
executions would cause nextsCounter to drift further and further from the actual
hSync value.
In both cases, the value would run a cyclic course, where-by it would drift
away from
the proper hsync and drift back toward it after reaching an apex. So overall
game
speed stayed pretty consistent, but hsyncs themselves were being counted almost
randomly within the scatter band.
I'm fixing.
Original comment by Jake.Stine
on 27 Oct 2008 at 4:56
Fixed timers glitch. New patch attached.
This one makes the hsync timer considerably more accurate. Standard deviation
from
the 'real' hsync timings is now a mere fraction of what it used to be. I'm
curious
to see if this patch fixes games such as Dawn of Mana (which previously only
worked
with separate vsync timers). Ultimately I think timing vsync based on the
hsync is
cleaner and faster -- *if* it works. It may not. :)
(just noticed I forgot to attach the first patch... doh.. -_-)
Original comment by Jake.Stine
on 27 Oct 2008 at 5:55
Attachments:
Addendum: The above patch also appears to fix the "oddball" accelerated
non-interlaced timing issue. Interlaced and non-interlaced modes no longer
require
separate timing mechanisms. (very promising!)
Original comment by Jake.Stine
on 27 Oct 2008 at 6:12
i'll have to take a look at the patch in detail tomorrow.
it does look more accurate from taking a quick-look; but ultimately, from my
hours of
debugging earlier today, i think the best way is to calculate hblank and vSync
independently.
but i'll be sure to take a look at that patch tomorrow, as i could be wrong.
Original comment by cottonvibes
on 27 Oct 2008 at 6:50
There are accumulating rounding errors present in the hBlank timer in my
previous
patch (and in all versions of PCSX2 ever for that matter), and I think that
might
have been part of the problem when you timed vSync to hSync. The rounding
errors
applied to hSync timing alone don't amount to much. Every so often an hblank
would
fire out-of-order from the vblank and, maybe in some games, a little graphical
artifact might come of it -- but otherwise everything was ok. But when vSync
was
timed to hSync the errors accumulated over time and skewed the entire game's
rate of
update.
Games sensitive to more accurate vSync timings might have been exploding for
that reason.
This latest patch re-adjusts both the vSync and hSync timers after every
progression
(25/30 fps). In PAL mode the accumulative error is +240, while in NTSC it was
a bit
smaller at -165. It would also be easy enough to adjust the error 60 fps
instead,
although I'm not sure it's necessary. Those values seem well small enough as
they are.
But as usual, only lots of testing will tell the real story. Theory be damned.
;)
Attached patch is against r238.
Original comment by Jake.Stine
on 27 Oct 2008 at 7:05
Attachments:
This patch also fixes Xenosaga videos (which had audio/video desync), so now we
know
whats up with this game ;)
Original comment by ramapcsx2
on 27 Oct 2008 at 11:12
Good! I think I'm on to something big here. That was a major glitch in the
counters
I fixed, and considering how important they are, I can't imagine that it won't
be
felt in almost all games. I noticed it fixed a couple low-fps glitches in La
Pucelle
Tactics too (in addition to fixing glitchy behavior in both Disgaea 1/2 menus).
It
used to drop to 1/2th or 1/4th update rate when I would swing changeups around
while
setting up miracles. Doesn't seem to happen anymore.
That's what cotton was talking about though; accurate counters can help speed
up a
lot of games, because now the EE isn't sitting there farting around while it
waits
for and/or misses hsyncs or vsyncs.
Original comment by Jake.Stine
on 27 Oct 2008 at 11:21
"because now the EE isn't sitting there farting around while it waits
for and/or misses hsyncs or vsyncs" << made my day :D
Yes, I can't really pinpoint it exactly but games run smoother now :)
Original comment by ramapcsx2
on 27 Oct 2008 at 11:24
Whops, one addition.
Xenosaga2 ingame jerkyness is still there. Its fixed by using the sync hack
only.
Without the hack its really jumpy when you run around (yes, fps can reachwell
over 60).
Using cottonvibes counters from rev 240/241 the game runs fine again (hack on
or off
doesnt matter, although with it on its even smoother ><).
Original comment by ramapcsx2
on 27 Oct 2008 at 11:40
Which sync hack are you referring to specifically? :/
Original comment by Jake.Stine
on 27 Oct 2008 at 11:48
EE and IOP * 2
The x3 version is just too game-breaking from my experience.
Original comment by ramapcsx2
on 27 Oct 2008 at 12:09
Ok, just making sure.
Yeah I don't really understand that part of the emulator yet. The recompiler
always
multiplies the cycles counter and I don't know why. Even if all sync hacks
disabled
the multiplier is 9/8 (or 1.125). There's a note in the code that says a 6/5
ratio
fixes one game and breaks another. So I'm thinking the whole mess is a little
bit
"voodoo", so to speak.
In the old PCSX2 code the vblank and hblank timing was pretty lazy. It just
split the
scans down the middle and made half redraw and half blank (normally redraw is
about
80% and blank 20%). Sometimes I wonder how many other hacks are in the emulator
designed to compensate for weak timing.
Original comment by Jake.Stine
on 27 Oct 2008 at 12:26
Well, from what I can see your patch + cottonvibes latest fixes should give the
best
results.
Is this possible? ;)
Original comment by ramapcsx2
on 27 Oct 2008 at 12:29
More or less. r241 still has the basic flaws in the timing mechanism where-by
it
assumes that branches are synched with the hSync (and they aren't, like, ever).
So
the hSync has the cyclic scatterband of inaccuracy as described above.
Otherwise his
just times vsync as a separate entity from hsync. But as long as hSync timing
is
accurate, that shouldn't matter.
Cotton's latest addition does have a timer optimization, but I don't think that
accounts for it running smoother in XS2 (although it could benefit games that
use the
high-resolution timers, if XS2 happens to use them). It's not a synchronization
thing but rather a CPU optimization... and I don't think it'd be enough of one
to
matter. But then again that one high res timer does look like it could be
pretty
demanding... hmm.
Original comment by Jake.Stine
on 27 Oct 2008 at 12:59
Actually XS2 *does* use that high resolution counter.
Hmm. I wonder if it's the optimization in r240 then that's allowed it to run
smoothly without the sync hack. :)
Original comment by Jake.Stine
on 27 Oct 2008 at 1:37
Also, it occurred to me that more accurate timers doesn't necessarily translate
into
faster emulation. As I understand it, one of the driving forces behind the x2
sync
hack's effectiveness is that it speeds up the timers so that code needn't sit
around
and wait for them as long. The fewer instructions run between timer updates,
the
faster the emulation.
Thus even if timers are perfect, the x2 sync hack should still make emulation
faster
for most games. More importantly, it's always possible that more accurate
timers
could slow some games down since, depending on exactly how the timer in
question is
drifting, it could favor some games to run less code to get to where they want
to be.
But of course the game would still emulate better under a more accurate timer model,
if potentially slower. And timer accuracy should affect the emulation quality
of the
x2 hacks as well (but not x3 so much.. the skips are too big there for the extra
accuracies to matter much).
And maybe Cottonvibes can confirm or deny my theory. He knows this stuff
better than
me by a mile. :P
Original comment by Jake.Stine
on 27 Oct 2008 at 3:02
Well, let me clarify the "hickups" in xs2 a bit. Its not that the game won't
run smooth
but rather so jumpy I'd consider it broken :p
Instead of delivering the frames consitently, it seems to stop for a while, then
continue normally.
Happens about 3 times per second (guessing).
Original comment by ramapcsx2
on 27 Oct 2008 at 3:53
Oh! That! Well yeah now that you mention it xs2 has always done that for me
since
I got the dump to toy with. And sure enough an old version of pcsx2 doesn't do
it.
Son of a gun. And gee, the game's almost playable too (using ZeroGS and about
every
gfx effect turned off). Doesn't feel like it's trying to go backward in time
anymore.
Noooow it makes a lot more sense. I wonder....!
And don't fret it. I'm having a ton of fun right now. Even if nothing I do in
the
PCSX2 sides ends up being worth while really, it's still just crazy fun playing
the
role of a codebreaker, so to speak -- and trying to unravel the mysteries of the
interlockings of a super-complex system. ;)
Original comment by Jake.Stine
on 27 Oct 2008 at 7:23
"(1) that the non-hSync counters will never interfere with the hSync timings.
Oops.
As soon as those counters started running for any reasons, the hsync timings
would
just drift off into the ozone."
i need to look at the code again, but i don't think the non-hsync counters will
interfere with hblank timings.
"(2) that all branches begin and end on hsyncs. Any variation in the length of
branch
executions would cause nextsCounter to drift further and further from the actual
hSync value."
yeah thats true, thats probably why my code to base vSync on hSync had alot of
problems. (when i coded it, i didn't know the code was only being called every
end-
block)
"That's what cotton was talking about though; accurate counters can help speed
up a
lot of games, because now the EE isn't sitting there farting around while it
waits
for and/or misses hsyncs or vsyncs."
Yeah, basically some games perform logic based on these counters.
if these counters arn't incremented, then the game doesn't know time has
passed, and
thus keeps waiting for a certain amount of time to pass.
so good counters code can speedup games if they weren't working correctly
before.
"Ok, just making sure.
Yeah I don't really understand that part of the emulator yet. The recompiler
always
multiplies the cycles counter and I don't know why. Even if all sync hacks
disabled
the multiplier is 9/8 (or 1.125). There's a note in the code that says a 6/5
ratio
fixes one game and breaks another. So I'm thinking the whole mess is a little
bit
"voodoo", so to speak/"
well AFAIK, this is a huge-ugly-hack :p
every instruction takes a certain amount of cycles to perform; it looks like
the EE
recs are just counting all instructions as 1 cycle.
so then at the end of the recompiled block (the branch), the EE recs multiply
the
total cycles it counted by some EEmultiplier.
what *should* be done, is the correct amount of cycles should be determined by
the
opcodes being run.
i might look more into that later...
Original comment by cottonvibes
on 28 Oct 2008 at 12:22
>> i need to look at the code again, but i don't think the non-hsync counters
will
interfere with hblank timings.
That part refers specifically to how the "nextCounter" value in rcntSet() is
calculated. It doesn't take into account the hsync's next timing. It just
assumes
the hsync is _HSCANLINE cycles away every time, but in reality it could be 50
cycles
away. Of course for that assumption to be fixed you have to first fix the
second
assumption. It's in hScanline, when you assign cpuRegs.cycle directly to the
new
hblank counter target (which is a problem due to the block/branch resolution of
the
timer, as you noted above). Just need to assign it as cpuRegs.cycle - latency
(which
is cpuRegs.cycle - counters[4].CycleT (or .Cycle or whichever holds the target
cycle
when the hBlank was *supposed* to happen).
>> what *should* be done, is the correct amount of cycles should be determined
by the
opcodes being run.
Ah, ok. Everything being a single cycle did seem a little overly simplistic.
;) I
guess even the interpreter doesn't really have accurate timers then. Sometimes
it's
a wonder the emulator works at all.
And I'll try to keep myself useful by not making too many boneheaded errors in
judgment in other places (the xs2 thing was a disaster on my part.. (>_<)
Original comment by Jake.Stine
on 28 Oct 2008 at 1:07
"That part refers specifically to how the "nextCounter" value in rcntSet() is
calculated. It doesn't take into account the hsync's next timing. It just
assumes
the hsync is _HSCANLINE cycles away every time, but in reality it could be 50
cycles
away. Of course for that assumption to be fixed you have to first fix the
second
assumption. It's in hScanline, when you assign cpuRegs.cycle directly to the
new
hblank counter target (which is a problem due to the block/branch resolution of
the
timer, as you noted above). Just need to assign it as cpuRegs.cycle - latency
(which
is cpuRegs.cycle - counters[4].CycleT (or .Cycle or whichever holds the target
cycle
when the hBlank was *supposed* to happen)."
you're right about this, i'm going to have to fix that.
Original comment by cottonvibes
on 28 Oct 2008 at 2:01
I read up on the MIPS. Turns out the R5900 actually runs 2 instructions per
cycle
most of the time. Load/Stores are usually a single cycle since the memory
accesses
can stall the superscalar pipeline. And branches are usually 2.25 cycles on
average
(roughly speaking, 1 cycle for a prediction hit and 3 for a prediction miss).
So in
practice most MIPs applications actually weigh in at less than a cycle per
instruction (typical CPIs are apparently between 0.79 and 0.94).
The other potential timing problem though is the VU0 unit. The VU0 code
modifies the
cpuRegs.cycle value, and so it could be as much to blame for skewed cycle rates
as
the eeRecs. As for what it should do.. eh.. well.. god knows how it actually
monkeys
with the R5900's cycle timings on the PS2 itself.
For what it's worth the R3000a is not a superscalar pipeline and so it's
average CPI
is usually right around 1.1 to 1.2. I assume that's where the pcsx2 devs got
their
magic modifier from. Although for all I can tell, it doesn't really fit the EE.
Not that it's good news. The EE's ability to churn through more than 1
instruction
per cycle, if accurate, would just mean that much more work emulating it.
Original comment by Jake.Stine
on 28 Oct 2008 at 2:39
Correction: The R5900 is not superscalar. Most of the 64 bit MIPS processors
are
superscalar, but the R5900 isn't one of them.
Thus, the 1.125 CPI is basically a healthy measure of running code. Almost
every
instruction runs in a single cycle on a MIPS, with the sole exceptions of (rare)
load/store stalls and branch prediction misses (which end up being 3 or 4
cycles).
So I guess that hack isn't such a bad thing after all. Go figure. :)
Original comment by Jake.Stine
on 28 Oct 2008 at 3:29
well i did some testing with your patch, and Dawn of Mana videos still had
problems
(actually the problem got worse for some reason :/).
and Devil May Cry has been crashing from some bug i made in counters.c (not sure
where, but also crashes with 238), so i've reverted to the official SVN code
for now...
i *did* fix the problem that you mentioned about "nextCounter" not having the
correct
value, but until i find out what crashed Devil May Cry, i'm not going to commit
the code.
Original comment by cottonvibes
on 28 Oct 2008 at 6:30
So there's devial may cry, dawn of mana, xenosaga1, xenosaga2, atelier iris 2..
maybe
more.
These games all do *something* with counters, and work better or worse with
each change.
From experience the official svn code at least runs them all (no hangs), but its
still a bit jumpy in some.
Original comment by ramapcsx2
on 28 Oct 2008 at 10:51
Do all of these games makes use of some of the high-resolution counters? Maybe
they
actually have something in common, like a specific set of counter flags. The
DMC
crash appears to be unrelated to counter accuracy so we can ignore it for now.
But
it'd be nice to have a profile of what counters and flags each of those other
games
uses during the times that they run yucky.
Original comment by Jake.Stine
on 28 Oct 2008 at 2:13
Well, if you tell me how to check for that, I'll tell you the results ;)
Original comment by ramapcsx2
on 28 Oct 2008 at 3:24
In a debug build of PCSX2 you can enable the EE Counters log in the Debug menu.
Debug::Logging. The logfile should be dumped to the logs dir, although I'm not
sure
offhand what the filename is (haven't used the logging feature yet myself, just
know
that it should work based on the code). Collect up a few logs, named for the
games
that created them, and then we can pass 'em around and analyze.
Might find nothing... or.. we might find something. :)
Original comment by Jake.Stine
on 28 Oct 2008 at 4:38
i don't think the Devil May Cry bug is related to the hsync/vsync algorithm
changes.
Original comment by cottonvibes
on 28 Oct 2008 at 4:39
I think i found out what broke Devil May Cry (i was looking through the changes
i
made in the last revisions, and i found some error in the way i coded something)
i'm at work now though, so i can't test out my solution till i get back home :O
Original comment by cottonvibes
on 28 Oct 2008 at 8:08
That'll be good news, if so. I still would like a set of EE counter logs for
games
with apparent timing sensitivity problems though, so that we can look for any
commonalities and/or 'rare' counter settings that other games may not normally
use.
And hopefully get an idea as to which counter settings are causing sensitivity
issues.
Original comment by Jake.Stine
on 28 Oct 2008 at 8:13
btw Jake, i think i remember from looking at your patch, you had a "difference"
variable as U32.
then later you were checking:
if (difference <= 0) { //systemprint }
the reason this never/rarely occured, is becuase you were using a U32 (unsigned
int), and not a S32 (signed int).
i believe when you have something like (u32)(100 - 1000), you get negative
overflow
(and you'd just end up with a really-big positive number)
so what you need to do is have difference be S32 *or* type cast the u32 as
(s32)
whenever you use it and it might countain a negative number.
Original comment by cottonvibes
on 28 Oct 2008 at 8:27
Oh hehe. Nah. I changed it to U32 after I concluded that the conditional was
never
true anyway (in fact it was never even close except when the emulator was first
initializing some bios code), and wanted to get rid of the warning before
uploading
the patch. Although now that I think about it, I neglected to consider the
impact of
the X2 and X3 sync hacks. X3 in particular may have increased the block size
large
enough to cause some minor timing issues in rare cases. Nothing too significant
though. The hsync timer would fire a bit late, but would have compensated for
it the
next time through.
And yes, standard behavior for unsigned math is to wrap around. Signed and
unsigned
math always yields the exact same bit-wise result, in fact. So the only times
signed/unsigned matter are for conditionals and sign-extended right shifts.
SHR will
turn a signed value unsigned by clearing the topmost bit (bad), while SAR will
extend
the topmost bit, keeping it negative.
And yeah, you can always interchange signed/unsigned values with typecasts,
since the
bitwise contents of the integer don't change at all. A fun trick:
u8 this = 254;
s8 that = -2;
if( this == (u8)that ) printf( "yup!");
:)
Original comment by Jake.Stine
on 28 Oct 2008 at 9:29
Issue fixed by cottonvibes in r246.
Original comment by Jake.Stine
on 31 Oct 2008 at 12:53
Original issue reported on code.google.com by
Jake.Stine
on 27 Oct 2008 at 4:06