sifadil / pcsx2-playground

Automatically exported from code.google.com/p/pcsx2-playground
2 stars 0 forks source link

SPU2ghz: ADPCM Decoder Cache (plus many more fixes & optimizations) #28

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Yes, ideally this would have been submitted as a series of smaller patches,
but the cache implementation changed many areas of code and that made me
aware of many other tweaks, optimizations, and fixes along the way.

The patch seems stable and has limited testing on the following games:
 Disgaea
 Disgaea 2
 Xenosaga 2

List of changes:

 * Added an ADPCM decoded block cache, which greatly improves speed
non-streaming voices (most sound effects and some games' music tracks).
 * Optimized out several uses of MulDiv (huge speed gain!)
 * Added some optimization directives (via the __assume intrinsic - minor
speed gain)
 * Removed various ConLog calls from the Public Release build of mixer.cpp
(minor speed boost). [all logs are still present in the Release build]
 * Fixed interpolation (my previous fix only half fixed the problem as it
turns out), and optimized both linear and cubic interpolators slightly.
 * Corrected a bug introduced by my last patch that was causing pops/clicks
at the beginning of most voices.
 * Switched back to the clamped XA/ADPCM decoder for now - with the cache
operational, we can afford the overhead of clamping now (still not sure
it's actually needed though).
 * Possibly more reliable pClock timings.
 * Some minor improvements to the Volume Slider (xenosaga2 music seems to
sound a little better now). A future update should have more in-depth work
in that area though.
 * Much improved handling of buffer overruns and underruns in SndOut.cpp -
reducing or eliminating many instances of loopbacking or stuttering audio.
 * Robust "unknown savestate" recovery: the SPU2 will completely disable
itself, which ensures a stable environment from which a user can find a
save spot. (not really all that useful, but a nice courtesy to users
none-the-less).

Summary: With this patch SPU2ghz should now match or exceed Zerospu2 in
terms of speed in almost all areas of all games, and it should sound a lot
better too!

Original issue reported on code.google.com by Jake.Stine on 22 Oct 2008 at 5:20

Attachments:

GoogleCodeExporter commented 8 years ago
Hmm, several notes about your work:
When it works, its fast and sounds great :)

Problems so far:
-With interpolation set to nearest we seem to loose a core or smth :p several 
sound
effects arent playing.
-The "sound overflow bug" is still present, and kills the sound in Odin Sphere 
for
example.
-Disabling the whole spu2 when a savestate isn't compatible is bad :p
Games will talk to SPU2, and if they do not get DMA stuff and IRQs they just 
stop.

But on the bright side:
I see a huge speedgain in several games - very nice.
Also the sound is crystal clear (in games that don't trigger the new bugs :p ).

Thanks for this work, I'm sure we'll get the bugs ironed out :)

Original comment by ramapcsx2 on 22 Oct 2008 at 9:50

GoogleCodeExporter commented 8 years ago
The new overflows are happening here:
ExtL = MulShr32( ExtL<<5, ((int)thiscore.ExtL)<<15);
ExtR = MulShr32( ExtR<<5, ((int)thiscore.ExtR)<<15);

I hacked around with it a bit, but it won't sound right for some reason :p
Reverting it to the old muldiv for ExtL and ExtR works as expected.

Original comment by ramapcsx2 on 22 Oct 2008 at 10:35

GoogleCodeExporter commented 8 years ago
Hmm nice, your changes to sndout.cpp will help me get rid of the ugly buffer 
control
with timestretching. yay! :p

Original comment by ramapcsx2 on 22 Oct 2008 at 11:07

GoogleCodeExporter commented 8 years ago
Yeah I thought that might help simplify some of the timestretching code.  Plus 
it
gets rid of the horrible looping sound when saving and loading states. ;)

On the disabled SPU2: I thought of that, but as far as I could tell it'll just 
bomb
out anyway on games like that.  The whole memory state is basically invalid 
after the
"non" recovery so any game that relies on IRQ/DMA stuff would likely explode
regardless.  The IRQ triggers depend on certain Core flags being 
correct/enabled, and
chances are they won't match what the game wants anyway.

Actually scratch that.  Assuming the user has started the game and let it run 
to a
certain point before attempting to load the state (past initial menus, etc), 
there's
a very good chance the Core will be set up the way the game wants it... enough 
to
properly trigger IRQs on DMA anyway (which is probably most important).  Ok, 
I'll
play with it.

On the overflow:  grr!  You know what the problem might be... I should make an
Unsigned version of MulShr32.  Using signed math on unsigned values usually 
works
fine most of the time, but things can get funny when using all 64 bits of the 
math.

On Linear dropping a core:
Weird!  It doesn't happen in Cubic mode?  And does it happen in Nearest mode?
Once I know the answers to those I should be able to troubleshoot it.  And 
actually
the Unsigned MulShr32 might fix it too since the ADSR is an unsigned value.  
(patch
coming soon).  So try that first once I upload it. :)

Original comment by Jake.Stine on 22 Oct 2008 at 2:16

GoogleCodeExporter commented 8 years ago
New unsigned support for MulShr32, and some improved typecasting for mixer.cpp. 
Might fix overflows.

Original comment by Jake.Stine on 22 Oct 2008 at 2:34

Attachments:

GoogleCodeExporter commented 8 years ago
K, the latest patch (its broken btw, had to handcode it :p ) fixes most of the 
overflows.
Only odin sphere has a slight distortion on soundeffects left (yep, sx2 is 
perfect now).

The missing audio thing is only happening with nearest (none) interpolation.

Original comment by ramapcsx2 on 22 Oct 2008 at 4:03

GoogleCodeExporter commented 8 years ago
ExtL = MulShr32su( ExtL<<3, ((int)thiscore.ExtL)<<16);
ExtR = MulShr32su( ExtR<<3, ((int)thiscore.ExtR)<<16);
^
This fixes the problem but lowers volume quite a bit :p (was << 4 in your patch)

Original comment by ramapcsx2 on 22 Oct 2008 at 4:18

GoogleCodeExporter commented 8 years ago
Oh.. oops!  I see why there's no sound in Nearest mode.  Stupid me. :P  It 
wasn't
working for me either, actually, but I forgot to test it.  I've always been a 
lousy
tester.

But good to know it's pretty solid now.  I'll fix up the savestate recovery bit 
and
re-upload a full patch against r226 which should be ready for commit.  If the 
odin
sphere slight distortion glitch is still present we can track it down and fix it
afterward.  It'll be easier to do minor patche fixes and such from an SVN 
revision. 
I can do rollbacks and diffs a lot easier then. ;)

Original comment by Jake.Stine on 22 Oct 2008 at 4:36

GoogleCodeExporter commented 8 years ago
That fixes the odin sphere distortion?

Try this instead:
ExtL = MulShr32su( ExtL<<4, ((u32)thiscore.ExtL)<<16);
ExtR = MulShr32su( ExtR<<4, ((u32)thiscore.ExtR)<<16);

... changes the (int) typecasts to (u32).

Original comment by Jake.Stine on 22 Oct 2008 at 4:38

GoogleCodeExporter commented 8 years ago
Whops, just commited the prelimanary work (to have a base to work with).
Please use that for you next patch :p

Original comment by ramapcsx2 on 22 Oct 2008 at 4:40

GoogleCodeExporter commented 8 years ago
Tried your suggestion, no change.
The audio generated is really just too loud, so it clamps i guess.

Original comment by ramapcsx2 on 22 Oct 2008 at 4:45

GoogleCodeExporter commented 8 years ago
Lol, I'm abusing this issue system :p

ExtL = MulShr32su( ExtL<<3, ((int)thiscore.ExtL)<<17);
ExtR = MulShr32su( ExtR<<3, ((int)thiscore.ExtR)<<17);
^
How about that? Volume is nice and overflows are gone :)

Original comment by ramapcsx2 on 22 Oct 2008 at 4:49

GoogleCodeExporter commented 8 years ago
Nah it's cool.  The issue system works well as a forum, and as soon as the 
topic is
resolved we can remove it from the active Issues list.  It's kinda handy really.

At first I would have said that looks bad, but after looking the code over that 
might
actually be a good approach.  It depends on if thiscore.ExtL / ExtR values ever
exceed 0x7fff or not.  Chances are they never will, so it's probably fine. :)

I'll think about it in the meantime.  There might yet be a more ideal solution
lurking in the shadows.

Original comment by Jake.Stine on 22 Oct 2008 at 5:04

GoogleCodeExporter commented 8 years ago
Update:  I did find a better overall resolution to the ExtL/ExtR value 
overflows. 
For now I'm going to get rid of the 64 bit precision on those multipliers, and 
I'll
look at re-implementing a high-precision method in the future.  The extra 
precision
isn't really important anyway, and in all likeliness the SPU2 itself only uses 
32 bit
precision when mixing the External Input into the second core.

And regarding SndOut.cpp : I still get periodic overruns on my Progressive Scan 
games
(La Pucelle, Disgaea), but Disgaea2 and Xenosaga2 work fine.  Is it something 
with
PCSX2 timings that makes the SPU2 out-pace the sound buffer output on certain 
games?
 The obvious blame would seem to fall on frame limiting... But then again for some
reason ZeroSPU2 doesn't exhibit the problem.  The only difference I could find 
in the
timing code between the two plugins is that ZeroSPU2 uses the old style 
SPU2async
while spu2ghz uses the new style that passes a *(pclk) pointer to PCSX2's 
internal
clock.  I'm pretty sure that shouldn't make a difference.

Thoughts?

Original comment by Jake.Stine on 22 Oct 2008 at 8:26

GoogleCodeExporter commented 8 years ago
Thoughts? Yeah.
ZeroSPU2 somehow manages to work on a huge buffer, so over/underflows are very 
rare.
With SPU2Ghz the output buffers are really small (and increasing them is no 
option >
too much lag).

We just recently introduced a NTSC game framelimit of 59.94fps instead of 60.
The main reason for that was actually SPU2Ghz, since it really needs an exact
game speed to perform well :p

Original comment by ramapcsx2 on 22 Oct 2008 at 8:43

GoogleCodeExporter commented 8 years ago
Hmm.. it still doesn't feel right.  The buffer is overflowing by nearly 512 
samples
per second on these particular games.  That means spu2ghz is mixing at an 
effective
speed of 48,512.  Even ZeroSPU's large buffers (around 500ms) would have some
noticeable overflow problems at that level of inaccuracy.  A large buffer can 
only
save you for so long.

It just seems really odd... especially when the timing is fine for some games 
but not
others..?  I'm seriously stumped on that one.

Original comment by Jake.Stine on 22 Oct 2008 at 9:01

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
Okie, here we go.  I think this'll be my final 'major' patch for a while.

Quick breakdown:  Mixer volume overflows should be fixed, nearest mode is fixed,
buffer overruns are handled a bit better now, and the savestate recovery is 
back the
way it was.  And I was able to squeeze a wee bit more speed out of the core 
mixer
still yet. ;)

Thanks for helping me troubleshoot everything.
[this is my 2nd post, first post had a booboo in the patch file]

Original comment by Jake.Stine on 22 Oct 2008 at 10:15

Attachments:

GoogleCodeExporter commented 8 years ago
Fixed a bug in the volumes for Core 1 external input (exploited by xs2 sound
effects).  Volumes were way too quiet, almost inaudible.

I set the volume to one notch below what spu2ghz 1.9r2 used to have.  You had
mentioned before that xs2 sfx were too loud in previous versions, and the new 
math
'feels' more correct as well (the bitshifts balance out such that overflows 
should be
impossible).  If it breaks another game (ala too quiet sfx) it'll *probably* 
mean
that game is exploiting a glitch in some other area of volume control.

This is a new patch against r227, which supercedes the one I uploaded earlier...

Original comment by Jake.Stine on 23 Oct 2008 at 12:10

Attachments:

GoogleCodeExporter commented 8 years ago
This is included in the latest patch, closing

Original comment by ramapcsx2 on 28 Oct 2008 at 10:52