sifadil / pcsx2-playground

Automatically exported from code.google.com/p/pcsx2-playground
2 stars 0 forks source link

Potential bugfix in Spu2ghz interpolated mixers (patch) #21

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Found and fixed a potential bug in the spu2ghz linear and cubic
interpolation mixers (patch attached).  Interpolated output should be much
better and consistent.

Enjoy!

Original issue reported on code.google.com by Jake.Stine on 14 Oct 2008 at 2:38

Attachments:

GoogleCodeExporter commented 8 years ago
Will test that tomorrow, promised.
Btw, keep going with that MMX / SSE suggestion you had ;)

Original comment by ramapcsx2 on 14 Oct 2008 at 2:58

GoogleCodeExporter commented 8 years ago
I commented out the assignment of vc.PeakX at the bottom of GetVoiceValues() and
didn't comment why: it's a value that's not used anywhere in the plugin 
anymore. 
Chances are gigaherz used it a while back to help determine ideal mixer volume
factors, and there's not really any other use for it.  Rather than commenting 
it out
completely you could leave it in as part of the debug build only.  It's code 
that
gets run upwards of 800k times a second though, so getting it out of the 
Release mode
is a good thing.

And yeah I'll see what I can do to help speed it up.  I know spu2ghz takes a 
'quality
first' approach so I won't submit anything unless it doesn't interfere with 
audio
quality.  (that said, in addition to the idea of SSE-based VA/PCM decoders there
might be a way for me to re-order the way the mixer cycles through its voices 
so that
it should get much better CPU code/data cache hits).

Original comment by Jake.Stine on 14 Oct 2008 at 8:47

GoogleCodeExporter commented 8 years ago
K, i ran a few tests with the changes.
First of all: It overflows audio :p

s64 t0 = ((a0   )*mu)>>18; 
s64 t1 = ((t0+a1)*mu)>>18;
s64 t2 = ((t1+a2)*mu)>>18;

This fixes the overflows (went from all beeing >>12 to >>18).
When this is done i get nice audio, but i cannot detect any improvement over 
nearest
interpolation :(

Original comment by ramapcsx2 on 15 Oct 2008 at 3:13

GoogleCodeExporter commented 8 years ago
Big thanks

is this incorporated into the latest svn or do we we need to download this 
patch and
patch the code separately?

Original comment by reach....@gmail.com on 15 Oct 2008 at 6:32

GoogleCodeExporter commented 8 years ago
reach.Ray: Rama will include it in the SVN once it's well tested and confirmed a
proper fix.

Rama: Yeah, the cubic interp code (if working properly) should sound basically
identical to linear interp.  There are only a few cases where cubic 
interpolation
sounds noticeably better than linear, and those are namely low-frequency 
waveforms
(bass lines) and synth samples (waveforms that look like regular sine waves).  
Cubic
interpolation can actually produce less accurate results on a lot of techno, for
example, since techno uses lots of linear waveforms- square waves, distortion, 
etc. 
But most of the time the difference between the two interpolation types simply 
isn't
audible (normally less then 1% variance in the resulting wave dump).

... and for that matter, for anything that's using a lot of streaming audio 
(music or
voice) neither interpolation should be much different from nearest mode.  The 
key
here being that it's been fixed so that interpolation modes no longer make those
games sound *worse* (which is what was happening before), and should still 
improve
quality if the developers chose to use sub-48khz voices or sound effects or 
what-not.

Concerning optimization: Unfortunately there's not much I can do after all.  As 
it
turns out the SPU2 is very order-centric.  Voices have to be mixed in a specific
order due to voice modulation.  The ADPCM decoder is 100% progressive so 
there's no
way to use SIMD to speed it up either (each decoded sample is dependent on the
previous sample's decoded value).  Nor is it possible to decode multiple ADPCM 
blocks
at once, since that risks breaking streaming audio in some games.

Just for kicks, I did attempt to set up an ADPCM decoder cache.  It stored 
decoded
blocks in a memory buffer and referenced those instead of decoding the data 
each time
(invalidating blocks during memory writes).  But the speed improvement only 
applied
to non-streaming audio, since streaming data can't be cached... and added the 
extra
overhead of failed cache lookups to streamed audio.  And since many games make 
pretty
heavy use of streamed audio, the overall net result was a loss.  Even Disgaea, 
which
doesn't use much CD audio at all, still uses a lot of dynamic mixing/processing
tricks for its sound effects so its overall performance gain was minimal too.  
Oh
well. :)

Original comment by Jake.Stine on 15 Oct 2008 at 10:29

GoogleCodeExporter commented 8 years ago
Hmm, if possible I'd still like to test that as well :p
I bet you don't have Xenosaga2. Now that game really stresses SPU2, usually 
using
all 24 voices of one core while doing sound effects on the other ^^

Original comment by ramapcsx2 on 15 Oct 2008 at 11:12

GoogleCodeExporter commented 8 years ago
Yeah, that sounds like it would run pretty slow.  The ADPCM code is really cpu
intensive and I could imagine it using 25-30% of a 3.2ghz cpu just for audio on
something that used 30+ voices. Ouch :/

Well I'll give it another try.  I took an approach last time that minimized 
memory
use but required a lot of bitshifts to check the cache status (done everytime a 
byte
was written to memory).  Due to the sheer number of write locations though I 
should
probably stick with a faster/simpler approach.  And just so you know, the cache
requires a 4MB memory allocation (3.5mb for decompressed audio blocks plus half 
a meg
for cache status flags -- most games won't even use 20% of it but I haven't 
thought
of an efficient way to make it an on-demand allocation system).

Aside from the cache, the only way to really speed up the spu2ghz is to 
implement
support for 24khz output, using an X2-style speedhack of sorts.  One of the 
benefits
of working interpolation is that it lets you use a lower outputted sampling rate
without it sounding like complete crap.  24hkz/linear would reduce spu2ghz cpu 
usage
by about 20-25% compared to 48khz/nearest (25-30% faster than 48khz/linear).  
It'll
definitely sound a lot less clean of course, but I'm sure some users would 
still be
happy with it... especially in SPU2-intensive cases where the alternatives are 
either
really poor fps or no sound at all.

... but it needs a little work to implement properly, and I don't think I can 
support
anything other than 24khz.  Think it's worth while?

Original comment by Jake.Stine on 16 Oct 2008 at 12:57

GoogleCodeExporter commented 8 years ago
Nah, every user that really needs that last fps is using peops or zerospu2 
anyway :p

But: 
What really helps this plugin (even gain those 25% in demanding situations) is 
doing
profile guided optimizations on it.
It gets real close to zerospu2 speeds then, give it a try :)

Original comment by ramapcsx2 on 16 Oct 2008 at 8:19

GoogleCodeExporter commented 8 years ago
Ok, I'll still futz with the cache then since, if done right it should give a 
good
50-75% speed boost to something like Xenosaga2, and shouldn't really add much
overhead to streaming audio (around 1%)... and of course would have no impact on
audio quality so can be enabled all the time (user easy!).

Regarding PGO: If it helps spu2ghz that much, what effect does it have on 
zerospu2?  ;)

Anyways, I did go through and implement the "usual" round of optimizations.  
Mostly
it involved writing more compiler-friendly code, so that the compiler can more 
more
assumptions as needed (avoiding use of global vars, using object handles for
frequently-accessed array elements, etc).  I annotated most of the code 
changes.  I
also added two new versions of the ADPCM decoder.  One behaves identically, but
should be faster than the old method, and the other has the saturation removed 
(so it
will be faster for sure- question is by how much).

Patch attached.  It's a major one and it includes some other tweaks too 
(removed some
redundant code loops from dsoundout and such, and also includes my waveout 
patch). 
Go ahead and benchmark the different versions of the ADPCM decoder and see what 
the
speed differences are.  I don't have time to do it here. :/

There's also one change to the V_Voice structure that'll break savestates.  
Dunno how
much of a problem/annoyance that is.  It's pretty minor anyway, may not be 
worth the
trouble.  And actually, if you are ok with breaking old savestates there are 2-3
other opts I can make to the V_Voice struct that should further improve cpu 
cache hit
rates (at which point it might actually be worth-while).

Final note: linear interpolation will cause pops at the very beginning of some 
sound
effects.  This is more or less unavoidable (in the old mod/s3m/xm days we used a
technique called micro-volume ramping to get rid of it).  Cubic interpolation 
usually
gets rid of it too, which can be listed as one of the advantages of cubic 
interp (or,
possibly, I could force-disable interpolation for the first 8 samples of any new
voice, which might also fix it... hmm!  Will have to try that someday.)

I'm gone for 4 days (work), so you won't hear from me until then.  Good luck. :)

Original comment by Jake.Stine on 16 Oct 2008 at 11:12

Attachments:

GoogleCodeExporter commented 8 years ago
This is included in the latest patch, closing

Original comment by ramapcsx2 on 28 Oct 2008 at 10:52