Closed GoogleCodeExporter closed 8 years ago
Will test that tomorrow, promised.
Btw, keep going with that MMX / SSE suggestion you had ;)
Original comment by ramapcsx2
on 14 Oct 2008 at 2:58
I commented out the assignment of vc.PeakX at the bottom of GetVoiceValues() and
didn't comment why: it's a value that's not used anywhere in the plugin
anymore.
Chances are gigaherz used it a while back to help determine ideal mixer volume
factors, and there's not really any other use for it. Rather than commenting
it out
completely you could leave it in as part of the debug build only. It's code
that
gets run upwards of 800k times a second though, so getting it out of the
Release mode
is a good thing.
And yeah I'll see what I can do to help speed it up. I know spu2ghz takes a
'quality
first' approach so I won't submit anything unless it doesn't interfere with
audio
quality. (that said, in addition to the idea of SSE-based VA/PCM decoders there
might be a way for me to re-order the way the mixer cycles through its voices
so that
it should get much better CPU code/data cache hits).
Original comment by Jake.Stine
on 14 Oct 2008 at 8:47
K, i ran a few tests with the changes.
First of all: It overflows audio :p
s64 t0 = ((a0 )*mu)>>18;
s64 t1 = ((t0+a1)*mu)>>18;
s64 t2 = ((t1+a2)*mu)>>18;
This fixes the overflows (went from all beeing >>12 to >>18).
When this is done i get nice audio, but i cannot detect any improvement over
nearest
interpolation :(
Original comment by ramapcsx2
on 15 Oct 2008 at 3:13
Big thanks
is this incorporated into the latest svn or do we we need to download this
patch and
patch the code separately?
Original comment by reach....@gmail.com
on 15 Oct 2008 at 6:32
reach.Ray: Rama will include it in the SVN once it's well tested and confirmed a
proper fix.
Rama: Yeah, the cubic interp code (if working properly) should sound basically
identical to linear interp. There are only a few cases where cubic
interpolation
sounds noticeably better than linear, and those are namely low-frequency
waveforms
(bass lines) and synth samples (waveforms that look like regular sine waves).
Cubic
interpolation can actually produce less accurate results on a lot of techno, for
example, since techno uses lots of linear waveforms- square waves, distortion,
etc.
But most of the time the difference between the two interpolation types simply
isn't
audible (normally less then 1% variance in the resulting wave dump).
... and for that matter, for anything that's using a lot of streaming audio
(music or
voice) neither interpolation should be much different from nearest mode. The
key
here being that it's been fixed so that interpolation modes no longer make those
games sound *worse* (which is what was happening before), and should still
improve
quality if the developers chose to use sub-48khz voices or sound effects or
what-not.
Concerning optimization: Unfortunately there's not much I can do after all. As
it
turns out the SPU2 is very order-centric. Voices have to be mixed in a specific
order due to voice modulation. The ADPCM decoder is 100% progressive so
there's no
way to use SIMD to speed it up either (each decoded sample is dependent on the
previous sample's decoded value). Nor is it possible to decode multiple ADPCM
blocks
at once, since that risks breaking streaming audio in some games.
Just for kicks, I did attempt to set up an ADPCM decoder cache. It stored
decoded
blocks in a memory buffer and referenced those instead of decoding the data
each time
(invalidating blocks during memory writes). But the speed improvement only
applied
to non-streaming audio, since streaming data can't be cached... and added the
extra
overhead of failed cache lookups to streamed audio. And since many games make
pretty
heavy use of streamed audio, the overall net result was a loss. Even Disgaea,
which
doesn't use much CD audio at all, still uses a lot of dynamic mixing/processing
tricks for its sound effects so its overall performance gain was minimal too.
Oh
well. :)
Original comment by Jake.Stine
on 15 Oct 2008 at 10:29
Hmm, if possible I'd still like to test that as well :p
I bet you don't have Xenosaga2. Now that game really stresses SPU2, usually
using
all 24 voices of one core while doing sound effects on the other ^^
Original comment by ramapcsx2
on 15 Oct 2008 at 11:12
Yeah, that sounds like it would run pretty slow. The ADPCM code is really cpu
intensive and I could imagine it using 25-30% of a 3.2ghz cpu just for audio on
something that used 30+ voices. Ouch :/
Well I'll give it another try. I took an approach last time that minimized
memory
use but required a lot of bitshifts to check the cache status (done everytime a
byte
was written to memory). Due to the sheer number of write locations though I
should
probably stick with a faster/simpler approach. And just so you know, the cache
requires a 4MB memory allocation (3.5mb for decompressed audio blocks plus half
a meg
for cache status flags -- most games won't even use 20% of it but I haven't
thought
of an efficient way to make it an on-demand allocation system).
Aside from the cache, the only way to really speed up the spu2ghz is to
implement
support for 24khz output, using an X2-style speedhack of sorts. One of the
benefits
of working interpolation is that it lets you use a lower outputted sampling rate
without it sounding like complete crap. 24hkz/linear would reduce spu2ghz cpu
usage
by about 20-25% compared to 48khz/nearest (25-30% faster than 48khz/linear).
It'll
definitely sound a lot less clean of course, but I'm sure some users would
still be
happy with it... especially in SPU2-intensive cases where the alternatives are
either
really poor fps or no sound at all.
... but it needs a little work to implement properly, and I don't think I can
support
anything other than 24khz. Think it's worth while?
Original comment by Jake.Stine
on 16 Oct 2008 at 12:57
Nah, every user that really needs that last fps is using peops or zerospu2
anyway :p
But:
What really helps this plugin (even gain those 25% in demanding situations) is
doing
profile guided optimizations on it.
It gets real close to zerospu2 speeds then, give it a try :)
Original comment by ramapcsx2
on 16 Oct 2008 at 8:19
Ok, I'll still futz with the cache then since, if done right it should give a
good
50-75% speed boost to something like Xenosaga2, and shouldn't really add much
overhead to streaming audio (around 1%)... and of course would have no impact on
audio quality so can be enabled all the time (user easy!).
Regarding PGO: If it helps spu2ghz that much, what effect does it have on
zerospu2? ;)
Anyways, I did go through and implement the "usual" round of optimizations.
Mostly
it involved writing more compiler-friendly code, so that the compiler can more
more
assumptions as needed (avoiding use of global vars, using object handles for
frequently-accessed array elements, etc). I annotated most of the code
changes. I
also added two new versions of the ADPCM decoder. One behaves identically, but
should be faster than the old method, and the other has the saturation removed
(so it
will be faster for sure- question is by how much).
Patch attached. It's a major one and it includes some other tweaks too
(removed some
redundant code loops from dsoundout and such, and also includes my waveout
patch).
Go ahead and benchmark the different versions of the ADPCM decoder and see what
the
speed differences are. I don't have time to do it here. :/
There's also one change to the V_Voice structure that'll break savestates.
Dunno how
much of a problem/annoyance that is. It's pretty minor anyway, may not be
worth the
trouble. And actually, if you are ok with breaking old savestates there are 2-3
other opts I can make to the V_Voice struct that should further improve cpu
cache hit
rates (at which point it might actually be worth-while).
Final note: linear interpolation will cause pops at the very beginning of some
sound
effects. This is more or less unavoidable (in the old mod/s3m/xm days we used a
technique called micro-volume ramping to get rid of it). Cubic interpolation
usually
gets rid of it too, which can be listed as one of the advantages of cubic
interp (or,
possibly, I could force-disable interpolation for the first 8 samples of any new
voice, which might also fix it... hmm! Will have to try that someday.)
I'm gone for 4 days (work), so you won't hear from me until then. Good luck. :)
Original comment by Jake.Stine
on 16 Oct 2008 at 11:12
Attachments:
This is included in the latest patch, closing
Original comment by ramapcsx2
on 28 Oct 2008 at 10:52
Original issue reported on code.google.com by
Jake.Stine
on 14 Oct 2008 at 2:38Attachments: