Math performance tuning

GoogleCodeExporter commented 9 years ago

We need to do some performance work in the math library. This involves two
basic lines of work:
* Fixing existing functionality that does not run at acceptable speed.
* Adding new fast-path functionality for more specific use cases that we
can drastically accelerate.

For existing functionality, we need to ensure that our code is at least on
par with MDX and XNA. This means running substantial performance tests on
the major math operations, focused mainly on Vector and Matrix.
Problems with Matrix.Multiply have already been identified, and it's
reasonable to assume there are other troublesome hotspots.

New functionality will most likely take the form of hand-coded routines
using SSE intrinsics that process large datasets. The model here is the
unmanaged D3DXVec3TransformCoordArray function; ideally we should be able
to match or beat this function.
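
For reference, the kind of batch routine meant here can be sketched roughly as below. This is a minimal scalar C++ sketch, not SlimDX code and not the D3DX implementation; `Vec3`, `Mat4` and `transform_coord_array` are hypothetical names, and it assumes the row-vector convention (v * M) that D3DX uses. Each point is treated as (x, y, z, 1) and divided by the resulting w.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only; not SlimDX's definitions.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // m[row][col], row-vector convention (v * M)

// Transforms `count` points by `mat`, treating each as (x, y, z, 1) and
// projecting the result back to w = 1. Each input is copied to a local
// before its output is written, so in == out is safe.
void transform_coord_array(const Vec3* in, Vec3* out, std::size_t count,
                           const Mat4& mat) {
    assert(count == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        const Vec3 v = in[i];
        const float x = v.x * mat.m[0][0] + v.y * mat.m[1][0] + v.z * mat.m[2][0] + mat.m[3][0];
        const float y = v.x * mat.m[0][1] + v.y * mat.m[1][1] + v.z * mat.m[2][1] + mat.m[3][1];
        const float z = v.x * mat.m[0][2] + v.y * mat.m[1][2] + v.z * mat.m[2][2] + mat.m[3][2];
        const float w = v.x * mat.m[0][3] + v.y * mat.m[1][3] + v.z * mat.m[2][3] + mat.m[3][3];
        const float inv_w = 1.0f / w;  // one divide, three multiplies
        out[i] = Vec3{x * inv_w, y * inv_w, z * inv_w};
    }
}
```

Doing the whole array in one call is what gives a hand-tuned version room to work: the matrix stays in registers across iterations and the per-call overhead is paid once.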

Original issue reported on code.google.com by promit....@gmail.com on 20 Jan 2009 at 7:29

GoogleCodeExporter commented 9 years ago

Issue 407 has been merged into this issue.

Original comment by promit....@gmail.com on 20 Jan 2009 at 7:30

GoogleCodeExporter commented 9 years ago

Made significant progress on this front. Any SSE work is being delayed until
after the release, since it's likely to be quite involved and may not actually
yield results.

Next step is to add support for arrays of multiplications.

Original comment by promit....@gmail.com on 2 Mar 2009 at 4:58

GoogleCodeExporter commented 9 years ago

Some array support is in now, specifically Vector.Transform{Coordinate, Normal}
and Matrix.Multiply.

Original comment by promit....@gmail.com on 3 Mar 2009 at 4:00

Added labels: Performance, Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Not looking to do much more on this before the March SDK is out.

Original comment by promit....@gmail.com on 7 Mar 2009 at 8:57

Added labels: Priority-Low
Removed labels: Priority-High

GoogleCodeExporter commented 9 years ago

I'm thinking of doing some performance improvements in the math library.

Good acceleration through SSE is helped immensely by SOA/hybrid data
structures.

A plain list of Matrix won't help with that; you still end up with a lot of
shuffles. For very math-heavy inner loops the shuffle cost can be small, but
vector/matrix muls most likely benefit from a better data layout.
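
To illustrate the point about shuffles, here is a rough C++ sketch (not SlimDX code). It assumes four vectors are already stored SOA-style as separate x, y and z runs; `dot4_soa` is a hypothetical name. With that layout, four dot products against a fixed vector need only loads, multiplies and adds, with no shuffles. The SSE path is guarded so the sketch still compiles on targets without SSE.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Computes out[i] = xs[i]*ax + ys[i]*ay + zs[i]*az for i in 0..3.
// xs, ys, zs and out each point to four floats (no alignment assumed).
void dot4_soa(const float* xs, const float* ys, const float* zs,
              float ax, float ay, float az, float* out) {
#if SKETCH_HAS_SSE
    // Each load pulls four x (or y, or z) values straight into one register.
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(xs), _mm_set1_ps(ax));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(ys), _mm_set1_ps(ay)));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(zs), _mm_set1_ps(az)));
    _mm_storeu_ps(out, acc);
#else
    // Scalar fallback with the same result, for non-SSE targets.
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = xs[i] * ax + ys[i] * ay + zs[i] * az;
    }
#endif
}
```

With an interleaved xyzxyz... layout, the same computation would first have to rearrange the components into this form, which is where the shuffles come from.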

So here's an idea: offer special matrix list and vector list containers.

E.g. Vector3List would have this hybrid internal format (not real code):

List<float> { xxxxyyyyzzzzxxxxyyyyzzzzxxxxyyyyzzzz... }

The Vector3List would automatically pack any inserted vectors like this.
Accessors and other methods follow suit. Then just add overloads to e.g.
Matrix multiply for these containers.
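
A slightly more concrete version of the packing idea, as a rough C++ sketch rather than a proposed SlimDX API. The class layout, the method names (`push_back`, `get`, `raw`, `raw_size`) and the choice to zero-pad a partially filled block are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical plain vector type

// Stores vectors in blocks of four, each block laid out as xxxx yyyy zzzz,
// so four x (or y, or z) values are always contiguous in memory.
class Vector3List {
public:
    void push_back(const Vec3& v) {
        const std::size_t lane = count_ % kLanes;
        if (lane == 0) {
            // Start a new block; unused lanes stay zero until filled.
            data_.resize(data_.size() + 3 * kLanes, 0.0f);
        }
        float* block = &data_[(count_ / kLanes) * 3 * kLanes];
        block[0 * kLanes + lane] = v.x;
        block[1 * kLanes + lane] = v.y;
        block[2 * kLanes + lane] = v.z;
        ++count_;
    }

    // Unpacks element i back into an ordinary vector.
    Vec3 get(std::size_t i) const {
        assert(i < count_);
        const float* block = &data_[(i / kLanes) * 3 * kLanes];
        const std::size_t lane = i % kLanes;
        return Vec3{block[lane], block[kLanes + lane], block[2 * kLanes + lane]};
    }

    std::size_t size() const { return count_; }          // number of vectors
    const float* raw() const { return data_.data(); }    // packed storage
    std::size_t raw_size() const { return data_.size(); }  // floats, incl. padding

private:
    static constexpr std::size_t kLanes = 4;  // one SSE register of floats
    std::vector<float> data_;
    std::size_t count_ = 0;
};
```

A batch overload would then walk `raw()` twelve floats at a time and treat each group of four as one SSE register. The trade-off is that single-element access pays for the index arithmetic in `get`.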

Matrix/Matrix multiplies are more math heavy, and I haven't looked at what
shuffles or hybrid format would be useful there.

Original comment by carl.ad...@gmail.com on 24 Mar 2009 at 10:17

GoogleCodeExporter commented 9 years ago

As noted during SlimGen development, using SSE directly in a managed
application appears to be buggy at best. The resulting code is generally
slower than the corresponding managed code.

This means we either depend on an external DLL for SSE calls, or we
suggest/support the use of something like SlimGen (still in development).

Original comment by ryoohki@gmail.com on 25 Sep 2009 at 7:43

GoogleCodeExporter commented 9 years ago

This issue has run its course by now. We're moving to SlimMath in the near
future, which will be SSE optimized via SlimGen.

Original comment by Mike.Popoloski on 18 Jan 2010 at 3:56

Changed state: Fixed

zengqh / slimdx

Math performance tuning #408