mathnet / mathnet-numerics

Math.NET Numerics
http://numerics.mathdotnet.com
MIT License

VectorView / Shared storage vector implementation #247

Open redknightlois opened 10 years ago

redknightlois commented 10 years ago

In convolutional neural networks I work with huge amounts of data. With pointers it is pretty easy to change the size of an array (using pointer arithmetic) in order to partition the data.

The linear algebra functionality doesn't support unchecked types, so I first have to copy the relevant data into smaller buffers just for processing. We can agree that is a waste of resources ;).

A VectorView/MatrixView would allow full support while avoiding unchecked-type interfaces.
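Roughly, the copy I have to do today looks like this (a sketch only; the helper name and bounds are just for illustration, using the standard builder API):

using System;
using MathNet.Numerics.LinearAlgebra;

static class CopyWorkaround
{
    // Copies the window [offset, offset + length) into a fresh buffer and wraps it
    // in a dense vector; this is exactly the data copy a VectorView would avoid.
    public static Vector<double> Slice(double[] data, int offset, int length)
    {
        var buffer = new double[length];
        Array.Copy(data, offset, buffer, 0, length);
        return Vector<double>.Build.Dense(buffer);
    }
}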

cdrnet commented 10 years ago

Creating a vector view is quite straightforward, but it will cause all algorithms to fall back to slow indexer-based algorithms, since none of them would understand its storage structure (yet). So as-is this will not be interesting if performance is important as well.

Example for doubles:

using MathNet.Numerics.LinearAlgebra.Double;
using MathNet.Numerics.LinearAlgebra.Storage;

public class VectorView : Vector
{
    public VectorView(double[] data, int offset, int length)
        : base(new Storage(data, offset, length))
    {
    }

    // A read/write window over an existing array: element i maps to data[offset + i].
    class Storage : VectorStorage<double>
    {
        readonly double[] _data;
        readonly int _offset;

        public Storage(double[] data, int offset, int length) : base(length)
        {
            _data = data;
            _offset = offset;
        }

        public override bool IsDense
        {
            get { return true; }
        }

        public override double At(int index)
        {
            return _data[_offset + index];
        }

        public override void At(int index, double value)
        {
            _data[_offset + index] = value;
        }
    }
}

Using it e.g. in LINQPad with its Dump functions:

void Main()
{
    double[] data = Generate.Gaussian(10, 0.0, 1.0);
    var first = new DenseVector(data).Dump("first");
    var view1 = new VectorView(data, 2, 2).Dump("v1");
    var view2 = new VectorView(data, 6, 2).Dump("v2");

    var second = Vector<double>.Build.Random(2).Dump("second");

    (view1 + view2).Dump("v1+v2");
    (view1 + second).Dump("v1+second");
}

I've been experimenting a bit with unsafe code, both in the providers and in the vector/matrix data storage, lately. While I prefer to keep the core library entirely safe, there's nothing wrong with providing an optional unsafe extension package. Of course, to take full advantage here we'd also have to revise the array-based linear algebra provider interface.
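Just to illustrate what such an unsafe path could look like (a sketch only, compiled with /unsafe; none of this is in the library):

using System;

static class UnsafeSketch
{
    // Hypothetical unsafe kernel: add 'length' elements of x (starting at xOffset)
    // into y (starting at yOffset) using raw pointer arithmetic, no bounds checks.
    public static unsafe void AddInPlace(double[] x, int xOffset, double[] y, int yOffset, int length)
    {
        fixed (double* px0 = x)
        fixed (double* py0 = y)
        {
            double* px = px0 + xOffset;
            double* py = py0 + yOffset;
            for (int i = 0; i < length; i++)
            {
                py[i] += px[i];
            }
        }
    }
}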

redknightlois commented 10 years ago

I was looking into that too.

One alternative that avoids unsafe methods is to add a new set of low-level interfaces (for the linear algebra) that also carry the start/end offsets, stride values, etc. Standard vectors/matrices would then use the default parameters [0, Count] with stride = 1, and the high-level API would control them through MatrixView and VectorView. Low-level providers like MKL could use those parameters to perform unchecked pointer arithmetic. The only "issue" is that the managed provider would have to pay the cost of the extra offset arithmetic.
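Something along these lines, just as a sketch of the signature shape (the interface and method names are made up):

// Hypothetical low-level signature: the provider receives offsets and strides
// explicitly, so a plain vector passes (offset: 0, stride: 1) and a view passes its own.
interface ILinearAlgebraProviderWithOffsets
{
    // result[rOffset + i*rStride] = x[xOffset + i*xStride] + y[yOffset + i*yStride]
    void AddArrays(
        double[] x, int xOffset, int xStride,
        double[] y, int yOffset, int yStride,
        double[] result, int rOffset, int rStride,
        int count);
}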

redknightlois commented 10 years ago

I stumbled today upon ArraySegment ... it essentially allows you to define a sub-array within an array. Apparently it is optimized for performance at the CLR level. https://gist.github.com/redknightlois/dfd16d3dd3f79f50cdc6

ArraySegment would play fair with algorithms that assume arrays, wouldn't require unsafe code, and would even work with native providers like MKL. After all, adding an offset to the start pointer of the array is pretty fast :)

The caveat is that the indexer was only added in .NET 4.5, so for older frameworks you either have to skip the functionality or implement it yourself.
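A quick sketch of how the segment exposes the window without copying (the cast-based access is the 4.5-era indexer mentioned above; the explicit Array/Offset access works everywhere):

using System;
using System.Collections.Generic;

static class ArraySegmentSketch
{
    public static void Demo()
    {
        double[] data = { 1, 2, 3, 4, 5, 6, 7, 8 };

        // A segment describes a window into the existing array: no data is copied.
        var segment = new ArraySegment<double>(data, 2, 3);

        // Works on all framework versions: index through Array/Offset explicitly.
        double first = segment.Array[segment.Offset + 0];

        // On .NET 4.5+ the IList<double> implementation provides indexed access.
        double second = ((IList<double>)segment)[1];

        Console.WriteLine("{0} {1}", first, second); // prints 3 4
    }
}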

redknightlois commented 10 years ago

I am building a POC that would allow any native provider and the ManagedLinearAlgebraProvider to use both the plain array notation and ArraySegment. If you think the changes would make sense to include in the distribution, I will go ahead, finish MKL, and create a pull request.

At the moment I have the MKL array interface with all tests green for vectors and BLAS. Modifying the default provider for LAPACK can be done in a matter of minutes.

Moreover, I tried ArraySegment on 3.0 compiling with VS2013 and had no trouble with the indexer.
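To give an idea of the shape of the dual interface (hypothetical names, not the actual provider methods): the segment overload simply forwards the underlying array plus offset, so a native provider such as MKL can add the offset to the base pointer, while the managed provider pays only the extra offset arithmetic.

using System;

static class ProviderOverloadSketch
{
    // Array form: the classic provider signature, extended with explicit offsets.
    public static void ScaleArray(double alpha, double[] x, int xOffset, double[] result, int rOffset, int count)
    {
        for (int i = 0; i < count; i++)
        {
            result[rOffset + i] = alpha * x[xOffset + i];
        }
    }

    // Segment form: unwraps to the array form, no data copy involved.
    public static void ScaleArray(double alpha, ArraySegment<double> x, ArraySegment<double> result)
    {
        ScaleArray(alpha, x.Array, x.Offset, result.Array, result.Offset, x.Count);
    }
}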

cdrnet commented 10 years ago

I've also worked on a small prototype/PoC in the meantime, for linear algebra providers to operate on Vector/MatrixStorage instead of arrays. The main motivation is to work toward supporting external memory (e.g. unmanaged or GPU) and to finally use MKL for sparse data as well, but it would also affect how we could deal with array segments.

https://github.com/mathnet/mathnet-numerics/tree/provider/src/Numerics/Providers/ExperimentalLinearAlgebra

I wonder how this would align with your work regarding ArraySegment. What are your thoughts on going in this direction?
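Roughly the idea, as a sketch only (not the actual prototype code): a provider entry point receives the storage objects, unwraps dense storage to its backing array, and falls back to generic element access otherwise.

using MathNet.Numerics.LinearAlgebra.Storage;

static class StorageProviderSketch
{
    public static void Negate(VectorStorage<double> x, VectorStorage<double> result)
    {
        var dx = x as DenseVectorStorage<double>;
        var dr = result as DenseVectorStorage<double>;
        if (dx != null && dr != null)
        {
            // Fast path: operate directly on the backing arrays.
            double[] a = dx.Data, b = dr.Data;
            for (int i = 0; i < a.Length; i++) b[i] = -a[i];
            return;
        }

        // Fallback: generic element access through the storage abstraction.
        for (int i = 0; i < x.Length; i++) result.At(i, -x.At(i));
    }
}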

redknightlois commented 10 years ago

My POC goes in a different direction with a different mindset. Both are parts of the same picture: while your POC aims to use external memory, mine is going in the direction of allowing ArraySegment-style providers, and also of allowing efficient slicing and sub-matrix/vector views without data copies when used in conjunction with a native BLAS provider.

I wasn't going to push it yet, because LAPACK is still missing, but take a look at the direction it is taking: https://github.com/redknightlois/mathnet-numerics/tree/providers-native-offset

I believe that using both approaches we can achieve excellent performance.

redknightlois commented 10 years ago

I realized I made a mistake regarding the timing of ArraySegment. The impact of using the indexer is a 2x slowdown, and using the offset arithmetic costs about 50%, give or take. In unmanaged implementations (with pointer arithmetic) it is great, but not in managed implementations.

Access without offset: 38ms. Access via offset: 36ms. Access via array segment: 64ms. Access via array segment at operator: 46ms.
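Roughly the kind of micro-benchmark behind these numbers (a sketch; the loop size and exact access patterns are assumptions, not the original harness):

using System;
using System.Collections.Generic;
using System.Diagnostics;

static class AccessBenchmarkSketch
{
    public static void Run()
    {
        const int n = 10000000;
        var data = new double[n];
        var segment = new ArraySegment<double>(data, 0, n);
        double sum = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) sum += data[i];           // plain array, no offset
        Console.WriteLine("array:   {0} ms", sw.ElapsedMilliseconds);

        int offset = 0;
        sw.Restart();
        for (int i = 0; i < n; i++) sum += data[offset + i];  // explicit offset arithmetic
        Console.WriteLine("offset:  {0} ms", sw.ElapsedMilliseconds);

        var list = (IList<double>)segment;                    // .NET 4.5+ indexer path
        sw.Restart();
        for (int i = 0; i < n; i++) sum += list[i];
        Console.WriteLine("segment: {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine(sum); // keep the loops from being optimized away
    }
}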

redknightlois commented 10 years ago

After some careful coding I was able to put them in a similar league with a custom implementation...

Access without offset: 40ms. Access via offset: 40ms. Access via array segment: 46ms. Access via array segment at operator: 37ms.

The strange thing is that when I go to full 32-bit floats, I am able to beat the default implementation. The JIT optimizations may be better for my custom ArraySegment than for accessing the array of floats directly.

Access without offset: 120ms. Access via offset: 120ms. Access via array segment: 62ms. Access via array segment at operator: 121ms.

The ArraySegment implementation is great, but only on Framework 4.5, where you have the ability to request aggressive inlining.

In Framework 4.0 the speed is awful:

Access without offset: 38ms. Access via offset: 37ms. Access via array segment: 225ms. Access via array segment at operator: 37ms.

But strangely, not in 3.5:

Access without offset: 37ms. Access via offset: 37ms. Access via array segment: 45ms. Access via array segment at operator: 37ms.
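For illustration, a hypothetical sketch of this kind of custom slice type, with the 4.5-only aggressive-inlining hint on its indexer (name and layout are made up):

using System.Runtime.CompilerServices;

// A window over an existing array: the indexer resolves to offset arithmetic on the
// backing array, and aggressive inlining (available only on .NET 4.5 and later)
// asks the JIT to inline those accessors.
struct ArraySliceSketch
{
    private readonly double[] _array;
    private readonly int _offset;
    public readonly int Count;

    public ArraySliceSketch(double[] array, int offset, int count)
    {
        _array = array;
        _offset = offset;
        Count = count;
    }

    public double this[int index]
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get { return _array[_offset + index]; }
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        set { _array[_offset + index] = value; }
    }
}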

redknightlois commented 9 years ago

I have been working on this problem. This is more or less the result, but I haven't finished it yet. It is still missing a published NuGet package and the ability to control certain aspects of it, but it is already working.

https://github.com/redknightlois/arrayslice

cdrnet commented 9 years ago

Thanks again for moving forward with this!