mdabros / SharpLearning

Machine learning for C# .Net
MIT License

Replace SharpLearning.Containers.Matrices.F64Matrix with multidimensional array #20

Open mdabros opened 7 years ago

mdabros commented 7 years ago

In SharpLearning, the F64Matrix class, which is part of the Learner interfaces, is mostly used as a container for the features of a learning problem. While SharpLearning does contain some arithmetic extensions for F64Matrix, the arithmetic is not used by any of the learners. Also, more efficient implementations can be found in Math.NET.

This suggests that the primary container for features in SharpLearning should instead be a standard .NET type, such as a multidimensional array (double[,]) or a jagged array (double[][]), with some extension methods to provide the current functionality of F64Matrix.
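To make the extension-method idea concrete, here is a minimal sketch of how F64Matrix-style accessors could be layered on top of double[,]. The class name FeatureArrayExtensions and the method names are hypothetical and are not part of SharpLearning; they only illustrate the approach.

```csharp
using System;

// Hypothetical helpers, not part of SharpLearning: they sketch how
// matrix-style accessors could be added to a plain double[,].
public static class FeatureArrayExtensions
{
    // Number of observations (rows).
    public static int RowCount(this double[,] matrix) => matrix.GetLength(0);

    // Number of features (columns).
    public static int ColumnCount(this double[,] matrix) => matrix.GetLength(1);

    // Copies one observation (row) into a new array.
    public static double[] Row(this double[,] matrix, int index)
    {
        var columns = matrix.GetLength(1);
        var row = new double[columns];
        for (var c = 0; c < columns; c++)
        {
            row[c] = matrix[index, c];
        }
        return row;
    }

    // Copies one feature (column) into a new array.
    public static double[] Column(this double[,] matrix, int index)
    {
        var rows = matrix.GetLength(0);
        var column = new double[rows];
        for (var r = 0; r < rows; r++)
        {
            column[r] = matrix[r, index];
        }
        return column;
    }
}
```

With these in scope, a caller could write features.Row(0) or features.Column(2) directly on a double[,]. Both accessors copy, so they would not replace the pointer-based views without further work.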

An alternative, also suggested in #6, would be to replace F64Matrix directly with Math.NET as the matrix provider. However, only the SharpLearning.Neural project uses matrix arithmetic, and the plan is to use CNTK as backend, so Math.NET is a large dependency to take on if the matrix class is only used as a feature container. So currently, I am leaning more towards replacing F64Matrix with a standard .NET type. To better handle integration between Math.NET and SharpLearning, a separate project, SharpLearning.MathNet, could perhaps be added with efficient conversions between Math.NET and SharpLearning containers (both copying and sharing memory). This of course depends on which data structure ends up replacing F64Matrix, if any.
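The difference between a copying conversion and a shared-memory conversion can be sketched without referencing Math.NET at all. The FlatMatrixView type below is hypothetical and assumes a flat, row-major double[] buffer. An actual bridge would have to match the storage layout of the library on the other side, which may differ.

```csharp
using System;

// Hypothetical sketch, not SharpLearning or Math.NET code. It wraps a
// flat, row-major buffer (an assumption) to contrast sharing with copying.
public sealed class FlatMatrixView
{
    private readonly double[] _data; // shared with the caller, never copied

    public int RowCount { get; }
    public int ColumnCount { get; }

    public FlatMatrixView(double[] data, int rows, int columns)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (rows < 0 || columns < 0 || data.Length != rows * columns)
        {
            throw new ArgumentException("Buffer length must equal rows * columns.");
        }

        _data = data;
        RowCount = rows;
        ColumnCount = columns;
    }

    // Shared memory: reads and writes go straight to the caller's buffer.
    public double this[int row, int column]
    {
        get => _data[row * ColumnCount + column];
        set => _data[row * ColumnCount + column] = value;
    }

    // Copy: the returned array is independent of the shared buffer.
    public double[,] ToArray2D()
    {
        var copy = new double[RowCount, ColumnCount];
        for (var r = 0; r < RowCount; r++)
        {
            for (var c = 0; c < ColumnCount; c++)
            {
                copy[r, c] = _data[r * ColumnCount + c];
            }
        }
        return copy;
    }
}
```

Writing through the view changes the original buffer, while an array returned earlier by ToArray2D keeps its old values. That is the trade-off between the two conversion styles: sharing avoids allocation but couples the two containers.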

These are my current thoughts, and people are very welcome to discuss and pitch in with suggestions.

diegoful commented 6 years ago

Another idea is to let the Learn methods take a SharpLearning.Containers.Matrices.IMatrix instead of an F64Matrix, so that users can supply their own custom class (as long as it implements IMatrix). This would allow creating a wrapper for one of the Math.NET implementations.
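As a rough illustration of this idea, the sketch below uses a reduced, hypothetical interface named IFeatureMatrix as a stand-in for IMatrix, since the actual members of IMatrix are not shown in this thread. The adapter wraps a plain double[,], and a Math.NET matrix could presumably be wrapped in the same way. ColumnMeans is a toy consumer standing in for a learner.

```csharp
using System;

// Hypothetical, reduced stand-in for an IMatrix-style contract; the real
// SharpLearning interface may expose different members.
public interface IFeatureMatrix
{
    int RowCount { get; }
    int ColumnCount { get; }
    double At(int row, int column);
}

// Adapter exposing a plain double[,] through the interface without copying.
public sealed class ArrayFeatureMatrix : IFeatureMatrix
{
    private readonly double[,] _values;

    public ArrayFeatureMatrix(double[,] values)
    {
        _values = values ?? throw new ArgumentNullException(nameof(values));
    }

    public int RowCount => _values.GetLength(0);

    public int ColumnCount => _values.GetLength(1);

    public double At(int row, int column) => _values[row, column];
}

// Toy consumer standing in for a learner: it depends only on the interface.
public static class ColumnMeans
{
    public static double[] Compute(IFeatureMatrix features)
    {
        var means = new double[features.ColumnCount];
        for (var c = 0; c < features.ColumnCount; c++)
        {
            var sum = 0.0;
            for (var r = 0; r < features.RowCount; r++)
            {
                sum += features.At(r, c);
            }
            // An empty matrix yields 0.0 rather than NaN.
            means[c] = features.RowCount == 0 ? 0.0 : sum / features.RowCount;
        }
        return means;
    }
}
```

Because ColumnMeans.Compute only sees the interface, any container that implements it can be passed in, at the cost of a virtual call per element access.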

BTW, congrats on creating the first .NET ML library that actually makes sense through-and-through.

mdabros commented 6 years ago

Hi @diegoful,

Thanks for joining and adding to the discussion. Also, thanks for the kind words! I am glad that you find SharpLearning useful and that the design makes sense.

Using the IMatrix interface in the learners is also something I have considered since, as you suggest, it would make it possible to create custom implementations and thereby make the learners more open to other containers. In the current state, F64Matrix has some view extensions that use pointers to avoid copying memory. These are not part of the IMatrix interface, and they are used in some of the learners. But it might be possible to clean up a bit and avoid using views without decreasing the efficiency of the learners too much. So it is definitely a valid option.

Currently I am also considering using the tensor type Microsoft is introducing. If this becomes a standard part of .NET, I think it would make sense to use that implementation. This would also be useful when dealing with higher-dimensional data for deep learning algorithms. My hope is that other libraries, like Math.NET, would also adopt some interfacing to this type if it becomes standard. Even with the tensor type, it would probably still make sense to hide the concrete implementation behind an interface, to keep the option of custom implementations.

Best regards,
Mads