Choose linear algebra/matrix library implementation for optimized nupic.core

breznak commented 9 years ago

requirements copied from parent #28

usability
- suitable licence
- platform support (Linux/Mac/Win; x86_64)
- convenient installation/bundling with nupic
functionality
- SSE instructions
- GPGPU backend support (CUDA, openCL)
- parallelism support (openMP)
- sparse matrices
programming
- clean & lean API
- active development
- (opt) binding

This is important as all other tasks on #28 depend on this implementation; once we settled we should stick with it for a while :)

TODO

[X] spread the word on ML
[X] research and post alternatives, reviews
[ ] discuss suggestions
[ ] reach decision on the winner
- [ ] profile performance critical methods
- [ ] bechmark candidate libraries on #372

Lingebra libraries review:

Candidates:

Eigen

foreword: I've never programmed using lingebra lib in c++ but I spent quite some time doing my homework in this research, Eigen showed most often and in good light.

MPL(+GPLv2) licence
portable (Win/OSX/Lin,..)
simple bundling (only cpp header includes)
simple and mature API
support fast small-fixed sized/ large/ sparse matrices
seemless SIMD vectorization
multi-threaded parallelization
partial GPGPU gpumatrix magma
partial bindings to python, java

breznak commented 9 years ago

CC @subutai @oxtopus @scottpurdy @rhyolight @david-ragazzi :game_die:

david-ragazzi commented 9 years ago

Will this decrease the LOC (lines of code) in nupic.core/nupic/math and maybe nupic/bindings/math? If so, looks fine to me.

breznak commented 9 years ago

Will this decrease the LOC (lines of code) in nupic.core/nupic/math

it should cut the hell out of it ;) Namely the ASM code we fought with a while ago, and all {Sparse,Dense}Matrix are candidates for replacement

david-ragazzi commented 9 years ago

it should cut the hell out of it ;) Namely the ASM code we fought with a while ago, and all {Sparse,Dense}Matrix are candidates for replacement

Uow! Looks specially atraent to me! :smile:

subutai commented 9 years ago

Speed is the most important criterion here. We need to verify that the new library does indeed give a speed improvement. The sparse matrix libraries are very custom to our needs and extremely efficient. We looked at the standard libraries before and they did not have sufficiently good sparse support for the operations we need at the time.

Perhaps that has changed, but I am doubtful and will need solid proof. I am happy to be proven wrong, as it would get rid of a lot of code as mentioned in the comments.

jshahbazi commented 9 years ago

Might I suggest taking a look at Armadillo with OpenBLAS? I've used it before and it was a pleasure to work with. With its use of templates, the code almost looks like Matlab or Fortran. And OpenBLAS approaches Intel MKL in speed.

breznak commented 9 years ago

some replies from the ML:

I think, it is meaningfull especially for very big and sparse matrices. @Thanh-Binh

True. Typically we use <1x2048> vectors, although eg for vision tasks the matrices can be much bigger. Eigen claims to offer speedups even for small matrices = https://github.com/numenta/nupic.core/issues/371

breznak commented 9 years ago

Have you guys looked into using a bit vector compression lib to maybe formalize the SDRs into formal classes/objects rather than treatment as primitive arrays? @cogmission

I have thought about it - bitvector (but more for python nupic). It would be nice, but I think for nupic.core speed is paramount. And we are not so much limited by the memory (yet).

So i think the GPGPU/SIMD predominant the mem footprint.

Also, considering the sparse matrices and the fact SDRs are ~2% sparse, for 1024x1024 =1M we get 50x from sparse, and 8-32x from byte/int -> bit representation.

breznak commented 9 years ago

@subutai

Speed is the most important criterion here. We need to verify that the new library does indeed give a speed improvement.

True that. So we need to identify the most performance critical parts of code (that yet have to be quite simple to re-implement) ..like SparseMatrix, the ASM code in math/TP computations. See https://github.com/numenta/nupic.core/issues/372 and do a lingebra library rundown -- this might be time consuming/hard/costly. See https://github.com/numenta/nupic.core/issues/372

The sparse matrix libraries are very custom to our needs and extremely efficient. We looked at the standard libraries before and they did not have sufficiently good sparse support for the operations we need at the time.

With all the respect, I find this hard to believe (at the current date), as you developed neural net theory&implementation (along with some optimization), but those libraries focus solely on speed (and have much bigger funding). There are some issues nupic still has:

single threaded
no GPU backend use

that would translate to direct N-times speedup.

All in all, let's see (but how?) :smile_cat:

breznak commented 9 years ago

Might I suggest taking a look at Armadillo with OpenBLAS? I've used it before and it was a pleasure to work with. With its use of templates, the code almost looks like Matlab or Fortran. And OpenBLAS approaches Intel MKL in speed. @jshahbazi

definitely! thank you for input. Armadillo x Eigen comparison: http://nghiaho.com/?p=1726 -- Armadillo+openBlas is better Armadillo x Eigen: https://stackoverflow.com/questions/10366054/c-performance-in-eigen-library -- Eigen uses another approach Eigen can use openBLAS: https://forum.kde.org/viewtopic.php?f=74&t=110509 -- Eigen can use BLAS/LAPACK backend, which mitigates the speed issue.

@jshahbazi @subutai I see the library selection will be uneasy, and quite often a matter of taste too. I have defined the requirements (open for modifications), and this also depends a lot on IFF someone is going to tackle this and what he prefers - so maybe we just say:

fit the requirements
improve over current state (speed vise)
do whatever you like ?

scottpurdy commented 9 years ago

I'd point out that in addition to proving some benefit before we will give consideration, there are nice benefits of having our own implementation. If we need a highly optimized operation, having our own implementation makes it very easy to add the operation. If we use a third-party library, it is much more difficult.

I personally don't think this is the best use of our time right now but would not discourage anyone that thinks there may be a real benefit and wants to spend their time on it.

breznak commented 9 years ago

personally don't think this is the best use of our time right now but would not discourage anyone that thinks there may be a real benefit and wants to spend their time on it.

I want to run Nupic on tasks for NLP, vision, ... where the current speed is a blocker for the size of data I need to crunch, that's driving my motivation here.

scottpurdy commented 9 years ago

I want to run Nupic on tasks for NLP, vision, ... where the current speed is a blocker for the size of data I need to crunch, that's driving my motivation here.

Well I hope that there are significant speed improvements to realize. It is good to know what your goal is - we can reuse a lot of previous Numenta experience with vision.

rcrowder commented 9 years ago

(If required) Looks like Eigen can be tested here; http://docs.biicode.com/c++/examples/eigen.html

jshahbazi commented 9 years ago

You all will have to please forgive me for my novice understanding of the code (I'm still learning it... slowly), but I wanted to understand what kinds of calculations are being made within nupic that could require a library like Eigen or Armadillo or MKL or OpenBLAS or whatever. Is there massive matrix multiplication going on? Vector multiplication? Even if someone could just point me to proper class/function/file so I could get a better handle on it, I think I could offer up some help with this.

rhyolight commented 8 years ago

Please review this issue

This issue needs to be reviewed by the original author or another contributor for applicability to the current codebase. The issue might be obsolete or need updating to match current standards and practices. If the issue is out of date, please close. Otherwise please leave a comment to justify its continuing existence. It may be closed in the future if no further activity is noted.

breznak commented 8 years ago

Still valid, could be useful reference for #948

.. The sparse matrix libraries are very custom to our needs and extremely efficient. We looked at the standard libraries before and they did not have sufficiently good sparse support for the operations we need at the time. Perhaps that has changed, but I am doubtful and will need solid proof. I am happy to be proven wrong, as it would get rid of a lot of code as mentioned in the comments.

@scottpurdy @subutai could you recall what the reviewed libraries and special operations were? HTM is in the end doing the same matrix ops as other NNs

subutai commented 8 years ago

@scottpurdy @subutai could you recall what the reviewed libraries and special operations were? HTM is in the end doing the same matrix ops as other NNs

Note that one of our requirements is that the C++ library has good python bindings as well.

I don't remember in detail the exact operations. I think a good goal is to see if spatial_pooler.py and KNNClassifier.py (both use our home-grown sparse C++ libraries) could be reimplemented with some other library.

breznak commented 8 years ago

Note that one of our requirements is that the C++ library has good python bindings as well.

Is this really a constraint on the lib? I mean, isn't the py wrapper for our code enough?

rhyolight commented 8 years ago

Same question from me, @subutai. Why do we care about python bindings in a C++ library used inside nupic.core?

subutai commented 8 years ago

Why do we care about python bindings in a C++ library used inside nupic.core?

I'm assuming the end goal is to remove the current sparse libraries in nupic.core and replace them with external C++ libraries. Currently the C++ sparse libraries in nupic.core are used directly in our Python code to implement the spatial pooler, KNN classifier, and elsewhere.

An alternative path is to replace those python uses with a standard Python sparse library, such as scipy (Sagan proposed we do this). If we can do that everywhere without losing significant performance then we don't need python bindings for linear algebra C++ libraries in nupic.core. Python developers will also then be using libraries that are familiar to them instead of some C++ library.

There are several dependencies and interactions here.

numenta / nupic.core-legacy

Choose linear algebra/matrix library implementation for optimized nupic.core #369

Please review this issue