Open breznak opened 9 years ago
CC @subutai @oxtopus @scottpurdy @rhyolight @david-ragazzi :game_die:
Will this decrease the LOC (lines of code) in nupic.core/nupic/math and maybe nupic/bindings/math? If so, looks fine to me.
> Will this decrease the LOC (lines of code) in nupic.core/nupic/math
It should cut the hell out of it ;) Namely, the ASM code we fought with a while ago, and all the {Sparse,Dense}Matrix classes are candidates for replacement.
> it should cut the hell out of it ;) Namely the ASM code we fought with a while ago, and all {Sparse,Dense}Matrix are candidates for replacement
Wow! Looks especially attractive to me! :smile:
Speed is the most important criterion here. We need to verify that the new library does indeed give a speed improvement. The sparse matrix libraries are very custom to our needs and extremely efficient. We looked at the standard libraries before and they did not have sufficiently good sparse support for the operations we need at the time.
Perhaps that has changed, but I am doubtful and will need solid proof. I am happy to be proven wrong, as it would get rid of a lot of code as mentioned in the comments.
some replies from the ML:
I think it is meaningful especially for very big and sparse matrices. @Thanh-Binh
True. Typically we use 1x2048 vectors, although e.g. for vision tasks the matrices can be much bigger. Eigen claims to offer speedups even for small matrices: https://github.com/numenta/nupic.core/issues/371
Have you guys looked into using a bit vector compression lib to maybe formalize the SDRs into formal classes/objects rather than treatment as primitive arrays? @cogmission
I have thought about a bitvector representation (but more for python nupic). It would be nice, but I think for nupic.core speed is paramount, and we are not so much limited by memory (yet). So I think GPGPU/SIMD throughput predominates over the memory footprint.
Also, considering sparse matrices and the fact that SDRs are ~2% sparse, for 1024x1024 = 1M cells we get ~50x from the sparse representation, and 8-32x from a byte/int -> bit representation.
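A quick back-of-the-envelope check of those figures (pure Python, illustrative numbers only):

```python
# Memory estimate for a 1024x1024 binary matrix at ~2% sparsity,
# comparing dense byte/int storage vs. sparse and bit-packed forms.
n = 1024 * 1024            # 1M cells
sparsity = 0.02            # ~2% of bits are ON
nnz = int(n * sparsity)    # ~20k non-zeros

dense_bytes = n * 1        # dense, one byte per cell
dense_int32 = n * 4        # dense, int32 per cell
bit_packed = n // 8        # dense, 1 bit per cell
# CSR-style sparse: one 4-byte index per non-zero (values implicitly 1)
sparse_idx = nnz * 4

print(dense_int32 / sparse_idx)   # ~50x savings vs. int32 matrix
print(dense_bytes / sparse_idx)   # ~12.5x savings vs. byte matrix
print(dense_bytes / bit_packed)   # 8x savings from bit packing
```

which roughly matches the ~50x (sparse) and 8-32x (bit packing) claims above.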
@subutai
> Speed is the most important criterion here. We need to verify that the new library does indeed give a speed improvement.
True that. So we need to identify the most performance-critical parts of the code (which also have to be quite simple to re-implement), like SparseMatrix and the ASM code in the math/TP computations, and do a linear algebra library rundown -- this might be time-consuming/hard/costly. See https://github.com/numenta/nupic.core/issues/372
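The "solid proof" being asked for suggests a head-to-head micro-benchmark. A minimal sketch of that kind of test, using scipy.sparse as a stand-in candidate (scipy/numpy are my assumption here, not what nupic.core uses today; a real comparison would time nupic's own SparseMatrix the same way):

```python
# Micro-benchmark sketch: dense vs. sparse matrix-vector product
# at SDR-like (~2%) sparsity. scipy stands in for a candidate library.
import timeit
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(42)
n = 2048
dense = (rng.random((n, n)) < 0.02).astype(np.float64)  # ~2% non-zeros
sparse = sp.csr_matrix(dense)
x = rng.random(n)

t_dense = timeit.timeit(lambda: dense @ x, number=100)
t_sparse = timeit.timeit(lambda: sparse @ x, number=100)
print(f"dense: {t_dense:.4f}s  sparse: {t_sparse:.4f}s")
```

Both paths must of course produce the same result; only the per-operation time (and candidate libraries swapped in for `sparse`) changes.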
> The sparse matrix libraries are very custom to our needs and extremely efficient. We looked at the standard libraries before and they did not have sufficiently good sparse support for the operations we need at the time.
With all due respect, I find this hard to believe (at the current date): you developed the neural net theory & implementation (along with some optimization), but those libraries focus solely on speed (and have much bigger funding). There are some issues nupic still has that would translate to a direct N-times speedup.
All in all, let's see (but how?) :smile_cat:
Might I suggest taking a look at Armadillo with OpenBLAS? I've used it before and it was a pleasure to work with. With its use of templates, the code almost looks like Matlab or Fortran. And OpenBLAS approaches Intel MKL in speed. @jshahbazi
Definitely! Thank you for the input. Armadillo vs Eigen:
- http://nghiaho.com/?p=1726 -- Armadillo+OpenBLAS is better
- https://stackoverflow.com/questions/10366054/c-performance-in-eigen-library -- Eigen uses another approach
- https://forum.kde.org/viewtopic.php?f=74&t=110509 -- Eigen can use a BLAS/LAPACK backend, which mitigates the speed issue
@jshahbazi @subutai I see the library selection won't be easy, and is quite often a matter of taste too. I have defined the requirements (open for modifications), and this also depends a lot on whether someone is going to tackle this and what they prefer - so maybe we just say:
I'd point out that, in addition to proving some benefit before we give it consideration, there are nice benefits to having our own implementation. If we need a highly optimized operation, having our own implementation makes it very easy to add; with a third-party library, it is much more difficult.
I personally don't think this is the best use of our time right now but would not discourage anyone that thinks there may be a real benefit and wants to spend their time on it.
> I personally don't think this is the best use of our time right now but would not discourage anyone that thinks there may be a real benefit and wants to spend their time on it.
I want to run Nupic on tasks for NLP, vision, ... where the current speed is a blocker for the size of data I need to crunch, that's driving my motivation here.
> I want to run Nupic on tasks for NLP, vision, ... where the current speed is a blocker for the size of data I need to crunch, that's driving my motivation here.
Well, I hope that there are significant speed improvements to realize. It is good to know what your goal is - we can reuse a lot of previous Numenta experience with vision.
(If required) Looks like Eigen can be tested here: http://docs.biicode.com/c++/examples/eigen.html
You all will have to please forgive me for my novice understanding of the code (I'm still learning it... slowly), but I wanted to understand what kinds of calculations are being made within nupic that could require a library like Eigen or Armadillo or MKL or OpenBLAS or whatever. Is there massive matrix multiplication going on? Vector multiplication? Even if someone could just point me to proper class/function/file so I could get a better handle on it, I think I could offer up some help with this.
This issue needs to be reviewed by the original author or another contributor for applicability to the current codebase. The issue might be obsolete or need updating to match current standards and practices. If the issue is out of date, please close. Otherwise please leave a comment to justify its continuing existence. It may be closed in the future if no further activity is noted.
Still valid, could be useful reference for #948
> .. The sparse matrix libraries are very custom to our needs and extremely efficient. We looked at the standard libraries before and they did not have sufficiently good sparse support for the operations we need at the time. Perhaps that has changed, but I am doubtful and will need solid proof. I am happy to be proven wrong, as it would get rid of a lot of code as mentioned in the comments.
@scottpurdy @subutai could you recall what the reviewed libraries and special operations were? HTM is, in the end, doing the same matrix ops as other NNs.
> @scottpurdy @subutai could you recall what the reviewed libraries and special operations were? HTM is in the end doing the same matrix ops as other NNs
Note that one of our requirements is that the C++ library has good python bindings as well.
I don't remember in detail the exact operations. I think a good goal is to see if spatial_pooler.py and KNNClassifier.py (both use our home-grown sparse C++ libraries) could be reimplemented with some other library.
> Note that one of our requirements is that the C++ library has good python bindings as well.
Is this really a constraint on the lib? I mean, isn't the py wrapper for our code enough?
Same question from me, @subutai. Why do we care about python bindings in a C++ library used inside nupic.core?
> Why do we care about python bindings in a C++ library used inside nupic.core?
I'm assuming the end goal is to remove the current sparse libraries in nupic.core and replace them with external C++ libraries. Currently the C++ sparse libraries in nupic.core are used directly in our Python code to implement the spatial pooler, KNN classifier, and elsewhere.
An alternative path is to replace those python uses with a standard Python sparse library, such as scipy (Sagan proposed we do this). If we can do that everywhere without losing significant performance then we don't need python bindings for linear algebra C++ libraries in nupic.core. Python developers will also then be using libraries that are familiar to them instead of some C++ library.
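To make that alternative concrete, a spatial-pooler-style overlap step can be expressed with scipy.sparse in a few lines. This is a sketch under my own assumptions: the names, shapes, and the "overlap = connected-synapse matrix times input vector" framing are illustrative, not nupic's actual API.

```python
# Sketch: spatial-pooler-style overlap computation with scipy.sparse,
# standing in for nupic's C++ sparse matrix classes.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
num_columns, input_size = 2048, 1024

# Binary "connected synapses" matrix: one row per column, ~5% connected.
connected = sp.random(num_columns, input_size, density=0.05,
                      format="csr", random_state=0)
connected.data[:] = 1.0  # binarize: a stored entry means "connected"

input_vector = (rng.random(input_size) < 0.02).astype(np.float64)  # ~2% SDR

# Overlap of each column with the input: one sparse mat-vec.
overlaps = connected @ input_vector
print(overlaps.shape)  # (2048,)
```

Whether this matches the performance of the hand-tuned C++ code is exactly the open question; the point is that the Python-facing uses would then run on a library Python developers already know.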
There are several dependencies and interactions here.
requirements copied from parent #28
This is important, as all other tasks on #28 depend on this implementation; once we settle, we should stick with it for a while :)
TODO
Linear algebra libraries review:
Candidates:
Foreword: I've never programmed with a linear algebra lib in C++, but I spent quite some time doing my homework in this research; Eigen showed up most often and in a good light.