subutai opened this issue 10 years ago · Status: Open
Is this still an issue, given that the optimizations were not that big? I was previously suggesting the multi-platform Eigen library, but I'm not sure we should bother at this time.
relevant: #193 #151
@subutai would you mind if I reword the issue a bit? Former description:
subutai commented on Feb 20, 2014
See issue #27. We'd like to possibly add it back in later so tracking it here. Some related web pages:
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man7/vecLib.7.html
Before adding it back in we should verify this really gives a performance improvement in real cases. This is doubtful.
When optimizing critical parts of C++ code, this is a pretty neat tool: http://gcc.godbolt.org/#{%22version%22%3A3%2C%22filterAsm%22%3A{%22labels%22%3Atrue%2C%22directives%22%3Atrue%2C%22commentOnly%22%3Atrue}%2C%22compilers%22%3A[{%22sourcez%22%3A%22C4TwDgpgJhBmAEUD2BXARgGwvAbhAxsEgE7wD6ZAhsMMQJZorAQXwAUbehJZAznQC8IbAMwAmAJRSA3AChk6LIiTA2%2BJADtewXASLEAZPEoAadVp1d9RtBNkBvWee27upfPAC8xgFRo5xBDAKMQa7PgA2gAMALoA1JEAjDEScWoRYvGRIilyAL5AAAA%3D%22%2C%22compiler%22%3A%22g530%22%2C%22options%22%3A%22-Os%20-mavx%22}]}
FYI @oxtopus
This issue needs to be reviewed by the original author or another contributor for applicability to the current codebase. The issue might be obsolete or need updating to match current standards and practices. If the issue is out of date, please close. Otherwise please leave a comment to justify its continuing existence. It may be closed in the future if no further activity is noted.
This is still valid, although no one is currently working on porting to linear algebra libraries. I think it should stay open to monitor optimization progress and results. E.g. the PRs from @mrcslws speeding up the TM could be referenced here for the record.
Ok, so the issue is still valid, but it is also defined very broadly. It's labeled type:optimization, so I'll track it that way, but I think the ticket description needs to be simplified. It's too long and complicated, with too many subjects and TODO items. We need to try to keep our issues simpler and smaller. This could turn into a super issue, but honestly I would rather break it up even further. Something to think about, @subutai.
@rhyolight Agreed. The issue is indeed pretty big right now. I think a good first step is to replace the use of sparse matrices in the python spatial pooler, python KNN classifier, and/or optimize the existing C++ SpatialPooler (which is not currently too optimized).
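To make the "replace sparse matrices" idea concrete: the Spatial Pooler's hot loop is essentially a binary matrix-vector product (column overlaps with the input). A minimal NumPy sketch, with made-up sizes and sparsity (this is an illustration, not nupic's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_columns = 1024, 2048

# Binary permanence-thresholded connection matrix (columns x inputs);
# nupic stores this in a custom sparse binary matrix class.
connected = (rng.random((n_columns, n_inputs)) < 0.02).astype(np.int32)

# Binary input vector (an SDR of active input bits).
input_vector = (rng.random(n_inputs) < 0.05).astype(np.int32)

# Overlap of each column with the input: one matrix-vector product,
# which a BLAS-backed dense matmul can vectorize well.
overlaps = connected @ input_vector

print(overlaps.shape)  # (2048,)
```

Whether dense BLAS beats a hand-rolled sparse representation depends on the actual sparsity, which is why benchmarking on realistic parameters matters.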
> I think a good first step is to replace the use of sparse matrices in the python spatial pooler, python KNN classifier, and/or optimize the existing C++ SpatialPooler (which is not currently too optimized).
@subutai shouldn't the effort focus on the biggest impact first, i.e. the biggest bottlenecks, which are still TM/TP?
You all will have to please forgive me for my novice understanding of the code (I'm still learning it... slowly), but I wanted to understand what kinds of calculations are being made within nupic that could require a library like Eigen or Armadillo or MKL or OpenBLAS or whatever. Is there massive matrix multiplication going on? Vector multiplication? Even if someone could just point me to proper class/function/file so I could get a better handle on it, I think I could offer up some help with this.
@jshahbazi Sorry, I missed your comment. If you are still interested, we certainly would! The logic and operations are in algorithms/Connections.hpp (for TemporalMemory) and in math/{Sparse,Dense}Matrix (for SpatialPooler).
The operations (someone please correct me): vector AND, searching the N highest entries, indexing and updating weights, ... @scottpurdy @mrcslws @subutai ?
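The operations listed above can be sketched in NumPy to show what a linear-algebra library would need to accelerate (vector sizes, sparsity, and the 0.1 weight increment are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Two binary SDR-like vectors (~5% active bits).
a = rng.random(1000) < 0.05
b = rng.random(1000) < 0.05

# 1) Vector AND: bits active in both SDRs.
both = a & b

# 2) Searching the N highest entries: indices of the top-k scores.
#    argpartition is O(n), cheaper than a full sort.
scores = rng.random(1000)
k = 40
top_k = np.argpartition(scores, -k)[-k:]

# 3) Indexing and updating weights: bump the winners' weights.
weights = np.zeros(1000)
weights[top_k] += 0.1

print(both.sum(), top_k.shape, weights.sum())
```

These are exactly the primitives where vectorized library calls tend to beat per-element Python loops.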
The code can be benchmarked (globally, for a typical use) using #890. Also please weigh in on #948.
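Global profiling of the kind used in #890 can also be approximated with Python's built-in cProfile. A minimal sketch (the sizes and the `compute_step` helper are hypothetical, not nupic's actual benchmark):

```python
import cProfile
import io
import pstats

import numpy as np


def compute_step(connected, input_vector, k):
    """One SP-like step: compute overlaps, then pick the k best columns."""
    overlaps = connected @ input_vector
    return np.argpartition(overlaps, -k)[-k:]


rng = np.random.default_rng(1)
connected = (rng.random((2048, 1024)) < 0.02).astype(np.float64)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    x = (rng.random(1024) < 0.05).astype(np.float64)
    compute_step(connected, x, k=40)
profiler.disable()

# Dump the profile sorted by cumulative time to see where time goes.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
print("compute_step" in stream.getvalue())
```

Profiling on realistic parameters (see the later comment about sp_profile) is what decides whether TM or SP is the real bottleneck.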
> shouldn't the effort focus on the big-impact first? Aka the biggest bottlenecks, which is still TM/TP?
The TM is actually not the biggest bottleneck right now. After changes by @mrcslws it is a pretty small part of the overall profile.
> The TM is actually not the biggest bottleneck right now. ...
@subutai not really, it still is (and the code complexity is higher compared to the SP).
Please see https://github.com/numenta/nupic/pull/3131 for my benchmarks:
0.040 s/call
0.040 s/call
0.158 s/call
The old SP problem I discovered with 1D vs 2D inputs: https://github.com/numenta/nupic.core/issues/380. Problem with TM speed: https://github.com/numenta/nupic.core/pull/890#issuecomment-219260326
> We need to try to keep our issues simpler and smaller. This could turn into a super issue, but honestly I would rather break it up even farther
@rhyolight this IS a super issue, with links to sub-issues where possible/active.
Added https://github.com/numenta/nupic.core/issues/967 as a proposal that would halve the computation time easily.
> not really, it still is (even the code complexity compared to SP is higher)
@breznak I will let @mrcslws comment on this. According to Marcus, when you run hotgym, the new TM is a small percentage of the overall profile. Marcus, am I misremembering?
I took a quick look at #3131 and sp_profile. I don't remember seeing this script before, but it looks like the SP parameters in sp_profile are quite off. Why is potentialRadius only 3? It should be much larger to form good SDRs. Same with numActiveColumnsPerInhArea, etc. I think the parameters should be set to realistic numbers and the profile re-run with those numbers.
This super issue plans workflow for speed optimizations by using a specialized library.
Benefits:
Requirements:
Workflow: