htm-community / htm.core

Actively developed Hierarchical Temporal Memory (HTM) community fork (continuation) of NuPIC. Implementation for C++ and Python
http://numenta.org
GNU Affero General Public License v3.0
151 stars 75 forks source link

SDR Reorganization #280

Closed ctrl-z-9000-times closed 5 years ago

ctrl-z-9000-times commented 5 years ago

The SDR class has grown. List of SDR classes & tools:

Potential Future SDR classes:

I'd like to:

dkeeney commented 5 years ago

Those are all good things.
But I think you need to be careful that you don't let it get too complicated. It could become like the SparseMatrix set of files where it is so complicated nobody wants to fix it. So keep it simple enough that anybody can understand it. :)

breznak commented 5 years ago

But I think you need to be careful that you don't let it get too complicated. It could become like the SparseMatrix

I'm :100: with this, so please stress that. The SDR is good, but careful growing the "may be handful" utilities. ArrayBase/ArrayAlgo also had convertors for all possible types.

Is/would it still be possible to eventually change SDR's internal data representation type to something else, say eigen::sparse_matrix?

:+1: for the sdr:: namespace,

Then drop the SDR_ prefix from all SDR classes.

the names might become confusing (Overlap, Reshape,...) so I'd say keep the SDR in the names as well (?)

With that, I'm :+1: :+1: for the changes.

ctrl-z-9000-times commented 5 years ago

Is/would it still be possible to eventually change SDR's internal data representation type to something else, say eigen::sparse_matrix?

Yes, but it would be a lot if work and it would mean getting rid of std::vector.

ctrl-z-9000-times commented 5 years ago

the names might become confusing (Overlap, Reshape,...) so I'd say keep the SDR in the names as well (?)

It would look like: SDR::Overlap and SDR::Reshape

breznak commented 5 years ago

Yes, but it would be a lot if work and it would mean getting rid of std::vector.

we should try to keep it that way, That was the idea for omnipresent SDR_t, that you can easily experiment with (compatible) containers for data types (array[], vector, vector, bitarray, sparse array implementations, ...)

It is then question if we should continue using SDR as the type put in the algorithms (as in compute(SDR input, SDR activeOutput)), and rather start using SDR_flatSparse_t ?

ctrl-z-9000-times commented 5 years ago

Compute uses both sparse and dense representations.

dkeeney commented 5 years ago

I have a question. As you know I am starting work on the TMRegion again and since SDR's are not being used in the algorithm I was going to continue using the Real32 type buffers. But the outputs are in sparse format. I prematurely removed the Sparse to Dense capability from Array.

Is there an existing Sparse to Dense (Real32) function someplace or should I just write one specifically for this case that can be thrown away when we add SDR's to TemporalMemory.cpp ?

breznak commented 5 years ago

Is there an existing Sparse to Dense (Real32) function someplace or should I just write one specifically for this case that can be thrown away when we add SDR's to TemporalMemory.cpp ?

you can use VectorHelpers::sparseToDense(), and otherwise.

Or if the data is binary, you can convert it to a SDR and use its getSparse(), getDense().

dkeeney commented 5 years ago

you can use VectorHelpers::sparseToDense(), and otherwise.

Yes, that is what I was looking for. Using the SDR to do the conversion would require also converting from Byte to Real32.

Thanks.

ctrl-z-9000-times commented 5 years ago

Well the sdr can do the conversions, but it will return UInt or Byte, not Real. You could also write new TM method overloads which send/receive SDRs; some shims to let the TM interface with the networkAPI.

On Tue, Feb 26, 2019, 12:43 PM David Keeney <notifications@github.com wrote:

I have a question. As you know I am starting work on the TMRegion again and since SDR's are not being used in the algorithm I was going to continue using the Real32 type buffers. But the outputs are in sparse format. I prematurely removed the Sparse to Dense capability from Array.

Is there an existing Sparse to Dense (Real32) function someplace or should I just write one specifically for this case that can be thrown away when we add SDR's to TemporalMemory.cpp ?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/htm-community/nupic.cpp/issues/280#issuecomment-467538433, or mute the thread https://github.com/notifications/unsubscribe-auth/ASq_q5u7PyL86dSVZPpx2Z9M-JZbgAc2ks5vRXGtgaJpZM4bSMZE .

dkeeney commented 5 years ago

You could also write new TM method overloads which send/receive SDRs;

I was hoping that someone had already written the overloads in TM so I could just call them :)

ctrl-z-9000-times commented 5 years ago

was hoping that someone had already written the overloads

Ok. I will prioritize TM-SDR integration over this reorganizing.

ctrl-z-9000-times commented 5 years ago

I had an idea for more descriptive names for the SDR data formats:

It's a bit late to be changing the names, but still possible... What do you all think?

dkeeney commented 5 years ago

Actually I like those names. They are more consistent with what they are.

No, it is not too late to change names. Once this library and the SDR class start being used by other people then it becomes really hard to change names.

breznak commented 5 years ago

+1 to change to sparse and coordinate

-- Marek Otahal :o)

breznak commented 5 years ago

Make a C++ header which includes the entire SDR namespace, for user convenience

can you expose the SDR manipulation methods in the SDR.hpp the same way you can manipulate string? Eg. for Reshape/Proxy adding

SDR_Reshape& sdr::SDR::reshape(const vector<> dimensions); 
ctrl-z-9000-times commented 5 years ago

can you expose the SDR manipulation methods in the SDR.hpp

That's a good idea! I don't know about the implementation details but I'm sure its doable.

ctrl-z-9000-times commented 5 years ago

Hi @breznak & @dkeeney,

I think that the SDR class is close enough to done. Software is never 100% done, but this piece of it is polished enough that I'd like to start talking about it on the Numenta forum. Unless of course either of you have change requests which break the SDR API, or serious usability issues? It is important to talk about improvements & new features, otherwise other people won't know to use them. Here is what I'd like to post:


Hello HTM-Hackers,

I'd like to take the time to introduce to you a new feature which I've added to the community fork of Nupic. Although the majority of of the library is still under development, this piece is ready for the spotlight.

Sparse Distributed Representations (SDRs) are a critical component of all HTM models, and are used throughout Nupic. Numenta has done significant mathematical analysis of SDRs and their properties.

I made a new class SDR for working with Sparse Distributed Representations. The SDR class has methods to hold, measure, and manipulate them. Key features include:

For more information see: https://github.com/htm-community/nupic.cpp/wiki/Sparse-Distributed-Representations

Questions and comments are welcome.

breznak commented 5 years ago

I think that the SDR class is close enough to done. ... I'd like to start talking about it on the Numenta forum.

:+1: I think SDR is just fine to show off

Unless of course either of you have change requests which break the SDR API,

I'd prefer to hesitate to set some API for this for now, let people use it and we might get usability feedback. For some API stabilization, we can either leave it open, or add it to Release v1.0 milestone.

Great initiative, go ahead from me! :+1:

dkeeney commented 5 years ago

I think this looks good too. You have a go ahead from me also 👍

ctrl-z-9000-times commented 5 years ago

Thanks guys! I've gone ahead and posted it.

breznak commented 5 years ago

Is there a reason why SDR Concatenation, Intersection are essentially "views" on a set of SDRs which they act upon (concat, intersect)?

The implementation would be very simple

void SDR::concatenate(const SDR& other) {
  NTA_ASSERT(this->dimensions == other.dimensions);
  std::set merged(this.getSparse());
  merged.insert(other.getSparse());
  this.setSparse(merged);
}
ctrl-z-9000-times commented 5 years ago

API Design consideration: The sdr::Concatenation has different dimensions than its inputs.


Is there a reason why SDR Concatenation, Intersection are essentially "views" on a set of SDRs which they act upon (concat, intersect)?

My vision was that at the start of the users program they would setup a processing pipeline and then they would be able to push data through their pipeline without needing to specify every operation in between. This works really well for the Metrics which automatically track a target SDR.

For concat & intersection this design makes it impossible to assign in-place to an SDR so yes we can fix that. For API I recommend:

SDR A, B, Intersect;
Intersect.intersection( A, B );
// Inputs A, B are unchanged,
// Result is in Intersect
SDR A, B, Concat;
Concat.concatenate( A, B );
// Inputs A, B are unchanged,
// Result is in Concat
// Split is the inverse of concatenate.
SDR A, B, Concat;
Concat.split( A, B );
// Input Concat is unchanged,
// Results are in A, B
breznak commented 5 years ago

[would setup a processing pipeline ... push data through ... without needing to specify every operation in between.] This works really well for the Metrics which automatically track a target SDR.

I agree this approach is very nice for complex tasks, like Metrics. On the other hand, for relatively simple tasks (as all of concat, intersect, union, split) I think it's an overkill and even less intuitive to use. Compare current

int a, b;
Product prod(a, b);

to

int a,b, prod;
prod = a * b;

My points are:

ctrl-z-9000-times commented 5 years ago

Make a C++ header which includes the entire SDR namespace, for user convenience

This is the next thing id like to work on. Id like to

breznak commented 5 years ago

This is the next thing id like to work on. Id like to

:+1: although with the cleanup done, I think even now it's ok. Would be nice if you can make the union, I'll use it for anomaly instead of VectorHelpers

ctrl-z-9000-times commented 5 years ago

I really dont think you need the union to do anomaly.

On Thu, Apr 11, 2019, 5:50 PM breznak notifications@github.com wrote:

This is the next thing id like to work on. Id like to

👍 although with the cleanup done, I think even now it's ok. Would be nice if you can make the union, I'll use it for anomaly instead of VectorHelpers

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/htm-community/nupic.cpp/issues/280#issuecomment-482334646, or mute the thread https://github.com/notifications/unsubscribe-auth/ASq_q4o557F698LH-pe5KFygTBzJcgNGks5vf64wgaJpZM4bSMZE .

ctrl-z-9000-times commented 5 years ago

I'm going to close this issue. Also change the issue name back to "SDR Reorganization"

There are outstanding issues left on this page. All of them are either tracked in their own issue or won't be fixed.