Provide effective way to query presynaptic cells from Connections

breznak commented 5 years ago

Newly added code from #609 introduces the Python class htm.advanced.algorithms.Connections, which is a wrapper around the bindings' Connections with a few extra methods.

- [ ] review and port useful methods to C++ Connections (and bindings)
  - [ ] and optionally remove the htm.advanced Connections if no longer needed
- [ ] for speedup, an effective way to query presynaptic cells from Connections is needed. It is currently done as follows (slow):

The bottleneck is in listing all the presynaptic cells of a segment, which I implemented as `presynaptic_cells = np.array([permanences.presynapticCellForSynapse(synapse) for synapse in permanences.synapsesForSegment(segment)])`. This is called many, many times and hence takes 25% of the run time. ~fcr
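To make the cost concrete, here is a minimal sketch of the quoted pattern run against a toy stand-in. `ToyConnections` and `presynaptic_cells_slow` are hypothetical names; the toy only mimics the two method names used in the snippet (`synapsesForSegment`, `presynapticCellForSynapse`) and counts every call as a proxy for one trip across the Python/C++ binding boundary. It returns a plain list instead of `np.array` to stay dependency-free.

```python
class ToyConnections:
    """Hypothetical stand-in for the bound Connections object.

    Only the two method names used in the snippet are mimicked; every call
    is counted as a proxy for one trip across the Python/C++ boundary.
    """

    def __init__(self):
        self.calls = 0
        self._segment_synapses = {}   # segment -> list of synapse ids
        self._presynaptic_cell = {}   # synapse id -> presynaptic cell
        self._next_synapse = 0

    def add_synapse(self, segment, presynaptic_cell):
        # Setup helper for this sketch only (not part of any real API).
        synapse = self._next_synapse
        self._next_synapse += 1
        self._segment_synapses.setdefault(segment, []).append(synapse)
        self._presynaptic_cell[synapse] = presynaptic_cell
        return synapse

    def synapsesForSegment(self, segment):
        self.calls += 1
        return list(self._segment_synapses.get(segment, []))

    def presynapticCellForSynapse(self, synapse):
        self.calls += 1
        return self._presynaptic_cell[synapse]


def presynaptic_cells_slow(permanences, segment):
    # The pattern from the issue: one call for the synapse list,
    # then one additional call per synapse.
    return [permanences.presynapticCellForSynapse(synapse)
            for synapse in permanences.synapsesForSegment(segment)]


conns = ToyConnections()
for cell in (3, 7, 11, 42):
    conns.add_synapse(segment=0, presynaptic_cell=cell)

print(presynaptic_cells_slow(conns, 0))  # [3, 7, 11, 42]
print(conns.calls)                       # 5 (1 + one per synapse)
```

For a segment with n synapses this is n + 1 calls, repeated for every segment on every step, which is why a single batched query returning all presynaptic cells of a segment at once is what this issue asks for.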

fcr commented 5 years ago

If you are in the middle of this, I will also need an implementation of SparseMatrixConnections.growSynapses(). Until then, I am adding it to htm.advanced.algorithms.Connections. You will probably get a PR connected (pun intended) to this soon.

breznak commented 5 years ago

will also need an implementation of SparseMatrixConnections.growSynapses()

This will need some careful work. SparseMatrix* is a no-go: we got rid of that backend and rewrote our whole codebase to use Connections. As for growSynapses, I think we already have such a method (?); if not, it's OK as long as we implement it using the existing Connections.
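Implementing growSynapses on top of a Connections-style API (rather than SparseMatrix) could look roughly like the sketch below. It assumes the nupic.core semantics of SparseMatrixConnections.growSynapses as we understand them: for each segment, grow a synapse to every given input it is not already connected to. `ToyConnections` and `grow_synapses` are hypothetical; the toy only mimics the method names `createSynapse`, `synapsesForSegment` and `presynapticCellForSynapse`.

```python
class ToyConnections:
    """Hypothetical stand-in exposing only the methods used below."""

    def __init__(self):
        self._synapse_data = []      # synapse id -> (presynaptic cell, permanence)
        self._segment_synapses = {}  # segment -> list of synapse ids

    def createSynapse(self, segment, presynapticCell, permanence):
        synapse = len(self._synapse_data)
        self._synapse_data.append((presynapticCell, permanence))
        self._segment_synapses.setdefault(segment, []).append(synapse)
        return synapse

    def synapsesForSegment(self, segment):
        return list(self._segment_synapses.get(segment, []))

    def presynapticCellForSynapse(self, synapse):
        return self._synapse_data[synapse][0]


def grow_synapses(connections, segments, presynaptic_inputs, initial_permanence):
    """For each segment, grow a synapse to every input it is not yet connected to."""
    inputs = list(dict.fromkeys(presynaptic_inputs))  # de-duplicate, keep order
    for segment in segments:
        # This "already connected" check is exactly the per-segment
        # presynaptic-cell query discussed in this issue.
        existing = {connections.presynapticCellForSynapse(s)
                    for s in connections.synapsesForSegment(segment)}
        for cell in inputs:
            if cell not in existing:
                connections.createSynapse(segment, cell, initial_permanence)


conns = ToyConnections()
conns.createSynapse(0, 5, 0.3)
grow_synapses(conns, segments=[0, 1], presynaptic_inputs=[5, 6, 7],
              initial_permanence=0.21)
print(sorted(conns.presynapticCellForSynapse(s) for s in conns.synapsesForSegment(0)))  # [5, 6, 7]
print(sorted(conns.presynapticCellForSynapse(s) for s in conns.synapsesForSegment(1)))  # [5, 6, 7]
```

Note that even a Connections-based growSynapses needs the presynaptic cells of each segment, so the two requests in this thread are linked.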

Overall, with this porting work, I'd like to put more focus on keeping the code as lean as possible. The front-facing algorithms (such as ApicalTM, CoordEncoder, ...) are fine, but the cruft that comes with them (frameworks, support code, ...) should be cut to the bare minimum.

fcr commented 5 years ago

I am porting the latest thalamus code from HTMResearch, and it uses growSynapses. For the moment I have implemented it in Python as I did for growSynapsesToSample. If you don't think the thalamus is a generally useful contribution, I will keep it private. I need it. BTW, I don't know if you noticed, but I only ported the support code that was absolutely needed.

breznak commented 5 years ago

If you don't think the thalamus is a generally useful contribution, I will keep it private. I need it.

It is, and I'll be happy to have your contribution.

For the moment I have implemented it in Python as I did for growSynapsesToSample.

All I wanted to say is that it should not use SparseMatrix, but rather be a re-implementation using Connections. If it's done similarly to the previous PR, all is :+1:

breznak commented 5 years ago

porting the latest thalamus code from HTMResearch

could you make us a wrap-up of what's new in HTM.research and what is ported here?

Btw, I'm curious what your plans with thalamus+gridcells are? :)

fcr commented 5 years ago

could you make us a wrap-up of what's new in HTM.research and what is ported here?

HTMResearch is huge and covers most of Numenta's research over the years, until they switched to nupic.research. The last projects there are the start of their work with sparse inputs for deep neural nets. Before that came the location framework, followed by the thalamus. These are the last, and the pinnacle, of their actual HTM research. Essentially, I have ported the location framework and all its support code; all the regions are part of this. The thalamus is on the way.

Btw, I'm curious what your plans with thalamus+gridcells are? :)

I am building a computerized vision system using HTM. Unfortunately, I cannot provide details since this work is proprietary. :-(

breznak commented 5 years ago

computerized vision system using HTM. Unfortunately I cannot provide details since this work is proprietary. :-(

Cool, I've taken some early steps in that direction too. We'll all benefit if you build your solution on htm.core, so I'll try to support any changes you need for the development.

fcr commented 5 years ago

Thanks. I appreciate the support.

breznak commented 5 years ago

See #575 for the growSynapses discussion.

breznak commented 4 years ago

@fcr I cannot find it now, but you've written somewhere that the bottleneck in advanced.connections is mainly in one method. Is it growSynapses? Could you please comment (maybe here) on https://github.com/htm-community/htm.core/issues/575#issuecomment-533249619

It seems to me that TM.growSynapses_() could be used here and moved to Connections.
I wonder whether the change would actually make a performance difference (?), as your Python implementation is quite nicely written. A significant difference could hint that our bindings copy memory somewhere instead of sharing the buffers.

fcr commented 4 years ago

@breznak The comment is the initial post of #724. The bottleneck is in listing all the presynaptic cells of a segment, which I implemented as `presynaptic_cells = np.array([permanences.presynapticCellForSynapse(synapse) for synapse in permanences.synapsesForSegment(segment)])`. This is called many, many times and hence takes 25% of the run time. Any improvement of this will greatly help. (Together with pickling of networks, so that multiprocessing can also chip in. Hint hint :-) )

breznak commented 4 years ago

The bottleneck is in listing all the presynaptic cells of a segment, which I implemented as `presynaptic_cells = np.array([permanences.presynapticCellForSynapse(synapse) for synapse in permanences.synapsesForSegment(segment)])`. This is called many, many times and hence takes 25% of the run time. Any improvement of this will greatly help.

Thanks, I'll look into it more. But I'm sceptical: it seems those methods are already in Connections, and you're simply calling expensive methods many times (i.e. rewriting from Python to C++ won't give much speedup?).

This is called many, many times and hence takes 25% of the run time

guess you cannot reuse the data, as it changes ..

fcr commented 4 years ago

it seems those methods are already in Connections, and you're simply calling expensive methods many times (i.e. rewriting from Python to C++ won't give much speedup?).

I figured as much, but was hoping that you could implement it internally in a better way. Getting the presynaptic cells is widely used in all the advanced (i.e. htm_research) stuff.

guess you cannot reuse the data, as it changes ..

Good guess :-)

breznak commented 4 years ago

could implement it internally in a better way. Getting the presynaptic cells is widely used in all the advanced (i.e. htm_research) stuff.

OK, that's what I wanted to figure out. Just porting to C++ won't be good enough, so we need to find a good way (or change Connections to work that way) to get the presynaptic cells easily. (The title was misleading, then.)
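One possible way to "change Connections to work that way" is to keep, per segment, a list of presynaptic cells parallel to the segment's synapse list, so that a single call returns them all. `SegmentIndexedConnections` and its `presynapticCellsForSegment` method are hypothetical and not part of the htm.core API; the sketch only shows that the parallel list can be kept consistent as synapses are created and destroyed, which addresses the "data changes" objection without caching.

```python
class SegmentIndexedConnections:
    """Hypothetical design sketch, not the htm.core API.

    Each segment keeps its synapse ids and, at the same positions, the
    presynaptic cells of those synapses, so both stay aligned.
    """

    def __init__(self):
        self._segment_synapses = {}  # segment -> list of synapse ids
        self._segment_presyn = {}    # segment -> presynaptic cells, parallel to the above
        self._records = {}           # synapse id -> (segment, presynaptic cell, permanence)
        self._next_id = 0

    def createSynapse(self, segment, presynapticCell, permanence):
        synapse = self._next_id
        self._next_id += 1
        self._segment_synapses.setdefault(segment, []).append(synapse)
        self._segment_presyn.setdefault(segment, []).append(presynapticCell)
        self._records[synapse] = (segment, presynapticCell, permanence)
        return synapse

    def destroySynapse(self, synapse):
        segment, _, _ = self._records.pop(synapse)
        index = self._segment_synapses[segment].index(synapse)
        # Remove from both parallel lists at the same position to keep them aligned.
        del self._segment_synapses[segment][index]
        del self._segment_presyn[segment][index]

    def presynapticCellsForSegment(self, segment):
        # One call returns every presynaptic cell of the segment (as a copy).
        return list(self._segment_presyn.get(segment, []))


conns = SegmentIndexedConnections()
conns.createSynapse(0, 10, 0.5)
doomed = conns.createSynapse(0, 20, 0.5)
conns.createSynapse(0, 30, 0.5)
conns.destroySynapse(doomed)
print(conns.presynapticCellsForSegment(0))  # [10, 30]
```

The trade-off is one extra integer per synapse. In C++, such a per-segment array could in principle be handed to Python as one buffer instead of element by element; that is where any speedup would come from, and it is an assumption, not a measured result.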

fcr commented 4 years ago

so we need to find a good way (or change Connections to work that way) to get the presynaptic cells easily.

That would be ideal. Any new learning algorithm will need to access the presynaptic cells.

breznak commented 4 years ago

Any new learning algorithm will need to access the presynaptic cells.

OT: if that is still efficient, we're getting to a more biologically accurate representation (each cell runs autonomously/asynchronously and handles its own input field). I've needed something similar for parallelizing the computations.

fcr commented 4 years ago

OT: if that is still efficient, we're getting to a more biologically accurate representation (each cell runs autonomously/asynchronously and handles its own input field). I've needed something similar for parallelizing the computations.

I wonder if the communication overhead would wipe out any advantage of having a separate process for each cell (unless this is happening on a dedicated chip). Since we are discussing this, I am trying to parallelize the system by running each cortical column in its own process, but have been stymied by the lack of pickle support for region and network.
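Running each column in its own process with multiprocessing requires that whatever is sent to a worker survives a pickle round trip, which is why missing pickle support for Region and Network blocks this approach. A minimal sketch with a hypothetical pure-Python `ToyColumn` (not an htm.core class) shows the kind of `__getstate__`/`__setstate__` pair such support would have to provide.

```python
import pickle


class ToyColumn:
    """Hypothetical stand-in for one cortical column's state.

    multiprocessing sends objects to worker processes by pickling them, so any
    per-column object has to survive a pickle round trip.
    """

    def __init__(self, column_id, num_cells):
        self.column_id = column_id
        self.num_cells = num_cells
        self.active_cells = []

    def compute(self, sensed_cells):
        # Placeholder "step": keep only in-range cells, sorted.
        self.active_cells = sorted(c for c in sensed_cells if 0 <= c < self.num_cells)
        return self.column_id, self.active_cells

    def __getstate__(self):
        # Explicit state; a wrapper around a native object would serialize
        # that object to bytes here.
        return {"column_id": self.column_id, "num_cells": self.num_cells,
                "active_cells": list(self.active_cells)}

    def __setstate__(self, state):
        self.__dict__.update(state)


column = ToyColumn(column_id=2, num_cells=8)
column.compute([5, 1, 9])
restored = pickle.loads(pickle.dumps(column))
print(restored.active_cells)  # [1, 5]
```

Once each column object pickles like this, a `multiprocessing.Pool` could distribute columns across processes; whether the communication overhead pays off is exactly the open question raised above.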

breznak commented 4 years ago

I wonder if the communication overhead would wipe out any advantage of having a separate process for each cell (unless this is happening on a dedicated chip).

Not for cells, but that seems to be what happened in my parallelization PR attempt #559.

Since we are discussing this, I am trying to parallelize the system by running each cortical column in its own process, but have been stymied by the lack of pickle support for region and network.

A column? (IMHO that would be too many threads, see above.) Or a layer? That would make sense. I'd like this to be enabled generally (#255), where each "main class" (SP, TM, CP, ...) could run in its own thread.

Btw, is #253 (parallel NetworkAPI) your thing? (Let's continue the discussion about parallelization in one of those issues.)

breznak commented 4 years ago

@fcr do you still need this functionality? Do you have any updates, based on your experience, on what should be done and how?

fcr commented 4 years ago

@breznak I need this functionality to implement growSynapses and growSynapsesToSample, which were originally provided by nupic.core. If there were a direct port of growSynapses and growSynapsesToSample from nupic.core.src.nupic.math.SparseMatrixConnections, then I would not explicitly need the presynaptic cells query functionality. growSynapses and growSynapsesToSample are very, very heavily used.
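For reference, the sampling variant can be sketched the same way. This assumes the nupic.core semantics of growSynapsesToSample as we understand them (for each segment, grow synapses to a random subset of at most `sample_size` of the given inputs it is not already connected to) and only handles a single scalar sample size. `grow_synapses_to_sample` is a hypothetical name, and a plain dict stands in for a Connections object.

```python
import random


def grow_synapses_to_sample(segment_synapses, segments, presynaptic_inputs,
                            sample_size, initial_permanence, rng):
    """For each segment, grow synapses to a random sample of the inputs it is
    not already connected to (at most `sample_size` new synapses per segment).

    `segment_synapses` maps segment -> {presynaptic cell: permanence}; it is a
    hypothetical stand-in for a Connections object.
    """
    inputs = list(dict.fromkeys(presynaptic_inputs))  # de-duplicate, keep order
    for segment in segments:
        synapses = segment_synapses.setdefault(segment, {})
        # Candidates are the inputs this segment is not yet connected to.
        candidates = [cell for cell in inputs if cell not in synapses]
        chosen = rng.sample(candidates, min(sample_size, len(candidates)))
        for cell in chosen:
            synapses[cell] = initial_permanence


rng = random.Random(42)
store = {0: {5: 0.4}}
grow_synapses_to_sample(store, [0, 1], [5, 6, 7, 8], sample_size=2,
                        initial_permanence=0.21, rng=rng)
print(len(store[0]), len(store[1]))  # 3 2
```

As with growSynapses, the candidate filter is where the per-segment presynaptic-cell query is needed; a native port would do that filtering without crossing into Python per synapse.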

breznak commented 4 years ago

If there were a direct port of growSynapses and growSynapsesToSample from nupic.core.src.nupic.math.SparseMatrixConnections, then I would not explicitly need the presynaptic cells query functionality.

Sounds like this might be the proper solution (?) Could you please link me to the sources of the two methods?

fcr commented 4 years ago

"Sounds like this might be the proper solution" - most definitely. The original nupic.core code is in https://github.com/numenta/nupic.core/blob/master/src/nupic/math/SparseMatrixConnections.cpp My Python 3 port of these methods is in: https://github.com/htm-community/htm.core/blob/master/py/htm/advanced/algorithms/connections.py. Is that what you are asking?

breznak commented 4 years ago

Is that what you are asking?

Yes, I wasn't sure where growSynapses() comes from: https://github.com/htm-community/htm.core/blob/master/py/htm/advanced/algorithms/connections.py#L99

htm-community/htm.core issue #668