lava-nc / lava

A Software Framework for Neuromorphic Computing
https://lava-nc.org

Plan and Debug CLP Scaling Issues in NcProcCompiler #822

Open joyeshmishra opened 5 months ago

joyeshmishra commented 5 months ago

As provided by Daniel:

Here are the 3 major issues I've uncovered, related to multiple DAs and/or learning, that impede scaling CLP:

  1. For neurons with multiple DAs, the incoming Connection Proc type (Dense/Sparse/Learning/Conv, although I think Conv is not supported correctly for multi-DA at all right now) must be the same for every DA; otherwise the indexing between the out axons and their associated DAs is incorrect. In my case, this means a ucode Neuron Proc that receives 1 Learning Conn input must receive only Learning Conn inputs.
  2. weights.get() does not work as expected for Learning Conns as the number of connections scales. I believe things break once the size of the learning conn grows beyond what can be packed into 1 SynMem word. (This is at least true in my final use case, where a NeuroProc group contains multiple DAs, but I believe it's true even for simpler Learning Conn architectures.) Whatever indexing weights.get() uses to fetch the appropriate register values from SynMem does not seem to be valid for Learning Conns.
  3. My CLP network, which contains 2 feedback loops as well as branching and joining structures, does not map to the same cores deterministically, even when an identical network architecture is compiled repeatedly on the same board. The output NeuroProc group consistently maps to core 0, but the internal NeuroProc groups involved in the feedback loops are mapped to different cores from one compilation to the next. Consequently, I cannot set up hardware Probes to monitor learning on the correct cores before runtime. Moreover, if the core mapping changes during compilation, any pre-established Probe callbacks will fail and then prevent network execution, because the probed indices (SynMap, etc.) may then be out of bounds for the monitored core.
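Until issue 1 is fixed, a pre-compilation check could flag neuron groups whose inputs mix connection types. The following is only a minimal sketch in plain Python: it assumes the network is described as a list of `(conn_type, target_name)` pairs, and `find_mixed_input_groups` is a hypothetical helper, not part of Lava or NcProcCompiler.

```python
from collections import defaultdict


def find_mixed_input_groups(edges):
    """Return targets that receive more than one connection type.

    `edges` is an iterable of (conn_type, target_name) pairs. This
    representation is an assumption for illustration, not a Lava API.
    """
    types_by_target = defaultdict(set)
    for conn_type, target in edges:
        types_by_target[target].add(conn_type)
    # Keep only targets whose incoming connection types differ.
    return {
        target: sorted(types)
        for target, types in types_by_target.items()
        if len(types) > 1
    }


# Hypothetical network: "n1" mixes a learning and a non-learning input.
edges = [
    ("LearningDense", "n1"),
    ("Dense", "n1"),
    ("LearningDense", "n2"),
    ("LearningDense", "n2"),
]
mixed = find_mixed_input_groups(edges)
```

Here `mixed` lists only `n1`, the group that would hit the indexing problem described in issue 1.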

Discussing more with her so that we can get the test scripts to replicate and debug them.
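A replication script for issue 3 could compile the same network several times and report which Procs land on different cores. The sketch below assumes a caller-supplied `compile_fn` that compiles the network and returns a `{proc_name: core_id}` dict; both that callable and `unstable_mappings` are hypothetical, since the actual way to read the core mapping out of NcProcCompiler is not shown in this thread.

```python
from itertools import cycle


def unstable_mappings(compile_fn, runs=5):
    """Call compile_fn `runs` times and collect every core each Proc maps to.

    compile_fn is a hypothetical callable returning {proc_name: core_id}.
    Returns only the Procs that were mapped to more than one core.
    """
    cores_seen = {}
    for _ in range(runs):
        for name, core in compile_fn().items():
            cores_seen.setdefault(name, set()).add(core)
    return {
        name: sorted(cores)
        for name, cores in cores_seen.items()
        if len(cores) > 1
    }


# Stand-in for the real compiler: the output group is stable on core 0,
# while a feedback group alternates between cores 1 and 2.
_fake_results = cycle([
    {"output": 0, "feedback_a": 1},
    {"output": 0, "feedback_a": 2},
])
report = unstable_mappings(lambda: next(_fake_results), runs=4)
```

With the stand-in compiler, `report` contains only `feedback_a`, which mirrors the behaviour described above: the output group is stable while the feedback groups move between cores.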

joyeshmishra commented 5 months ago