Add location of Block to validation errors

hunse commented 4 years ago

So that it's clearer to the user what block is causing the problem.

This also fixes a number of other problems:

We were overcounting output axons in the splitter, because we neglected to account for axon sharing with population axons going to the same target.
Added a warning if we make copies of the weights to fix indexing issues, so that it's clear to the user if they fix these indexing issues, they could reduce the number of weights needed.
Do not change the requested block size. Sometimes it is essential (e.g. with pop16 axons) that parts of the requested block size (e.g. number of filters) not be changed. For now, I've just taken it out completely. We could add back in a warning letting the user know that the same number of blocks could be achieved with a smaller shape, and let them change it themselves if they want. We could also add an extra keyword parameter to BlockShape that says whether it can be changed or not (or even whether particular dimensions can be changed, but that gets a bit more complicated).
There's a line if spike is None in LoihiSpikeInput.set_axons that I thought could never be hit, but apparently the CIFAR-10 notebook hits it. In that case we need to make sure the axon gets added with an empty list of targets, so that when the corresponding neuron spikes, we don't get an index error.
I added a couple functions to Model and LoihiBlock to display info about how much of different chip resources a block is using. It does everything as percentages of the maximum per core, so that users don't have to go in and reference these maximums themselves. I found it useful for debugging how close I was to the maximum on different cores, so that I could choose the block sizes to use the cores as well as possible. The one downside is that if anything is over the maximums, this will trigger in validation when the Simulator is being created, so you never actually get to a point that you can use these utilization functions. But with the better error messages for these validation errors also in this PR, hopefully that's OK.

hunse commented 4 years ago

Ok, I pushed a commit with lots of documentation about fitting models onto Loihi. A lot of it came from the blog post I've written for the release, then I realized it makes sense to have it in the docs, too (or rather, instead; I think the blog post will be a lot shorter now since we can just refer here for the details).

I think 237b3fc can just be completely removed now (maybe we want to keep it somewhere, though, for if/when we do the refactoring that would let us raise such a warning in the splitter).

I also added a number of smaller fixes.

drasmuss commented 4 years ago

I fixed what I think is a bug in the test, where conn_args was being ignored (so when we were setting synapse=None down below it wasn't actually doing anything) https://github.com/nengo/nengo-loihi/pull/279/commits/3cd982347b0d4d1b9348870ba934e2e8109f7305#diff-2b5472730de7e72d3c4c167f986643d2R617. But now the test is failing tolerances https://travis-ci.com/github/nengo/nengo-loihi/jobs/313727540#L1887. I'm not sure what the correct solution is: do we want synapse=None (in which case we need to figure out why the test is failing)? Or do we want the accidental behaviour we had before (just using the default synapse)?

hunse commented 4 years ago

The failure is on comparing that the hardware matches the emulator exactly. That's always been the case in the past, but it appears that here (for the specific case of pop16 axons with precompute=False), that's not the case.

I'm not quite sure why having synapse=None on the input makes a difference. Since the input is just a constant input, all that would do is control how quickly the input ramps up. So somehow that's throwing of some sort of timing in the inputs. My instinct was just to raise snip_max_spikes_per_step, but that doesn't seem to be it (also, we would see that in the pop32 case as well). So it could be a spike is getting dropped or something just based on some weird timing in the input.

I don't know whether it's better to loosen the tolerances and keep the synapse=None, or to change the synapse back and keep the tolerances. I don't think it makes a big difference either way. I would probably opt to change the tolerances and make a note in the code that these tolerances are looser to accommodate that particular problem. At some point, it would be good to look into it further, but I have a feeling that this is the kind of thing that will be hard to reproduce in a minimal example.

hunse commented 4 years ago

Ok, I've made that change. Also, I realized that if you try to run test_conv_deepnet with the BOARD environment variable set, it appears as a hang. I tried to avoid that in the test, but it wasn't working properly, so just something to keep in mind. I did add a commit to make sure that we set the PARTITION back to it's original value, rather than to the empty string, in case it had been specified before (i.e. if the user had already set the partition to "nahuku32" to use that for all tests, we don't want to mess things up for later tests).

nengo / nengo-loihi

Add location of Block to validation errors #279