nengo / nengo-loihi

Run Nengo models on Intel's Loihi chip
https://www.nengo.ai/nengo-loihi/

Add location of Block to validation errors #279

Closed hunse closed 4 years ago

hunse commented 4 years ago

So that it's clearer to the user what block is causing the problem.
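As a rough illustration of the idea (not the actual nengo-loihi code), a validation error can carry the label of the block that failed. The names `ValidationError`, `validate_block`, `describe_failure` and the 1024-compartment limit are assumptions made up for this sketch.

```python
class ValidationError(ValueError):
    """Hypothetical error type that records which block failed validation."""

    def __init__(self, message, block_label=None):
        self.block_label = block_label
        # Prefix the message with the block label so the user can locate it.
        prefix = f"{block_label}: " if block_label is not None else ""
        super().__init__(prefix + message)


MAX_COMPARTMENTS = 1024  # illustrative limit, assumed for this sketch


def validate_block(label, n_compartments, limit=MAX_COMPARTMENTS):
    """Raise an error naming the block if it has too many compartments."""
    if n_compartments > limit:
        raise ValidationError(
            f"Number of compartments ({n_compartments}) exceeded max ({limit})",
            block_label=label,
        )


def describe_failure(label, n_compartments):
    """Return the error message for a block, or None if it validates."""
    try:
        validate_block(label, n_compartments)
    except ValidationError as err:
        return str(err)
    return None
```

With the label in the message, a failure reads as "ens_a: Number of compartments ..." instead of leaving the user to guess which part of the model is too large.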

This also fixes a number of other problems:

hunse commented 4 years ago

Ok, I pushed a commit with lots of documentation about fitting models onto Loihi. A lot of it came from the blog post I've written for the release, then I realized it makes sense to have it in the docs, too (or rather, instead; I think the blog post will be a lot shorter now since we can just refer here for the details).

I think 237b3fc can just be completely removed now (maybe we want to keep it somewhere, though, for if/when we do the refactoring that would let us raise such a warning in the splitter).

I also added a number of smaller fixes.

drasmuss commented 4 years ago

I fixed what I think is a bug in the test, where conn_args was being ignored, so setting synapse=None further down wasn't actually doing anything (https://github.com/nengo/nengo-loihi/pull/279/commits/3cd982347b0d4d1b9348870ba934e2e8109f7305#diff-2b5472730de7e72d3c4c167f986643d2R617). But now the test is failing its tolerances (https://travis-ci.com/github/nengo/nengo-loihi/jobs/313727540#L1887). I'm not sure what the correct solution is: do we want synapse=None (in which case we need to figure out why the test is failing), or do we want the accidental behaviour we had before (just using the default synapse)?
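The exact change is in the linked commit. The sketch below only illustrates the general pattern of a keyword-argument dictionary that is built but never forwarded. `connect`, `build_buggy` and `build_fixed` are hypothetical stand-ins, not functions from the test suite.

```python
DEFAULT_SYNAPSE = "default"  # sentinel standing in for a library default


def connect(pre, post, synapse=DEFAULT_SYNAPSE):
    """Hypothetical stand-in for a connection constructor."""
    return {"pre": pre, "post": post, "synapse": synapse}


def build_buggy(conn_args):
    # Bug pattern: conn_args is accepted but never forwarded,
    # so synapse=None silently has no effect.
    return connect("input", "ens")


def build_fixed(conn_args):
    # Fix: forward the keyword arguments to the constructor.
    return connect("input", "ens", **conn_args)
```

Because the buggy version still runs without error, the only visible symptom is that the connection keeps the default synapse.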

hunse commented 4 years ago

The failure is in the check that the hardware output matches the emulator exactly. That has always held in the past, but here (for the specific case of pop16 axons with precompute=False) it does not.

I'm not quite sure why having synapse=None on the input makes a difference. Since the input is constant, all the synapse would do is control how quickly the input ramps up. So somehow that's throwing off some sort of timing in the inputs. My instinct was just to raise snip_max_spikes_per_step, but that doesn't seem to be it (also, we would see that in the pop32 case as well). So it could be that a spike is getting dropped or something, just based on some weird timing in the input.
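To make the ramp-up point concrete: a constant input passed through a first-order lowpass filter approaches its final value gradually, whereas with no filter it is at full value from the first step. The sketch below assumes a 5 ms time constant, a 1 ms timestep and a simple exponential discretization. None of these are taken from the test itself, and `lowpass_step_response` is a hypothetical helper.

```python
import math


def lowpass_step_response(tau, dt=0.001, n_steps=20, value=1.0):
    """Filter a constant input with a first-order lowpass.

    tau=None means no filtering, analogous to synapse=None.
    """
    if tau is None:
        return [value] * n_steps
    decay = math.exp(-dt / tau)  # per-step decay of the filter state
    y, out = 0.0, []
    for _ in range(n_steps):
        y = decay * y + (1.0 - decay) * value
        out.append(y)
    return out


unfiltered = lowpass_step_response(None)
filtered = lowpass_step_response(0.005)
```

Under these assumptions the two inputs differ only during the first few timesteps, which is consistent with the suspicion that the discrepancy comes from early spike timing rather than steady-state behaviour.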

I don't know whether it's better to loosen the tolerances and keep the synapse=None, or to change the synapse back and keep the tolerances. I don't think it makes a big difference either way. I would probably opt to change the tolerances and make a note in the code that these tolerances are looser to accommodate that particular problem. At some point, it would be good to look into it further, but I have a feeling that this is the kind of thing that will be hard to reproduce in a minimal example.
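One way to loosen tolerances for just the problematic configuration, with a note in the code, might look like the following. The function names, the `pop_type` and `precompute` arguments as used here, and the tolerance values are all illustrative assumptions, not the values used in the actual test.

```python
import math


def outputs_match(emulator, hardware, rtol=0.0, atol=0.0):
    """Element-wise comparison; zero tolerances demand an exact match."""
    return len(emulator) == len(hardware) and all(
        math.isclose(e, h, rel_tol=rtol, abs_tol=atol)
        for e, h in zip(emulator, hardware)
    )


def tolerances_for(pop_type, precompute):
    """Pick tolerances per configuration (values are illustrative)."""
    if pop_type == 16 and not precompute:
        # NOTE: looser tolerances here because hardware and emulator do not
        # match exactly for this case with synapse=None; cause not yet known.
        return {"rtol": 0.0, "atol": 2.0}
    return {"rtol": 0.0, "atol": 0.0}
```

Keeping the exact-match requirement for every other configuration means a new mismatch elsewhere would still be caught.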

hunse commented 4 years ago

Ok, I've made that change. Also, I realized that if you try to run test_conv_deepnet with the BOARD environment variable set, it appears as a hang. I tried to avoid that in the test, but it wasn't working properly, so just something to keep in mind. I did add a commit to make sure that we set PARTITION back to its original value, rather than to the empty string, in case it had been specified before (i.e. if the user had already set the partition to "nahuku32" to use that for all tests, we don't want to mess things up for later tests).
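The restore-to-original behaviour described above can be sketched with a small context manager. `temporary_env` is a hypothetical helper written for illustration. It is not the code from the commit.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable, then restore its previous state on exit."""
    missing = object()  # distinguishes "unset" from "set to empty string"
    original = os.environ.get(name, missing)
    os.environ[name] = value
    try:
        yield
    finally:
        if original is missing:
            os.environ.pop(name, None)  # it was unset before: unset it again
        else:
            os.environ[name] = original  # put back the user's original value
```

In a pytest suite, the built-in `monkeypatch.setenv` gives similar restore-on-teardown behaviour without a custom helper.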