Fix matmul acc delay with lock

This change fixes a bug where matmuls and accs would wait on each other instead of proceeding in parallel. The problem was that the control queues feeding them were implicitly linked, meaning that one could not get too far ahead of the other. We add buffers in these places to cut dependencies between buses fed by the same control inputs. In this case the buffer size was sometimes set to 2x the array size to account for the fact that accumulate operations can't begin until data has propagated through the array.

To prevent concurrency bugs from arising due to out of order execution, we also add a lock pool module that manages a set of mutexes that are shared by a number of actors. In our case, the mutexes are locks tied to memory regions (2 locks total) and the actors are the memory ports. The decoder acquires locks when it sends off memory requests, and the locks are automatically released when the appropriate condition is met - in this case when the memory request that needs to lock is actually passed into the memory module.

This change also required upgrading to Chisel 3.5.3 to get around some odd bugs that seemed to be due to the cloneType methods, which are no longer necessary in this version. To effect the upgrade, we had to fix some deprecation and naming issues, such as removing unnecessary parentheses from .fire, .isLit, or moving the verification statements out of the experimental package. We also had to update tests to reflect the new API of scalatest in the latest version (cosmetic differences only).

The meat of the change is in the LockPool module so please start there when reviewing.

Closing #52 in favour of this new PR

Closes #15

	ResNet20 CIFAR ONNX	YOLOv4 Tiny @416 ONNX	ResNet50 ImageNet ONNX
A. ZCU104 baseline (ms)	13.099	294.071	514.279
B. A and external AXI4S Width Converter instead of Transmisson	11.168	262.804	473.956
C. B and two-cycle read/write fix	9.661	214.892	415.953
D. C and fix for delay between ops	7.387	120.748	256.987

tensil-ai / tensil

Fix matmul acc delay with lock #55