Handshake-runner should allow unbounded FIFOs

llvm / circt

Circuit IR Compilers and Tools

https://circt.org

Other

1.69k stars 303 forks source link

Handshake-runner should allow unbounded FIFOs #67

Open stephenneuendorffer opened 4 years ago

stephenneuendorffer commented 4 years ago

The current implementation of the handshake runner implicitly has one buffer place per operation. This is sufficient for models that are generated from the standard-to-handshake conversion, but may not be sufficient for more general models. Probably the right path forward is to implement a multi-threaded executable, either in a similar style to the handshake-runner, or through some lowering path (e.g. async dialect -> LLVM + coroutines -> binary) with dynamically growing fifos. Note that a bound can still be useful to detect improperly constructed designs.

kumasento commented 4 years ago

Hi @stephenneuendorffer

Thank you for creating this issue!

There are several things that I'm not sure about and it would be great to have your input on them:

Why is it necessary to make the handshake-runner a multi-threaded executable? My intuition is that because the FIFO will be dynamically resized, there should be a thread that manages such resizing tasks in the background to avoid overhead. I'm wondering maybe there are other reasons.
How should we express the unbounded FIFO in the handshake dialect? FIFO seems to be implicitly constructed in designs represented by Handshake. So will the unbounded FIFO be something implicitly lowered from things equivalent to while(true) {...} in the standard dialect?

stephenneuendorffer commented 4 years ago

It's not really necessary, but at a certain point as we scale up to larger designs, faster simulation on multi-core machines would be nice.
Well, there are maybe two different kinds of models. A 'higher-level' model where the connections between processes represent unbounded fifos, and a 'lower-level' model where fifos of bounded size are explicit in the model (essentially they are special kinds of processes) and the connections between processes have no implicit buffering. The simulator today actually implements something in between where an implicit fifo of size one exists on every connection. These differences are subtle, but at a certain point we want to make sure that we model hardware accurately in order to detect that deadlock conditions in the model aren't implemented in hardware.

kumasento commented 4 years ago

Thank you Steve for explaining these 👍

Well, there are maybe two different kinds of models. A 'higher-level' model where the connections between processes represent unbounded fifos, and a 'lower-level' model where fifos of bounded size are explicit in the model (essentially they are special kinds of processes) and the connections between processes have no implicit buffering.

I'm just wondering whether it is a better idea to enable Handshake to explicitly express FIFOs first before taking care of unbounded FIFOs? Then the differences between high- and low-level models will be the absence/existence of FIFO operations. This might make things a bit easier to understand, and may make future deadlock detection simpler to implement.

Otherwise, if we don't want to introduce explicit FIFO operations, what I'm intending to do for this issue will be changing all the implicit size-one FIFO constructions to some other dynamically-sized FIFO instantiation. Does this sound good to you? :)

stephenneuendorffer commented 4 years ago

I think adding explicit FIFO operations into the handshake dialect is a good first step. I think @hanchenye has done some of this but might not have it upstreamed yet. Making the implicit fifo size configurable in the handshake runner is also a good idea.