Experimentation with FSM dialect during Calyx lowering

mikeurbach commented 3 years ago

In https://github.com/llvm/circt/pull/1506#pullrequestreview-720563236 I mentioned that it might be useful to use the proposed FSM dialect during Calyx's CompileControl pass. That pass is generating an FSM with Comb logic right now, but my hypothesis is that targeting the FSM dialect would provide a more straightforward way to capture the notion of an FSM. Steve mentioned it would be good for us to do some homework on the FSM proposal and bring our use-cases, so I'll experiment with this a bit.

CC @cgyurgyik

hanchenye commented 3 years ago

Nice. Please let me know if you need any help from the FSM dialect side!

cgyurgyik commented 3 years ago

Agreed about straightforwardness. Definitely intrigued how this will fit in the Calyx picture.

Likewise for Calyx side. I will be on and off until the 16th of August. Rachit and Adrian are likely to respond about any Calyx-related questions as well.

mikeurbach commented 3 years ago

I took a look at this, and I think it went pretty well. The current CIRCT implementation of CompileControl is only working with sequential control (i.e. no parallel constructs, no if/while constructs, etc.). This indeed was straightforward to implement using the new FSM dialect. I'll post some IR examples to discuss.

Here's the test I was looking at:

module  {
  calyx.program  {
    calyx.component @Z(%go: i1, %reset: i1, %clk: i1) -> (%flag: i1, %done: i1) {
      calyx.wires  {
      }
      calyx.control  {
      }
    }
    calyx.component @main(%go: i1, %reset: i1, %clk: i1) -> (%done: i1) {
      %z.go, %z.reset, %z.clk, %z.flag, %z.done = calyx.cell "z" @Z : i1, i1, i1, i1, i1
      calyx.wires  {
        %0 = calyx.undef : i1
        calyx.group @A  {
          %A.go = calyx.group_go %0 : i1
          calyx.assign %z.go = %go, %A.go ? : i1
          calyx.group_done %z.done : i1
        }
        calyx.group @B  {
          %B.go = calyx.group_go %0 : i1
          calyx.group_done %z.done, %z.flag ? : i1
        }
      }
      calyx.control  {
        calyx.seq  {
          calyx.enable @A
          calyx.enable @B
        }
      }
    }
  }
}

For this calyx.control block, I was able to synthesize the following fsm.machine:

  fsm.machine @fsm(%arg0: i1, %arg1: i1) -> i2 attributes {stateType = i2} {
    %enabledGroup = fsm.variable "enabledGroup" {initValue = 0 : i2} : i2
    fsm.state "A" entry  {
      %c0_i2 = hw.constant 0 : i2
      fsm.update %enabledGroup, %c0_i2 : i2
    } exit  {
    } transitions  {
      fsm.transition @B guard  {
        fsm.return %arg0 : i1
      } action  {
      }
    }
    fsm.state "B" entry  {
      %c1_i2 = hw.constant 1 : i2
      fsm.update %enabledGroup, %c1_i2 : i2
    } exit  {
    } transitions  {
      fsm.transition @done guard  {
        fsm.return %arg1 : i1
      } action  {
      }
    }
    fsm.state "done" entry  {
      %c-2_i2 = hw.constant -2 : i2
      fsm.update %enabledGroup, %c-2_i2 : i2
    } exit  {
    } transitions  {
    }
    fsm.output %enabledGroup : i2
  }

I'm curious if this is a good implementation of the Calyx semantics. The machine is intended to:

Store a variable indicating the currently enabled group
Have a state for each group to enable (plus a done state), and update the variable upon entry to a given state
Have an input bit for each state, and transition to the next state when that bit is asserted
Have an output for the enabled group variable

I'm sure there are other approaches, but this seemed like a good way to implement Calyx's group go/done signaling. The output of the FSM can be inspected to create the group go signals, and the group done signals can be fed into the FSM inputs. I'm curious if there are any other suggestions/feedback on this part.
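
The intended behavior of the machine above can be modeled in a few lines of Python. This is only a sketch of one reading of the IR, not the FSM dialect's actual lowering: it assumes that one step evaluates the guard of the current state and, if the guard fires, runs the entry block of the target state in the same step. The names `step`, `run`, `ENTRY_VALUE` and `TRANSITIONS` are made up for illustration.

```python
# Sketch of the fsm.machine above. Assumption: one step evaluates the guard of
# the current state and, if it fires, runs the entry block of the next state.
ENTRY_VALUE = {"A": 0, "B": 1, "done": 2}  # value written to enabledGroup on entry
TRANSITIONS = {"A": ("B", 0), "B": ("done", 1), "done": None}  # (target, input index)


def step(state, inputs):
    """Advance one step and return (next_state, enabledGroup output)."""
    edge = TRANSITIONS[state]
    if edge is not None and inputs[edge[1]]:
        state = edge[0]
    return state, ENTRY_VALUE[state]


def run(trace, state="A"):
    """Feed a list of (arg0, arg1) pairs and collect the enabledGroup outputs."""
    outputs = []
    for inputs in trace:
        state, out = step(state, inputs)
        outputs.append(out)
    return outputs


# arg0 is asserted on the second step, arg1 on the fourth.
print(run([(0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]))  # [0, 1, 1, 2, 2]
```

Note that in this model `arg1` is ignored while the machine is in state A, because only the current state's guard is consulted.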

Given the FSM, the next step was to actually hook it up to the Calyx component. This part took a little more thought, since we are bridging two dialects. There are two mechanisms to instantiate the FSM, but they both have the same form: inputs are passed as operands, and outputs are results.

This doesn't mesh well with Calyx, which takes the FIRRTL approach using SSA regions: instances return both inputs and outputs as results, which are then connected using calyx.assign. There was nowhere for me to instantiate an FSM in the SSA graph that could accept all the necessary inputs and still make the outputs available to use in a calyx.assign below.

To bridge this gap, I decided to wrap the instantiation of the FSM in a simple component:

    calyx.component @fsmWrapper(%go: i1, %enable0: i1, %enable1: i1) -> (%enabledGroup: i2, %done: i1) {
      %fsm = fsm.instance "fsm" @fsm
      calyx.wires  {
        calyx.group @trigger  {
          %trigger.go = calyx.group_go %go : i1
          %0 = fsm.trigger %fsm(%enable0, %enable1) : (i1, i1) -> i2
          calyx.assign %enabledGroup = %0 : i2
          %c-2_i2 = hw.constant -2 : i2
          %1 = comb.icmp eq %0, %c-2_i2 : i2
          calyx.group_done %1 : i1
        }
      }
      calyx.control  {
        calyx.enable @trigger
      }
    }

The component has a single group that goes when the component goes, forwards the inputs/outputs to/from the FSM, and is done when the FSM reaches the done state. I think this is a fairly reasonable way to bridge the two worlds, and I think it makes sense in terms of both dialects' semantics.
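
One detail that is easy to misread in the wrapper: the done check compares against `hw.constant -2 : i2`. Signless integer constants are printed as signed values, so `-2` in two bits is the bit pattern `0b10`, which is the third state encoding (unsigned 2). A small Python sketch of that arithmetic, with hypothetical helper names:

```python
def as_unsigned(value, width):
    """Reinterpret a signed integer as the unsigned value of its two's-complement bits."""
    return value & ((1 << width) - 1)


def wrapper_done(enabled_group):
    """Mirrors `comb.icmp eq %0, %c-2_i2`: done once the FSM output is the done encoding."""
    return enabled_group == as_unsigned(-2, 2)


print(as_unsigned(-2, 2))  # 2
print([wrapper_done(v) for v in (0, 1, 2)])  # [False, False, True]
```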

Again, I'm curious if there is any feedback or suggestions. Another approach is to use the fsm.hw_instance operation directly, rather than the fsm.instance + fsm.trigger. Since Calyx has this notion of go/done, using the higher-level fsm.instance + fsm.trigger seemed to capture those semantics more directly to me, but I guess it depends on how the FSM dialect is lowered.

Given the above component, I was able to finally create a calyx.cell for the FSM, and connect it to the main component:

    calyx.component @main(%go: i1, %reset: i1, %clk: i1) -> (%done: i1) {
      %fsmWrapper.go, %fsmWrapper.enable0, %fsmWrapper.enable1, %fsmWrapper.enabledGroup, %fsmWrapper.done = calyx.cell "fsmWrapper" @fsmWrapper : i1, i1, i1, i2, i1
      %z.go, %z.reset, %z.clk, %z.flag, %z.done = calyx.cell "z" @Z : i1, i1, i1, i1, i1
      calyx.wires  {
        calyx.assign %fsmWrapper.enable0 = %z.done : i1
        calyx.group @A  {
          %c0_i2 = hw.constant 0 : i2
          %1 = comb.icmp eq %fsmWrapper.enabledGroup, %c0_i2 : i2
          %A.go = calyx.group_go %1 : i1
          calyx.assign %z.go = %go, %A.go ? : i1
          calyx.group_done %z.done : i1
        }
        %0 = comb.and %z.flag, %z.done : i1
        calyx.assign %fsmWrapper.enable1 = %0 : i1
        calyx.group @B  {
          %c1_i2 = hw.constant 1 : i2
          %1 = comb.icmp eq %fsmWrapper.enabledGroup, %c1_i2 : i2
          %B.go = calyx.group_go %1 : i1
          calyx.group_done %z.done, %z.flag ? : i1
        }
        calyx.group @fsmGroup  {
          calyx.assign %fsmWrapper.go = %go : i1
          calyx.group_done %fsmWrapper.done : i1
        }
      }
      calyx.control  {
        calyx.enable @fsmGroup {compiledGroups = [@A, @B]}
      }
    }

Similar to the original CompileControl pass, the control program is replaced with a single enable, that enables a new synthetic group. This group just triggers the FSM's go signal, and waits for the done signal.

The component groups are also wired up similarly to the original CompileControl pass. The calyx.group_go ops are filled in with a flag that checks the currently enabled group output of the FSM. The signals used in the calyx.group_done ops are passed as the flags to transition the FSM to the next state.

There are two notable departures from the original implementation:

The calyx.group_go signals are only predicated on the state of the FSM, rather than the state of the FSM and that group's done signal(s) being low
The calyx.group_done signals are passed directly to the relevant inputs of the FSM, rather than being and-ed with the FSM currently enabling that group

I've been trying to simulate this in my head and I think that both of the above simplifications are valid, and the corresponding logic will be handled by the lowering of the FSM to hardware. Perhaps I'm missing something there, but this kind of seems like the main advantage to this approach: otherwise we have to generate all the same combinational logic as the original implementation and we have to go through the steps of generating the machine and wrapping it.
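
The two departures can be compared with a small Python model. This is a sketch under a stated assumption, namely that the lowered FSM only evaluates the guards of its current state; whether the real FSM-to-HW lowering behaves this way still needs to be checked. States are numbered 0 and 1 for the two groups, and all function names are hypothetical.

```python
def original_step(state, done):
    """Original scheme: each done signal is and-ed with 'the FSM is in this state'."""
    n = len(done)
    go = [state == i and not done[i] for i in range(n)]
    advance = any(state == i and done[i] for i in range(n))
    return (state + 1 if advance else state), go


def simplified_step(state, done):
    """Simplified scheme: raw done bits are FSM inputs, and only the current
    state's transition guard reads its own input."""
    n = len(done)
    go = [state == i for i in range(n)]
    advance = state < n and done[state]
    return (state + 1 if advance else state), go


def states(step, trace):
    """Run a step function over a trace of done tuples and record each state."""
    state, seen = 0, []
    for done in trace:
        state, _ = step(state, done)
        seen.append(state)
    return seen


trace = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0)]
print(states(original_step, trace) == states(simplified_step, trace))  # True
```

In this model the state sequences agree. The only visible difference is that the simplified `go` stays high during the cycle in which the group's `done` is asserted, whereas the original scheme drops it in that cycle.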

Here's the full IR after my experimental pass:

module  {
  fsm.machine @fsm(%arg0: i1, %arg1: i1) -> i2 attributes {stateType = i2} {
    %enabledGroup = fsm.variable "enabledGroup" {initValue = 0 : i2} : i2
    fsm.state "A" entry  {
      %c0_i2 = hw.constant 0 : i2
      fsm.update %enabledGroup, %c0_i2 : i2
    } exit  {
    } transitions  {
      fsm.transition @B guard  {
        fsm.return %arg0 : i1
      } action  {
      }
    }
    fsm.state "B" entry  {
      %c1_i2 = hw.constant 1 : i2
      fsm.update %enabledGroup, %c1_i2 : i2
    } exit  {
    } transitions  {
      fsm.transition @done guard  {
        fsm.return %arg1 : i1
      } action  {
      }
    }
    fsm.state "done" entry  {
      %c-2_i2 = hw.constant -2 : i2
      fsm.update %enabledGroup, %c-2_i2 : i2
    } exit  {
    } transitions  {
    }
    fsm.output %enabledGroup : i2
  }
  calyx.program  {
    calyx.component @Z(%go: i1, %reset: i1, %clk: i1) -> (%flag: i1, %done: i1) {
      calyx.wires  {
      }
      calyx.control  {
      }
    }
    calyx.component @fsmWrapper(%go: i1, %enable0: i1, %enable1: i1) -> (%enabledGroup: i2, %done: i1) {
      %fsm = fsm.instance "fsm" @fsm
      calyx.wires  {
        calyx.group @trigger  {
          %trigger.go = calyx.group_go %go : i1
          %0 = fsm.trigger %fsm(%enable0, %enable1) : (i1, i1) -> i2
          calyx.assign %enabledGroup = %0 : i2
          %c-2_i2 = hw.constant -2 : i2
          %1 = comb.icmp eq %0, %c-2_i2 : i2
          calyx.group_done %1 : i1
        }
      }
      calyx.control  {
        calyx.enable @trigger
      }
    }
    calyx.component @main(%go: i1, %reset: i1, %clk: i1) -> (%done: i1) {
      %fsmWrapper.go, %fsmWrapper.enable0, %fsmWrapper.enable1, %fsmWrapper.enabledGroup, %fsmWrapper.done = calyx.cell "fsmWrapper" @fsmWrapper : i1, i1, i1, i2, i1
      %z.go, %z.reset, %z.clk, %z.flag, %z.done = calyx.cell "z" @Z : i1, i1, i1, i1, i1
      calyx.wires  {
        calyx.assign %fsmWrapper.enable0 = %z.done : i1
        calyx.group @A  {
          %c0_i2 = hw.constant 0 : i2
          %1 = comb.icmp eq %fsmWrapper.enabledGroup, %c0_i2 : i2
          %A.go = calyx.group_go %1 : i1
          calyx.assign %z.go = %go, %A.go ? : i1
          calyx.group_done %z.done : i1
        }
        %0 = comb.and %z.flag, %z.done : i1
        calyx.assign %fsmWrapper.enable1 = %0 : i1
        calyx.group @B  {
          %c1_i2 = hw.constant 1 : i2
          %1 = comb.icmp eq %fsmWrapper.enabledGroup, %c1_i2 : i2
          %B.go = calyx.group_go %1 : i1
          calyx.group_done %z.done, %z.flag ? : i1
        }
        calyx.group @fsmGroup  {
          calyx.assign %fsmWrapper.go = %go : i1
          calyx.group_done %fsmWrapper.done : i1
        }
      }
      calyx.control  {
        calyx.enable @fsmGroup {compiledGroups = [@A, @B]}
      }
    }
  }
}

Now I'm looking into two things:

Running the rest of the Calyx lowering (i.e. RemoveGroups), and seeing how it handles the FSM. I gave it a quick shot and it segfaults, so I suppose some invariant has been broken and some variable ended up null when it shouldn't.
Running the FSM to HW lowering, and verifying the logic makes sense in the context of the Calyx dialect. If not, I may need to revisit some of the assumptions above.

Finally, I'll conclude with some of my own feedback, in no particular order:

It would be nice for debuggability if the FSM regions were optional in the IR, and weren't printed when empty.
There could be some helpers for building the FSM ops from C++ that would make it easier to fill in the entry, transition, etc. blocks. The region-containing SV ops have a callback-based approach, for example.
The need to wrap the FSM was actually the majority of the SLOC and the least obvious part of the implementation to understand. Perhaps there is a better way.
For this example (straight-line sequential control), I found the resulting IR to be simpler to reason about in terms of an FSM rather than a bunch of combinational logic. I hope the simplifications I mentioned above hold, or else the IR may actually end up being harder to reason about. On the other hand, for more complex control, I can see how the FSM-based approach could ultimately yield better IR either way.
When checking the Calyx implementation in Rust, I found this CompileControl pass has been replaced! I wonder if it is worth implementing the new version if we are going to re-implement it in terms of the FSM dialect.

It's been fun to try and connect the work of @hanchenye and @cgyurgyik, and I want to thank both of you for the efforts you've already put in. Hopefully this experiment provides some interesting feedback for both of you, and of course I welcome any suggestions.

hanchenye commented 3 years ago

It would be nice for debuggability if the FSM regions were optional in the IR, and weren't printed when empty.

There could be some helpers for building the FSM ops from C++ that would make it easier to fill in the entry, transition, etc. blocks. The region-containing SV ops have a callback-based approach, for example.

@mikeurbach Very valuable suggestions, thanks!

The need to wrap the FSM was actually the majority of the SLOC and the least obvious part of the implementation to understand. Perhaps there is a better way.

Basically, fsm.hw_instance and fsm.instance + fsm.trigger are trying to comply with the semantics of two different domains, HW and SW, respectively. In HW IRs (such as HW+SV+Comb and FIRRTL), although an mlir::Value is only defined once in the IR, it is actually "driven" by its predecessors continuously at runtime and can "hold" different values at different moments. However, in the world of SW IRs (such as Standard), we don't have such semantics -- SW IRs run sequentially.

In our case, we are trying to integrate fsm.machine into both HW and SW IRs. To tackle this problem, I introduced fsm.trigger for SW IR to mimic the semantics of always @([[EVENT]]) with the following IR:

scf.if %[[EVENT]] {
  %[[RESULT]] ... = fsm.trigger %foo(%[[INPUT]] ...)
}

And for now, we don't have an abstraction of "event", and the only implicit event of each fsm.transition is posedge clk. Therefore, if you look at the HW and SW integration tests, the FSMToHW pass converts fsm.machine to an hw.module containing one always @(posedge clk) block. In contrast, the FSMToStandard pass converts fsm.machine to a func representing the behavior of all combinational logic between two posedge clk events, and the fsm.trigger is converted to a call to the generated func.
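
The SW-side view described above can be pictured with a short Python analogy. This is a sketch, not the output of FSMToStandard: the `Instance` class and the `toggle` machine are hypothetical, and the only points being illustrated are that the instance owns the state, that the generated function covers one clock edge worth of logic, and that the trigger is a plain call guarded by an event condition.

```python
class Instance:
    """Stands in for fsm.instance: it owns the machine's state."""

    def __init__(self, transition_fn, initial_state):
        self.transition_fn = transition_fn
        self.state = initial_state

    def trigger(self, *inputs):
        """Stands in for fsm.trigger: one call is one posedge worth of logic."""
        self.state, outputs = self.transition_fn(self.state, inputs)
        return outputs


def toggle(state, inputs):
    """Hypothetical one-bit machine: flip the state when the input is set."""
    (flip,) = inputs
    next_state = state ^ 1 if flip else state
    return next_state, next_state


foo = Instance(toggle, 0)
results = []
for event, value in [(True, 1), (False, 1), (True, 1), (True, 0)]:
    if event:  # plays the role of scf.if %[[EVENT]]
        results.append(foo.trigger(value))
print(results)  # [1, 0, 0]
```

The second pair in the loop has no effect on the machine because its event is false, which is the sequential counterpart of a clock edge that never arrives.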

Back to your experiments, I'm not quite familiar with the semantics of Calyx, but if the only problem is that fsm.hw_instance consumes SSA values as inputs, I imagine we can introduce a FIRRTL-style instance op, like fsm.firrtl_instance (bad name), and generate all inputs and outputs as results. This should not conflict with the syntax of fsm.machine and could be handled in the lowering, such as FSMToHW. I'll do some experiments and try this out. Anyway, thanks for the experiments and all the feedback, they are super valuable for us to understand where we are!

mikeurbach commented 3 years ago

Thanks for the explanation. It sounds like I should be using fsm.hw_instance in my case, and I will update the experimental branch accordingly.

if the only problem is fsm.hw_instance consumes SSA values as inputs

This is really the only problem, and it's just tedious, not something fundamental. I personally prefer the fsm.hw_instance to having a hypothetical fsm.firrtl_instance, since I think it is easier to analyze the fsm.hw_instance. This point was really about the best way to bridge from the Calyx/FIRRTL style (all ports as results) to the FSM/HW style (input ports as operands). From my perspective, wrapping the fsm.hw_instance with a calyx.component is a good way to do it, and I'll see if I can add some helpers to make that less tedious.

hanchenye commented 3 years ago

I personally prefer the fsm.hw_instance to having a hypothetical fsm.firrtl_instance, since I think it is easier to analyze the fsm.hw_instance. This point was really about the best way to bridge from the Calyx/FIRRTL style (all ports as results) to the FSM/HW style (input ports as operands).

Makes sense to me!

cgyurgyik commented 3 years ago

I haven't given your work the proper attention it deserves, but this is really cool.

For this example (straight-line sequential control), I found the resulting IR to be simpler to reason about in terms of an FSM rather than a bunch of combinational logic.

Oh yeah, totally agreed here. I don't know enough about the FSM conversion to HW/SV (e.g. are these states just a set of non-blocking assignments? ), but if this is relatively simple to drop-in and use, it would be really cool to use.

For context, the native compiler takes the following Calyx program:

import "primitives/std.lib";

component main() -> () {
  cells {
    a = std_reg(2);
    b = std_reg(2);
  }
  wires {
    group A {
      a.in = 2'd0;
      a.write_en = 1'b1;
      A[done] = a.done;
    }
    group B {
      b.in = 2'd1;
      b.write_en = 1'b1;
      B[done] = b.done;
    }
  }
  control {
    seq { A; B; }
  }
}

Compile-control pass

```
// ./target/debug/futil example.futil -p compile-control
import "primitives/std.lib";
component main(@go go: 1, @clk clk: 1, @reset reset: 1) -> (@done done: 1) {
  cells {
    a = std_reg(2);
    b = std_reg(2);
    @generated fsm = std_reg(2);
  }
  wires {
    group A {
      a.in = 2'd0;
      a.write_en = 1'd1;
      A[done] = a.done;
    }
    group B {
      b.in = 2'd1;
      b.write_en = 1'd1;
      B[done] = b.done;
    }
    group seq {
      A[go] = fsm.out == 2'd0 & !A[done] ? 1'd1;
      fsm.in = fsm.out == 2'd0 & A[done] ? 2'd1;
      fsm.write_en = fsm.out == 2'd0 & A[done] ? 1'd1;
      B[go] = fsm.out == 2'd1 & !B[done] ? 1'd1;
      fsm.in = fsm.out == 2'd1 & B[done] ? 2'd2;
      fsm.write_en = fsm.out == 2'd1 & B[done] ? 1'd1;
      seq[done] = fsm.out == 2'd2 ? 1'd1;
    }
    fsm.in = fsm.out == 2'd2 ? 2'd0;
    fsm.write_en = fsm.out == 2'd2 ? 1'd1;
  }
  control {
    seq;
  }
}
```
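
The guarded assignments inside the `seq` group follow a regular per-group pattern: a `go` guard, a next-state write, and a write enable. The Python sketch below regenerates the body of that group from a list of group names, purely to make the pattern explicit. It is not the native compiler's implementation, and `compile_seq` is a hypothetical helper.

```python
def compile_seq(groups, fsm="fsm", width=2):
    """Return the guarded assignments (as strings) for a `seq` of group names."""

    def lit(n):
        # Format an unsigned literal such as 2'd1.
        return f"{width}'d{n}"

    lines = []
    for i, group in enumerate(groups):
        in_state = f"{fsm}.out == {lit(i)}"
        lines.append(f"{group}[go] = {in_state} & !{group}[done] ? 1'd1;")
        lines.append(f"{fsm}.in = {in_state} & {group}[done] ? {lit(i + 1)};")
        lines.append(f"{fsm}.write_en = {in_state} & {group}[done] ? 1'd1;")
    lines.append(f"seq[done] = {fsm}.out == {lit(len(groups))} ? 1'd1;")
    return lines


print("\n".join(compile_seq(["A", "B"])))
```

For `["A", "B"]` this produces the seven assignments of the `seq` group above; the two assignments that reset the register once it reaches `2'd2` live outside the group and are not generated here.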

To Verilog

```
// fud e example.futil --to verilog
// NOT INCLUDED: modules for Calyx primitives.
`default_nettype wire
module main (
    input logic go,
    input logic clk,
    input logic reset,
    output logic done
);
    import "DPI-C" function string futil_getenv (input string env_var);
    string DATA;
    initial begin
        DATA = futil_getenv("DATA");
        $fdisplay(2, "DATA (path to meminit files): %s", DATA);
    end
    logic [1:0] a_in;
    logic a_write_en;
    logic a_clk;
    logic a_reset;
    logic [1:0] a_out;
    logic a_done;
    logic [1:0] fsm_in;
    logic fsm_write_en;
    logic fsm_clk;
    logic fsm_reset;
    logic [1:0] fsm_out;
    logic fsm_done;
    logic [1:0] incr_left;
    logic [1:0] incr_right;
    logic [1:0] incr_out;
    initial begin
        a_in = 2'd0;
        a_write_en = 1'd0;
        a_clk = 1'd0;
        a_reset = 1'd0;
        fsm_in = 2'd0;
        fsm_write_en = 1'd0;
        fsm_clk = 1'd0;
        fsm_reset = 1'd0;
        incr_left = 2'd0;
        incr_right = 2'd0;
    end
    std_reg # (
        .WIDTH(2)
    ) a (
        .clk(a_clk),
        .done(a_done),
        .in(a_in),
        .out(a_out),
        .reset(a_reset),
        .write_en(a_write_en)
    );
    std_reg # (
        .WIDTH(2)
    ) fsm (
        .clk(fsm_clk),
        .done(fsm_done),
        .in(fsm_in),
        .out(fsm_out),
        .reset(fsm_reset),
        .write_en(fsm_write_en)
    );
    std_add # (
        .WIDTH(2)
    ) incr (
        .left(incr_left),
        .out(incr_out),
        .right(incr_right)
    );
    assign done = fsm_out == 2'd2 ? 1'd1 : 1'd0;
    assign a_clk = 1'b1 ? clk : 1'd0;
    assign a_in = fsm_out == 2'd0 & go ? 2'd0 : fsm_out == 2'd1 & go ? 2'd1 : 2'd0;
    assign a_write_en = fsm_out == 2'd0 & go | fsm_out == 2'd1 & go ? 1'd1 : 1'd0;
    assign fsm_clk = 1'b1 ? clk : 1'd0;
    assign fsm_in = fsm_out == 2'd2 ? 2'd0 : fsm_out != 2'd2 & go ? incr_out : 2'd0;
    assign fsm_reset = 1'b1 ? reset : 1'd0;
    assign fsm_write_en = fsm_out != 2'd2 & go | fsm_out == 2'd2 ? 1'd1 : 1'd0;
    assign incr_left = go ? 2'd1 : 2'd0;
    assign incr_right = go ? fsm_out : 2'd0;
endmodule
```

In general, the Calyx dialect still has a lot of work that needs to be completed, which includes finishing the actual dialect, the lowering passes, conversion to HW/SV, and eventually optimization passes. As discussed in #1523, a good first step may be to demonstrate Calyx optimizations through the use of the native compiler, in hopes of getting a few more folks interested :-).

mikeurbach commented 3 years ago

(e.g. are these states just a set of non-blocking assignments? )

That's my understanding from the tests in the FSM PR. I'll loop back with a more specific example for comparison.

rachitnigam commented 3 years ago

This looks awesome @mikeurbach! (Came across this PR by just trying to read everything tagged with "label: Calyx"). To answer some of the points that came up in this PR:

CompileControl has indeed been replaced with TopDownCompileControl. The good thing is that top-down control corresponds closely to the natural way of implementing an FSM--by looking at the entire control program and instantiating one FSM for everything except par groups (which need to be removed separately).
Top-down compile control uses the Schedule data structure to represent the FSM in-memory and realizes it by generating a group that enables each group when needed. You can generate a visual representation of the FSM using Schedule::display in the finish method of the visitor.
There is indeed no way to represent something like an fsm instance within Calyx, which really only understands components and ports. Following our thread in https://github.com/llvm/circt/pull/1636, even an fsm primitive wouldn't really quite fit the way Calyx wants to represent the FSM. In an ideal world, you'd be able to instantiate the fsm within a group and treat the group normally from the scheduling language. It seems like we're converging to a place where groups should potentially allow for arbitrary dialects to be encoded within them as long as they fit the group interface and can be eventually lowered into Calyx's assignment syntax.
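
As a rough picture of what an in-memory schedule for a `seq` might hold, here is a Python sketch. It is loosely inspired by the description above and is not the native compiler's `Schedule` type: the field names, the `from_seq` constructor, and the text format of `display` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Schedule:
    """Hypothetical in-memory FSM: groups enabled per state, plus guarded edges."""

    enables: dict = field(default_factory=dict)      # state -> [group names]
    transitions: list = field(default_factory=list)  # (src, dst, guard string)

    @classmethod
    def from_seq(cls, groups):
        """Build a straight-line schedule: one state per group, advance on done."""
        sched = cls()
        for i, group in enumerate(groups):
            sched.enables[i] = [group]
            sched.transitions.append((i, i + 1, f"{group}[done]"))
        return sched

    def display(self):
        """Return a plain-text listing of the states and transitions."""
        lines = []
        for state in sorted(self.enables):
            lines.append(f"{state}: enable {', '.join(self.enables[state])}")
        for src, dst, guard in self.transitions:
            lines.append(f"({src}, {dst}): {guard}")
        return "\n".join(lines)


print(Schedule.from_seq(["A", "B"]).display())
```

Keeping the enables and the transitions as separate collections is what lets a later step decide how to realize them, for example as guarded assignments in a generated group.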

mikeurbach commented 3 years ago

Thanks for taking a look @rachitnigam! I agree that whatever we converge on w.r.t. the primitive issue will influence any integration between the Calyx and FSM dialects.

mortbopet commented 2 years ago

Just wanted to add my thoughts on this; one intermediate step in this process could be to support embedding the FSM inside the control structure of the Calyx component. By doing so, we can then reference group symbols, and a state would then contain calyx.enable statements ("this group is active in this state"). As I see it, this will allow for a nice checkpoint for doing FSM optimizations - we know the output of each state (group enablements) but we don't really care about how their activation is explicitly materialized through the top-level I/O of the FSM. Something like:

calyx.program  {
  calyx.component @Z ...
  calyx.component @main(%go: i1, %reset: i1, %clk: i1) -> (%done: i1) {
    %z.go, %z.reset, %z.clk, %z.flag, %z.done = calyx.cell "z" @Z : i1, i1, i1, i1, i1
    calyx.wires  {
      calyx.group @A  { ... }
      calyx.group @B  { ... }
    }
    calyx.control  {
      fsm.machine @fsm() -> () {
        fsm.state "FSM_A" output  {
          calyx.enable @A
        } transitions  {
          fsm.transition @FSM_B guard  {
            %0 = calyx.group_done @A : i1 // some form of shorthand to reference the group done signal
            fsm.return %0 : i1
          }
        }
        fsm.state "FSM_B" output  {
          calyx.enable @B
        } transitions  {
          fsm.transition @FSM_done guard  {
            %0 = calyx.group_done @B : i1
            fsm.return %0 : i1
          }
        }
        fsm.state "FSM_done"
      }
    }
  }
}

It is then only later - after FSM optimizations have finished - that the I/O of the FSM materialize to generate the signals shown in the IR snippets by @mikeurbach.

rachitnigam commented 2 years ago

This makes a lot of sense! The compilation pass in the native compiler essentially generates a data structure that looks like the FSM dialect generated above. Is the idea that the FSM will eventually be lowered into a pure Calyx program as well?

mortbopet commented 2 years ago

Is the idea that the FSM will eventually lowered into a pure calyx program as well?

I'd only see this as relevant if we expect it to unlock some form of optimizations/transformations by lowering to (structural) calyx instead of the RTL dialects. I wouldn't expect so, since I imagine everything that optimizes structural Calyx can (should?) be performed at the RTL dialect level.

rachitnigam commented 2 years ago

I imagine everything that optimizes structural Calyx can (should?) be performed at the RTL dialect level

Possibly! I was imagining this would be useful if you wanted to go back to the native compiler after FSM lowering

mikeurbach commented 2 years ago

Nice, thanks for providing the IR snippet @mortbopet, I think going this route as part of progressive lowering, and materializing the FSM IOs late could be really nice.

Regarding the question of whether the FSM will lower into a pure Calyx program, I tend to agree with Morten: I think any low-level optimizations there could (and should) be done in the "core" dialects of CIRCT. This does preclude going back to the Rust compiler, but IMHO this FSM effort is more about doing things in a CIRCT-native way, using the best tools for the job that CIRCT provides. In other words, I personally see interop between the two compilers as a short term solution, and am considering this work part of the long term CIRCT-native solution.

If it is really desirable to continue interoperating at a lower level, one approach would be to have a HWToCalyx pass that converts HW/Comb/Seq back to Calyx (the inverse of CalyxToHW conversion), which could enable structural Calyx to be fed to the Rust compiler after CIRCT's low-level optimizations. Perhaps that is an interesting design point.

I also have another question or discussion point to raise about the above: is it possible to keep the FSM lowering in terms of the "core" dialects, without doing Calyx-specific lowerings? My approach(es) last summer were based on trying to keep FSM-to-HW lowering separable from Calyx-to-HW lowering, and I personally would love to see that continue. It'd be great if there was one generic FSMToHW pass. @mortbopet have you thought at all about the phase ordering of FSM lowering and Calyx lowering, and how your proposal will work during such a progressive lowering?

mortbopet commented 2 years ago

@mortbopet have you thought at all about the phase ordering of FSM lowering and Calyx lowering, and how your proposal will work during such a progressive lowering?

My thoughts on how this lowers to HW:

1. Lower the Calyx control program to an FSM nested within the component (as in the IR snippet).
2. Materialize the I/O of the FSM; this implies adding GroupGo/GroupDone I/O to the FSM.
3. Outline the FSM from the Calyx component and instantiate it as a cell instance. By doing so, the FSM will look like any other hardware instance from the point of view of the Calyx component (however, it is this cell which provides all of the control signals to the groups), and the FSM no longer has any references to Calyx groups.
4. From here, we should be able to proceed with separate CalyxToHW and FSMToHW lowerings, and link once both are at the hw.module level.
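
The I/O materialization step can be sketched in Python as a transformation from a symbolic FSM (states that name the group they enable) to one with explicit ports. This is an illustration under assumptions rather than a proposed pass: one done input and one go output per group, the port naming scheme, and the `materialize_io` helper itself are all hypothetical.

```python
def materialize_io(states):
    """states: ordered list of (state name, enabled group or None, next state or None).

    Returns (input ports, output ports, per-state table) where group references
    have been replaced by explicit go outputs and done inputs.
    """
    groups = [group for _, group, _ in states if group is not None]
    inputs = [f"{g}_done" for g in groups]
    outputs = [f"{g}_go" for g in groups]
    table = {}
    for name, group, nxt in states:
        table[name] = {
            # Exactly the enabled group's go bit is 1 in this state.
            "outputs": {f"{g}_go": int(g == group) for g in groups},
            # The transition guard becomes a read of the group's done input.
            "guard": f"{group}_done" if nxt else None,
            "next": nxt,
        }
    return inputs, outputs, table


fsm = [("FSM_A", "A", "FSM_B"), ("FSM_B", "B", "FSM_done"), ("FSM_done", None, None)]
inputs, outputs, table = materialize_io(fsm)
print(inputs, outputs)
print(table["FSM_A"])
```

After this step the table no longer mentions Calyx groups by symbol, only port names, which is the property the outlining step relies on.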

mikeurbach commented 2 years ago

Awesome, I think that makes sense! Step 3 sounds like a bit of what I was proposing last summer, but with a more progressive lowering nature.

rachitnigam commented 1 year ago

Hey folks, is there anything actionable to do for this issue? If not, we should consider closing it since it's pretty old at this point.

mortbopet commented 1 year ago

I think it can be closed - this work is mostly done, and the big remaining task is already captured in https://github.com/llvm/circt/issues/4606

Experimentation with FSM dialect during Calyx lowering #1522