google / xls

XLS: Accelerated HW Synthesis
http://google.github.io/xls/
Apache License 2.0
1.21k stars 178 forks source link

Channels should be encapsulated by procs in the IR #869

Open grebe opened 1 year ago

grebe commented 1 year ago

Right now, channels are global and procs can arbitrarily reference them. This freedom doesn't necessarily buy that much as codegen needs to plumb them up and down the hierarchy eventually anyways. It seems easier to understand what channels do if they were encapsulated by the proc that instantiates them (and would also allow procs to be invoked, and perhaps aid linking, etc.).

Also, it might make sense to make channel declarations nodes- right now channels are kind of their own weird thing in the IR and it's occasionally annoying that they aren't nodes.

meheff commented 1 year ago

I like the idea. About channel declarations being nodes, I don't think it makes sense for them to be actual xls::Nodes, rather they could be a proc-scoped construct similar to registers in blocks. Making them xls::Nodes implies a level of dynamism we don't support. For example, can you pass them around? put them in arrays? select among them?

Proc-scoped channels would also help reduce name collisions. Right now channels are global object so no two channels can be named the same. This is annoying and this constraint does not reflect any underlying constraint in the generated code. Theese channels often map to verilog module ports and there it's fine if two different modules have ports with the same name.

meheff commented 1 year ago

Spit balling a possible syntax:

proc foo<ch: chan(bits[8])>(tok: token, state: State) {
  ...
}

proc bar<ch0: chan(bits[32]), ch1: chan(bits[16])>(tok: token, state: State) {
  // Declare proc-scoped channels and spawning of other procs.  
  chan ch2(bits[8], id=0, kind=streaming, ops=send_receive, flow_control=none);
  spawn foo<ch3>;

  // Normal logic of the proc. ch0, ch1, and ch2 can all be used in send/receives.
  ...
}
meheff commented 1 year ago

For the purpose of referring to things, it might be good to give spawn statements names:

  baz: spawn foo<ch3>;

This gives you a string path for referring to a specific instantation (spawning) of a proc.

proppy commented 1 year ago

@tmichalak https://github.com/google/xls/commit/bf28838aa0fc0e00ac6165511ffbc5e5f4430195 fyi (for some reason it seems that I can't mention @antmicro/xls)

mczyz-antmicro commented 10 months ago

Hey, I have a few points to contribute to this conversation from a designer's point of view. I am currently engaged in an effort to design new modules in DSLX and have the following experiences:

  1. As previously reported in #959, the IR fails to discover all channel connection in top levels with empty next() function. This causes the Verilog generation to produce incomplete implementations and effectively limits designs that can be tested.
  2. It was at some point mentioned that if you select the inner proc as the entrypoint, the proc discovery in IR may perform correctly (workaround). I observed a case, where this does not help and I provide an example on branch, commit. File xls/examples/passthrough.x defines 2 processes: Passthrough and Wrapper. Information about spawned Passthrough is lost at IR level (presumably due to empty next).
  3. A workaround for this issue can be used at the cost of adding dummy channels on top-level. On the same branch, I also prepared a PassthroughDummy process, which adds dummy channels thorough the hierarchy and performs redundant empty send/recvs. This forces the IR tool to recognize the hierarchy and produce valid verilog.
  4. I attach files from IR/Verilog generation to this message.

After analysis of the proc scoped channels proposal, I believe that this approach would solve my problems and help a lot. I wonder, however, if this is the best solution. I can imagine that during analysis of the DSLX code, an algorithm could traverse through the hierarchy of processes (analyze proc,spawn relationships) and build a hierarchy graph, which could be inserted as metadata in the IR. It seems to me that it would be a cheaper solution than "proc scoped channels", unless there are some limitations with IR manipulation that I am not aware of.

I encountered also another problem mentioned in the "proc scoped channels" docs, which is inability to create unconnected ports (equivalent to .clk(clk), .unused_output()). The workaround I am currently using is to create a wrapper process and perform "empty" sends and receives so that tools can detect that each channel has an associated send/recv operation. Even though this approach works, it is inconvenient at scale.

What do you think?

gh_869_example_logs.tar.gz

meheff commented 10 months ago

@mczyz-antmicro You're correct that proc-scoped channels will solve the issues you describe. Regarding implementing this as metadata, that could be done but I don't think it is the most robust solution. Keeping metadata on the side which must be kept in sync with the IR is a potential source of complexity and bugs (where the metadata does not match the IR) Sometimes it may be necessary, but ideally you avoid it and only have to keep one representation at a time. You're right in that changing channels to being proc-scoped is a significant amount of work and the metadata approach would likely be cheaper, but I think it is worth the investment now to get the right representation. The work is already underway, you can now see that there is a bit on procs ("new_style") which indicates whether it is the new kind of proc which can have channels declared in it. I'll keep adding more and more functionaliity to these "new style" procs until they can do everything the existing procs can do, and then switch existing frontends and designs over and remove the old proc representation.

Regarding the use case of an unused port, I think the proc-scoped channel approach makes this much easier to support. With proc-scoped channels, there is an explicit declaration of the interface of the proc, for example:

proc foo<x: bits[32] in, y: bits[32] out>(t: token, st: ...) {
   ...
}

Here x and y are the interface of the proc. To create an unused port you would just not connect, say, x to any send/receive node. You'd probably also want to annotate x in some way to indicate that it doesn't need flow control (ready/valid) and maybe some other things. Without proc-scoped channels, the interface is implicitly defined on the proc via global channel declaration and a use of that channel in the proc. However, in your example you don't want a use of the channel in the proc so how do you represent it? With proc-scoped channels the proc interfaces, channel declarations, and proc instantiations explicitly define a statically-known (at compile time) network of connectivity throughout the design. This approach makes it easier to specify connectivity between different levels in the hierarch. For example, you may have top-level input port which gives some configuation value that you want to propagate to multiple leaf blocks in the design. Proc-scoped channels makes that easy. Doing the equivalent with global channels is trickier. The compiler has to reconstruct the desired connectivity. With proc-scoped channels the desired connectivity is clearly stated.

Channels are proc-scoped in the frontend (DSLX) and also in the verilog representation where you have a hierarchy of instantations and all connectivity (ie, channels) are encapsulated as reg/wire/logic declarations within a module, so I think it makes sense for the IR to have a similar representation.

mczyz-antmicro commented 10 months ago

Thanks for a great explanation @meheff , I am now convinced that it is worth to wait for the new implementation. Happy to hear that work on new style has already started!

I have one more idea I would like to share: correct me if I'm wrong, but I believe that currently to combine design with an SRAM macro, one needs to expose the RAM channels all the way to the top-level, right? Have you given it thought if proc based channels could improve this situation? It would be neat if we could have something like a proc decorator so that the tool is capable of generating verilog with an exposed interface,e.g:

#[sram_proc]
proc RAM_2048_32<>(){}
mczyz-antmicro commented 10 months ago

Unfortunately, I've found that the workaround I had proposed does not always work:

3. A workaround for this issue can be used at the cost of adding dummy channels on top-level. On the same branch, I also prepared a `PassthroughDummy` process, which adds dummy channels thorough the hierarchy and performs redundant empty send/recvs. This forces the IR tool to recognize the hierarchy and produce valid verilog.

I modified the proc_iota example to use the dummy channels https://github.com/antmicro/xls/commit/7397835f26c6dec0753ac31d37c550412f3a3282 and the IR optimization is still "over eager".

Attaching files from the verilog generation flow proc_iota_v.tar.gz

lpawelcz commented 10 months ago

Hi, I recently encountered the same problems as @mczyz-antmicro. My attempts to generate verilog for RleBlockDecoder (part of https://github.com/google/xls/pull/1213) as well as BlockDecoder (part of https://github.com/google/xls/pull/1214) were both unsuccessful. IRs generated from linked DSLX files contain top procs which look something like this:

top proc __rle_block_dec__RleBlockDecoder_0_next(__token: token, __state: (), init={()}) {                                                    
  literal.17: bits[1] = literal(value=1, id=17)                                                                                               
  tuple.18: () = tuple(id=18, pos=[(0,349,31)])                                                                                               
  after_all.19: token = after_all(__token, id=19)                                                                                             
  next (after_all.19, tuple.18)                                                                                                               
}   

Performing IR optimization and codegen with this default top proc results in empty verilog module, like so:

module RleBlockDecoder(
  input wire clk,
  input wire rst
);

endmodule
  1. It was at some point mentioned that if you select the inner proc as the entrypoint, the proc discovery in IR may perform correctly (workaround).

I decided to follow this approach and I've manually set top level proc for IR opt as one of the internal procs (for RleBlockDecoder example it was BatchPacker proc to be specific) This caused codegen to generate verilog logic only for the BatchPacker proc.

  1. A workaround for this issue can be used at the cost of adding dummy channels on top-level. On the same branch, I also prepared a PassthroughDummy process, which adds dummy channels thorough the hierarchy and performs redundant empty send/recvs.

I've also tested this workaround and it didn't work for me. It resulted in generation of verilog logic that only receives and sends data through those dummy channels.

I've come up with an idea for a possible workaround which involved:

This can be visualized by the following diagram: Workaround proc hierarchy drawio

I tested the idea on https://github.com/antmicro/xls/commit/5a72b99f8fc45771738ba9a30e083dd185e9054d and this caused codegen to generate verilog code for RleBlockDecoder which only receives data from input ports and sends it out on other ports which are the inputs to internal proc which is outside of the generated module. The result for this solution is very similar to the one we get from using dummy channels.

I've also looked at the existing examples in XLS and it seems like in the whole repository there is only one example which is a proc that spawns other procs and has rules for verilog generation. This is Delay32x2048_init3 which is a specification of parameterized Delay. Here is verilog generation rule: LINK. For IR optimization it uses first internal proc as top proc. That results in generating verilog only for that specific internal proc which happens to be DelayInternal which is not correct.

lpawelcz commented 9 months ago

Just for the record. As stated in https://github.com/google/xls/issues/1253#issuecomment-1880958308 there is in fact a workaround which allows to generate valid verilog. For complex procs which spawn multiple internal procs it is important to inline internal procs by passing additional argument to bazel rule for optimizing the IR:

"inline_procs": "true"

For procs which have empty next() function it is also required to explicitly set top argument to the last internal proc in the data flow, e.g.:

"top": "__xls_modules_zstd_dec_mux__BlockDecoder__DecoderMux_0_next"

Additionally, all internal procs must have fifo_depth explicitly set to 1, e.g.:

proc BlockDecoder {
...
    config (...) {
    ...
        let (demux_raw_s, demux_raw_r) = chan<BlockDataPacket, u32:1>;
    ...
    }
...
}