[StandardToHandshake] Passing `memref` to a function will still use `memref.alloc`

Dinistro commented 2 years ago

Passing the following code to --lower-std-to-handshake produces a weird output.

func.func @external_mem(%mem : memref<4xi32>) {
  return
}

func.func @normal_mem() {
  %mem = memref.alloc() : memref<4xi32>
  func.call @external_mem(%mem) : (memref<4xi32>) -> ()
  return
}

Output:

handshake.func @external_mem(%arg0: memref<4xi32>, %arg1: none, ...) -> none attributes {argNames = ["in0", "inCtrl"], resNames = ["outCtrl"]} {
  extmemory[ld = 0, st = 0] (%arg0 : memref<4xi32>) () {id = 0 : i32} : () -> ()
  return %arg1 : none
 }
handshake.func @normal_mem(%arg0: none, ...) -> none attributes {argNames = ["inCtrl"], resNames = ["outCtrl"]} {
  %0:2 = fork [2] %arg0 : none
  %1 = memref.alloc() : memref<4xi32>
  %2 = instance @external_mem(%1, %0#0) : (memref<4xi32>, none) -> none
  sink %2 : none
  return %0#1 : none
}

As you can see, a memref.alloc() remains, which is not expected and thus breaks further transformations.

I would expect that a correct lowering would construct a MemoryOp in the @normal_mem function. How this exactly interacts with the external memory op is not clear to me, as this will cause cross function dependencies depending on the amount of memory operations in @external_mem.

mortbopet commented 2 years ago

Thank you for filing the issue - this is one point which has been completely untouched so far, and so any research into how to best do this would be interesting. Both this, as well as passing a memref block argument to other function calls.

As you mention, a correct lowering here would indeed be that a handshake.memory op is constructed in the @normal_mem function. But then, how does that work with our current schema where we allow passing memrefs to arguments of handshake.funcs? Becuase there's obviously a mismatch there.

I've never been a fan of handshake.memory and handshake.extmemory having essentially the same interface and almost doing the same - handshake.memory both allocates and creates an interface to a handshaked memory. handshake.extmemory only creates an interface - so we have some redundancy in the IR which creates this funky situation.

Thinking about this, i'm wondering whether memories in the handshake dialect should be modifed such that:

handshake.memory is removed, and we allow memref.alloc operations in handshake IR
handshake.extmemory is modified to a handshake.meminterface, but is kept identical to its current implementation.

By doing this we no longer have this mix of handshake memory ports only sometimes being materialized.

mortbopet commented 2 years ago

I'll try taking a look at the memref.alloc/handshake.meminterface to see if it makes sense.

Dinistro commented 2 years ago

Did you gain any insights so far?

One problem I'm seeing with the approach you are mentioning is when lowering these operations. While allowing memref types on the handshake level resolves certain things, a meminterface operation will influence the interface of the circuit. The circuit interface will depend on the number of loads and stores interacting with the memory. This can be dealt with when the memref accepting function and its caller(s) are in the same compilation unit. In the case of different compilation units, with external functions that accept memory, this will break down as one doesn't know the exact interface from the external handshake::FuncOp definition.

mortbopet commented 2 years ago

Correct, the approach requires that you have full awareness of the entire circuit and can analyze your way to statically determine the # of unique load/store accesses to any given memory. I started on this analysis approach, but work is currently taking priority.

This is not an easy problem to solve. I sense that the core of the problem lies in the fact that memory complexity is essentially fully outsourced to callers of the circuit. This goes not just for generating a unique load/store interface for every access in the program, but also the ability to handle runtime memory conflicts. This requires that a load-store queue is attached to the interface of the lowered handshake function. Something which i am 99% confident no one has yet to do nor make any progress on.

Ideas are welcome!

Dinistro commented 2 years ago

I started on this analysis approach, but work is currently taking priority.

No worries. It seems that I'm to only "active" user of these parts of the handshake dialect, thus I should be the one investing time into that. I'll therefore invest some time into this issue, but it might be that we declare this part as out-of-scope for my thesis. Most of the use-cases that would require memory would either way require a load-store queue with conflict resolution to be even remotely efficient.

I plan to schedule a meeting with Lana Josipović, and this might be something I would like to discuss with her.

mortbopet commented 2 years ago

load-store queue with conflict resolution to be even remotely efficient.

efficiency, at the cost of possibly huge resource usage - LSQs are not cheap...!

mikeurbach commented 2 years ago

Are LSQs required for correctness, or is it a tradeoff for throughput? I was under the impression the way we thread control signals through should preserve the original program ordering (at the cost of throughput). But I've only tested pretty simple circuits, so maybe one could think up a counter example.

mortbopet commented 2 years ago

I was under the impression the way we thread control signals through should preserve the original program ordering (at the cost of throughput).

the latter - and you are correct, we do sequentially thread control signals through a join to ensure correctness: https://github.com/llvm/circt/blob/main/lib/Conversion/StandardToHandshake/StandardToHandshake.cpp#L1553-L1562

mikeurbach commented 2 years ago

Thanks for the clarification.

This is not an easy problem to solve. I sense that the core of the problem lies in the fact that memory complexity is essentially fully outsourced to callers of the circuit.

Fully agree with this part. @Dinistro thanks for offering to push on this, but if you end up declaring this out of scope, that sounds quite reasonable. We've elsewhere talked about other mechanisms for interfacing to Handshake circuits, and have been having some offline conversations about ESI services as a potential way to interface to "things", and I personally hope Handshake can be one of those things.

Dinistro commented 2 years ago

Thanks for pointing this out to me @mikeurbach. I already saw some parts of this, but didn't go into too much detail. The system that we are currently targeting has an XDMA interface exposed with AXI4 streams, using ESI services to interface with that might be a reasonable approach. I'll have to talk to some people that are more familiar with the system and the interfaces, though.

llvm / circt

[StandardToHandshake] Passing `memref` to a function will still use `memref.alloc` #3268