nod-ai / iree-amd-aie

IREE plugin repository for the AMD AIE accelerator
Apache License 2.0
69 stars 30 forks source link

[AMDAIEAccessToAcquireRelease] Insert acquire/release pairs in same scope #904

Closed newling closed 3 days ago

newling commented 1 week ago

Before this PR, the logic for read access was invalid. Code like

for i in range(10):
    read_access
read_access

got transformed to

for i in range(10):
    acquire
    read_access
release
acquire
read_access
release

As you can see above, the code ignored the scoping of the accesses. This PR ensures that acquires and releases are inserted in the same scope.

jtuyls commented 1 week ago

TBH I still don't understand what these ops actually mean. I would like to update the write access case too to be scope-safe, but am hesitant to do so before I understand the underlying semantics. I have added a lit test illustrating how our vanilla matmul's writes get transformed, it is not obvious to me why this is the desired place to put acquires and accesses. @jtuyls I would really appreciate if you can write some rough documentation of what these ops (acquire-access-release) actually signify, and why this pass does what it does.

These ops are documented here: https://github.com/nod-ai/iree-amd-aie/blob/076ea13e599d5c4283b989d0e6587b698220ef53/compiler/plugins/target/AMD-AIE/iree-amd-aie/IR/AMDAIEOps.td#L1102. They relate to the AIE semaphore locks, which you can find a lot of information about. Could you ask specific questions or tell me what's not clear? I am not going to speculate on what you're looking for and write further documentation at this point.

newling commented 1 week ago

Could you ask specific questions or tell me what's not clear?

Why does this pass insert acquire/release ops where it does?

By the definition in https://github.com/nod-ai/iree-amd-aie/blob/076ea13e599d5c4283b989d0e6587b698220ef53/compiler/plugins/target/AMD-AIE/iree-amd-aie/IR/AMDAIEOps.td#L1102 it would seem fine to make the wrapping tight, i.e. insert the acquire just before the access, and the release just after. Or why not just specify that the access is atomic? But that's not what the pass seems to doing, and that also seem to work (numerical errors).

What is happening with None accesses, specifically what is the reason for the transformation that takes place in epilogue_write_with_preceding_none_accesses (new lit test)? Why is the logic different for Write and Read, but the definition in https://github.com/nod-ai/iree-amd-aie/blob/076ea13e599d5c4283b989d0e6587b698220ef53/compiler/plugins/target/AMD-AIE/iree-amd-aie/IR/AMDAIEOps.td#L1102 there is no difference.

Also, please can you review the change to the read case and let me know if the changes are valid.

newling commented 3 days ago

it would seem fine to make the wrapping tight, i.e. insert the acquire just before the access, and the release just after. Or why not just specify that the access is atomic? But that's not what the pass seems to doing, and that also seem to work (numerical errors).

These are sempahore acquire/release operations, which are atomic operations, not sure though what you mean with 'the access is atomic'? You have to acquire before the first use and release after the last use. Because once you release a buffer, DMAs can again acquire it and read/write access it.

Makes sense now.

What is happening with None accesses, specifically what is the reason for the transformation that takes place in epilogue_write_with_preceding_none_accesses (new lit test)?

This is the result of different core 'programs' being stitched together into a single core program (here it's the prolog, main and epilog, but it doesn't have to). In the prolog and main there is an access to an objFifo, which is can't be designated as read nor as write at that point. However, in the epilog it can, but the none accesses still need to be taken into account to insert the semaphore ops at the correct places.

Why is the logic different for Write and Read

Maybe this can be consolidated, I would need to have another in-depth look.

There is a problem with using the same algorithm for Write (it results in numerical errors, so definitely some problem). I want to get back to investigating this at a later point, and have left a TODO(newling) in the code for this. As such, I would like to leave the Write case untouched in this PR.

jtuyls commented 3 days ago

There is a problem with using the same algorithm for Write (it results in numerical errors, so definitely some problem). I want to get back to investigating this at a later point, and have left a TODO(newling) in the code for this. As such, I would like to leave the Write case untouched in this PR.

Yeah, this is what I remember as well, but I don't know exactly why that was the case. Maybe it's the 'none' handling.