iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Add `stream.async.move` as an equivalent to `stream.async.transfer` but with move semantics. #17693

Open benvanik opened 1 week ago

benvanik commented 1 week ago

`stream.async.transfer` is a copy operation and behaves as a spicy `stream.async.clone` (it can move across affinities, change memory type, etc.). To allow runtime-determined copy elision we need an op that is just as spicy but is allowed to be a no-op at runtime when the copy is not needed. `ElideAsyncCopiesPass` can be extended to turn transfers into moves wherever last-use analysis determines it is safe to do so, which is likely to be the case in many programs. Moves can lower into a `stream.cmd.move` op with timepoints that then ends up as a `hal.device.queue.move` (or transfer? meh). At runtime the queue move can reflect to determine whether a pointer cast is allowed or whether a `queue.alloca(dst)` + `queue.copy(src, dst)` + `queue.dealloca(src)` sequence is required.
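To make the transfer-to-move rewrite concrete, here is a rough sketch in Python over a toy straight-line IR. It is not the actual `ElideAsyncCopiesPass` logic: the `Op` class and `transfers_to_moves` helper are hypothetical, op names are treated as plain strings, and a real analysis would also have to account for control flow, aliasing, and values the function does not own (such as arguments).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    """Toy stand-in for an IR op (hypothetical, not an MLIR API)."""
    name: str
    result: Optional[str]
    operands: List[str]


def transfers_to_moves(ops: List[Op]) -> List[Op]:
    """Rewrite a transfer into a move when its source has no later use.

    Assumes a single straight-line block; the source operand is operand 0.
    """
    rewritten = []
    for index, op in enumerate(ops):
        if op.name == "stream.async.transfer":
            source = op.operands[0]
            # Last-use check: no op after this one reads the source value.
            used_later = any(source in later.operands for later in ops[index + 1:])
            if not used_later:
                op = Op("stream.async.move", op.result, list(op.operands))
        rewritten.append(op)
    return rewritten


program = [
    Op("stream.async.splat", "%a", []),
    Op("stream.async.transfer", "%b", ["%a"]),       # %a is read again below
    Op("stream.async.dispatch", "%c", ["%a", "%b"]),
    Op("stream.async.transfer", "%d", ["%c"]),       # last use of %c
    Op("util.return", None, ["%d"]),
]

result = transfers_to_moves(program)
transfer_like = [op.name for op in result if op.result in ("%b", "%d")]
```

In this toy program the first transfer has to stay a copy because `%a` is still read by the dispatch, while the second can become a move because `%c` is dead afterwards.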

Once that is plumbed through, it becomes a runtime issue of letting `iree_hal_device_t`/`iree_hal_allocator_t` indicate compatibility and whether to elide the clone. That may need some additional interfaces that don't currently exist.
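The runtime side could look roughly like the following toy model, again in Python rather than the real HAL C API. Everything here is an assumption for illustration: `Buffer`, `can_alias`, and `queue_move` are hypothetical names, and the compatibility rule (same device and same memory type) is a placeholder for whatever the device/allocator interfaces would eventually report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Buffer:
    """Toy buffer: where it lives plus its contents (hypothetical)."""
    device: str
    memory_type: str
    data: bytearray


def can_alias(src: Buffer, dst_device: str, dst_memory_type: str) -> bool:
    # Placeholder compatibility query; a real one would ask the
    # device/allocator instead of comparing strings.
    return src.device == dst_device and src.memory_type == dst_memory_type


def queue_move(src: Buffer, dst_device: str, dst_memory_type: str,
               log: List[str]) -> Buffer:
    """Move src to the requested placement, eliding the copy when possible."""
    if can_alias(src, dst_device, dst_memory_type):
        log.append("alias")        # the "pointer cast" case: no copy needed
        return src
    log.append("alloca")           # queue.alloca(dst)
    dst = Buffer(dst_device, dst_memory_type, bytearray(len(src.data)))
    log.append("copy")             # queue.copy(src, dst)
    dst.data[:] = src.data
    log.append("dealloca")         # queue.dealloca(src): src is dead after a move
    src.data = bytearray()
    return dst


same_log: List[str] = []
local = Buffer("gpu0", "device_local", bytearray(b"abcd"))
moved_same = queue_move(local, "gpu0", "device_local", same_log)

cross_log: List[str] = []
remote_src = Buffer("gpu0", "device_local", bytearray(b"abcd"))
moved_cross = queue_move(remote_src, "gpu1", "device_local", cross_log)
```

The point of the sketch is only that the caller sees one operation with move semantics, and the decision between aliasing and alloca + copy + dealloca is made at runtime.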