stream.async.transfer is a copy operation and behaves as a spicy stream.async.clone (it can move across affinities, change memory type, etc). To allow runtime-determined copy elision we need an op that is just as spicy but is allowed to be a no-op at runtime when the copy is not needed. ElideAsyncCopiesPass can be extended to turn transfers into moves wherever last-use analysis determines it's safe to do so, which is likely to be the case in many programs. Moves can lower into a stream.cmd.move op with timepoints that then ends up as a hal.device.queue.move (or transfer? meh). At runtime the queue move can reflect to determine whether a pointer cast is allowed or whether a queue.alloca(dst) + queue.copy(src, dst) + queue.dealloca(src) is required.
[ ] add stream.async.move
[ ] make ElideAsyncCopiesPass turn transfers into moves
[ ] add stream.cmd.move and lowering
[ ] add hal.device.queue.move and matching iree_hal_device_queue_move
[ ] implement all as full expansion of alloca+copy+dealloca to start
Once plumbed through, it'll be a runtime issue to allow iree_hal_device_t/iree_hal_allocator_t to indicate compatibility and whether to elide the clone. That may need some additional interfaces that don't currently exist.
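As a rough illustration of the runtime decision described above, here is a minimal C sketch. Everything in it is hypothetical: `sketch_buffer_t`, `sketch_can_alias`, and `sketch_queue_move` are stand-ins invented for this note and are not IREE HAL types or functions. Plain `malloc`/`memcpy`/`free` stand in for queue.alloca/queue.copy/queue.dealloca, timepoints are not modeled, and the compatibility check (same device and same memory type) is only an assumed placeholder for whatever the device/allocator interfaces end up reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a buffer; NOT a real IREE HAL type.
typedef struct {
  int device_id;        // which device (affinity) owns the storage
  int memory_type;      // opaque memory type tag
  size_t size;          // storage size in bytes
  unsigned char* data;  // backing storage (host memory in this sketch)
} sketch_buffer_t;

// Hypothetical compatibility query. Assumption: storage can be reused
// as-is only when both the device and the memory type already match.
static bool sketch_can_alias(const sketch_buffer_t* src, int dst_device_id,
                             int dst_memory_type) {
  return src->device_id == dst_device_id &&
         src->memory_type == dst_memory_type;
}

// Moves |src| to the requested placement and writes the result to |out_dst|.
// |src| is consumed on both paths. Returns true if the copy was elided.
static bool sketch_queue_move(sketch_buffer_t* src, int dst_device_id,
                              int dst_memory_type, sketch_buffer_t* out_dst) {
  if (sketch_can_alias(src, dst_device_id, dst_memory_type)) {
    // "Pointer cast" path: hand the existing storage to the destination.
    *out_dst = *src;
    src->data = NULL;
    src->size = 0;
    return true;
  }
  // Full expansion path: alloca(dst) + copy(src, dst) + dealloca(src).
  out_dst->device_id = dst_device_id;
  out_dst->memory_type = dst_memory_type;
  out_dst->size = src->size;
  out_dst->data = (unsigned char*)malloc(src->size);  // alloca(dst)
  assert(out_dst->data != NULL || src->size == 0);
  if (src->size > 0) {
    memcpy(out_dst->data, src->data, src->size);      // copy(src, dst)
  }
  free(src->data);                                    // dealloca(src)
  src->data = NULL;
  src->size = 0;
  return false;
}
```

In both paths the caller ends up with a valid destination and a consumed source, which is why starting with the full expansion and adding the elided path later should not change program behavior, only cost.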