Open prathams417 opened 6 months ago
I have been able to generate IR for OpGroupAsyncCopy in the match-and-rewrite pattern for insert_slice_async. Issues in the SPIR-V translator are blocking further progress on this task, and a ticket has been filed with the SPIR-V team.
Branch with latest changes: prathams/spriv-codegen-async-insertslice
The related Jira ticket has been added to the internal wiki under Internal Issues Tracking -> Compiler.
Currently the insert_slice_async op is not actually done asynchronously; rather, it is decomposed into:
// insert_slice_async %src, %dst, %idx, %mask, %other
// =>
// %tmp = load %src, %mask, %other
// %res = insert_slice %tmp into %dst[%idx]
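For reference, the semantics of the current decomposition can be modelled in a few lines of plain Python. This is only a sketch for discussion, not the compiler code: tensors are modelled as Python lists, the slice is assumed to be inserted along the leading dimension of %dst, and the helper names (`load`, `insert_slice`, `insert_slice_async_decomposed`) are hypothetical stand-ins for the ops above.

```python
def load(src, mask, other):
    """Masked load: take src[i] where mask[i] is true, otherwise other[i]."""
    return [s if m else o for s, m, o in zip(src, mask, other)]


def insert_slice(tmp, dst, idx):
    """Return a copy of dst with the slice at index idx replaced by tmp.

    Assumption: the slice is inserted along the leading dimension.
    """
    res = [row[:] for row in dst]
    res[idx] = list(tmp)
    return res


def insert_slice_async_decomposed(src, dst, idx, mask, other):
    """Synchronous stand-in for insert_slice_async: load, then insert."""
    tmp = load(src, mask, other)       # %tmp = load %src, %mask, %other
    return insert_slice(tmp, dst, idx)  # %res = insert_slice %tmp into %dst[%idx]


src = [1, 2, 3, 4]
mask = [True, True, False, True]
other = [0, 0, 0, 0]
dst = [[9, 9, 9, 9], [9, 9, 9, 9]]

res = insert_slice_async_decomposed(src, dst, 1, mask, other)
print(res)  # [[9, 9, 9, 9], [1, 2, 0, 4]]
```

Note that the load completes before the insert starts, so nothing overlaps with other work; that is the behaviour an asynchronous copy would change.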
NVIDIA has already replaced this with their PTX assembly instruction cp.async.cg.shared.global.
We should remove the decomposition in favour of the similar operations that exist in SPIR-V: OpGroupAsyncCopy and OpGroupWaitEvents.