multiple gpu streams created when lowering gpu to gpux

See opt result of test/Integration/Dialect/XeGPU/vector_insert_1.mlir.

When main() calls test() and both functions access GPU shared memory, IMEX will generate code like:

func test(...) -> memref<...> {
   s1 = gpuStreamCreate(...)
   ret = gpuMemAlloc(s1, ...)
   gpuLaunchKernel(ret, ...)
   gpuWait()
   gpuStreamDestroy(s1)
   return ret
}

func main() {
   s2 = gpuStreamCreate(...)
   v = test(...)
   gpuMemFree(s2, v)
   gpuStreamDestroy(s2)
}

Note that this creates two streams, s1 and s2. The result of test() is allocated on s1, which is already destroyed by the end of test(), and main() then uses s2 to free that memory.

This happens to work in the L0 and SYCL runtimes, but I guess it is not actually legal there to allocate memory on one stream and free it on another. BTW, this will cause an error in the OpenCL runtime.
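
To make the lifetime problem concrete, here is a rough toy model in Python. It is only a sketch: `ToyRuntime`, `StreamError`, `lowered_test` and `lowered_main` are hypothetical names invented for illustration, not the IMEX runtime wrappers or any real driver API. It assumes a strict runtime that remembers which stream each allocation came from and rejects a free issued on a different stream.

```python
class StreamError(RuntimeError):
    """Raised when the toy runtime detects a stream misuse."""


class ToyRuntime:
    """Hypothetical strict runtime: every buffer remembers its stream."""

    def __init__(self):
        self._next_id = 0
        self.live_streams = set()
        self.owner = {}  # buffer id -> id of the stream that allocated it

    def _new_id(self):
        self._next_id += 1
        return self._next_id

    def stream_create(self):
        stream = self._new_id()
        self.live_streams.add(stream)
        return stream

    def stream_destroy(self, stream):
        self.live_streams.discard(stream)

    def mem_alloc(self, stream):
        if stream not in self.live_streams:
            raise StreamError("allocation on a destroyed stream")
        buf = self._new_id()
        self.owner[buf] = stream
        return buf

    def mem_free(self, stream, buf):
        owning = self.owner[buf]
        if owning != stream:
            raise StreamError(
                f"buffer allocated on stream {owning}, freed on stream {stream}"
            )
        del self.owner[buf]


def lowered_test(rt):
    # Mirrors test() above: allocate on s1, destroy s1, return the buffer.
    s1 = rt.stream_create()
    ret = rt.mem_alloc(s1)
    rt.stream_destroy(s1)
    return ret


def lowered_main(rt):
    # Mirrors main() above: free the returned buffer using a different stream.
    s2 = rt.stream_create()
    v = lowered_test(rt)
    try:
        rt.mem_free(s2, v)
        return "ok"
    except StreamError as err:
        return f"error: {err}"
    finally:
        rt.stream_destroy(s2)


if __name__ == "__main__":
    print(lowered_main(ToyRuntime()))
```

In this model the free in `lowered_main` is rejected, and the buffer stays recorded in `rt.owner` because it was never released. A lenient runtime that ignores the stream argument on free would accept the same sequence, which may be why the generated code appears to work on some backends.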

intel / mlir-extensions

multiple gpu streams created when lowering gpu to gpux #797