Closed makslevental closed 1 month ago
Nice! I am wondering though, one of the issues with XRT HAL was the heavy dependence on C++ which was leaking into the HAL C structs that cant have C++ objects so we had pointers to them and ran into ownership issues of those pointers.
Interesting - do you remember which objects specifically? I have to admit that one thing that isn't clear to me (in HAL land) is what is allowed-to-be/should-be an "inlined struct" and what needs to be a pointer. Currently I believe I'm managing lifetimes correctly but I haven't actually rigorously checked (I keep failing to successfully get an ASAN build working for us).
Nice! I am wondering though, one of the issues with XRT HAL was the heavy dependence on C++ which was leaking into the HAL C structs that cant have C++ objects so we had pointers to them and ran into ownership issues of those pointers.
Interesting - do you remember which objects specifically? I have to admit that one thing that isn't clear to me (in HAL land) is what is allowed-to-be/should-be an "inlined struct" and what needs to be a pointer. Currently I believe I'm managing lifetimes correctly but I haven't actually rigorously checked (I keep failing to successfully get an ASAN build working for us).
I think all the smart pointer patterns you see in the XRT HAL are a symptom of that issue. We essentially used the .release() on smart pointers to take ownership of the objects as you couldnt just new/delete them in the XRT API, and an example of the ownership issues I am referring to are shown by the need of PRs like this https://github.com/nod-ai/iree-amd-aie/pull/641 although looks like that was later fixed (as I don't see the global device in the xrt_device.h in the code anymore) We also had trouble with the compute kernel being deleted because we were deleting the xclbin, but we technically had ownership of the kernel.
Let's try this again: this is a HAL that skips/avoids/elides XRT completely. It does so by (essentially) going straight to the XDNA driver's (kmq) shim which talks directly to the kernel module. Note that XDNA's shim actually couples tightly to XRT itself so the bulk of the actual "work" here is/was decoupling. Thus, we vendor the decoupled shim. Just to be explicit/clear: after this PR, on Linux, the only thing needed in the environment is the kernel module itself - not XRT, nor XDNA's shim.
Call outs:
.xclbin
as the artifact and now just load the.pdi
directly;kmq
) path; user-managed queues (umq
) aren't supported until aie4;TODOs before landing:
null
HAL;movehwq
intokernel
shout-out to @benvanik for putting together the
null
HAL lickety-split for me to study while working on this PR.