google / heir

A compiler for homomorphic encryption
https://heir.dev/
Apache License 2.0
323 stars, 47 forks

Not possible to execute the XRT code in the bazel test environment #405

Open WoutLegiest opened 9 months ago

WoutLegiest commented 9 months ago

In a new end-to-end example, we try to evaluate tfhe-rs-bool code on an FPGA. To communicate with a Xilinx Alveo FPGA we use the XRT library. There is a problem during the initialisation of the FPGA device in the Rust code.

let dev = xrtDeviceOpen(0);
println!("xrt Dev open");

let file_path = CString::new(
    "/home/wout/heir/tests/tfhe_rs_bool/end_to_end_fpga/accel.xclbin"
).unwrap();

let xclbin = xrtXclbinAllocFilename(file_path.as_ptr());
xrtDeviceLoadXclbinHandle(dev, xclbin);

Note: All the functions that start with xrt are calls to the C-API of the XRT library.

When this test is run with bazel test ..., we get the following output:

# | XRT build version: 2.13.466
# | Build hash: f5505e402c2ca1ffe45eb6d3a9399b23a0dc8776
# | Build date: 2022-04-14 17:43:11
# | Git branch: 2022.1
# | PID: 47
# | UID: 1000
# | [Tue Jan 30 16:22:00 2024 GMT]
# | HOST: fatal runtime error: Rust cannot catch foreign exceptions

The problem already starts with the xrtDeviceOpen function, which does not execute. The XRT library is clearly started from the Rust code, but it is unclear what happens after that. The error messages are generated by the XRT library, but bazel and/or cargo cannot process them.

j2kun commented 9 months ago

@WoutLegiest can you post the bazel rule as you've currently set it up? Or perhaps push a draft PR with the changes so we can take a closer look?

I was reading some docs at https://xilinx.github.io/XRT/master/html/xrt_native_apis.html and see

XCL_DRIVER_DLLESPEC xrtDeviceHandle xrtDeviceOpenByBDF(const char *bdf)
bdf: PCIe BDF identifying the device to open

One thing we can try is passing in the PCIe BDF (e.g., "0000:03:00.1" from the docs).

A bit more digging, and I expect there is a way to communicate with a PCI-connected device via standard unix filesystem mounts.

Reading the Linux kernel docs, https://docs.kernel.org/PCI/sysfs-pci.html, I suspect the device might be accessible at one of

/sys/devices/xxxx
/sys/bus/pci/devices/xxxx

If this is mounted as a traditional UNIX file, you might be able to make it work by adding these /sys/ paths to bazel's --sandbox_writable_path flag (maybe even adding --sandbox_writable_path=/sys, but first I'd check whether you can locate the exact device's location on the file system).
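One way to locate the device's sysfs directory is to scan the PCI device entries for the Xilinx vendor ID. The following is a minimal sketch, not something taken from the HEIR repository: the helper name `find_pci_devices` is hypothetical, and it assumes the card reports the standard Xilinx PCI vendor ID `0x10ee` in its `vendor` file.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// PCI vendor ID assigned to Xilinx, as it appears in sysfs `vendor` files.
/// Assumption: the Alveo card reports this ID.
const XILINX_VENDOR_ID: &str = "0x10ee";

/// Hypothetical helper: return the device directories under `root` whose
/// `vendor` file matches `vendor_id`, sorted by path.
fn find_pci_devices(root: &Path, vendor_id: &str) -> io::Result<Vec<PathBuf>> {
    let mut matches = Vec::new();
    for entry in fs::read_dir(root)? {
        let dev_dir = entry?.path();
        // Each device directory exposes its vendor ID as text, e.g. "0x10ee\n".
        if let Ok(vendor) = fs::read_to_string(dev_dir.join("vendor")) {
            if vendor.trim().eq_ignore_ascii_case(vendor_id) {
                matches.push(dev_dir);
            }
        }
    }
    matches.sort();
    Ok(matches)
}

fn main() {
    let root = Path::new("/sys/bus/pci/devices");
    match find_pci_devices(root, XILINX_VENDOR_ID) {
        Ok(devices) if devices.is_empty() => {
            println!("no Xilinx PCI devices found under {}", root.display())
        }
        Ok(devices) => {
            for dev in devices {
                // Entries here are symlinks; also print the resolved path.
                let resolved = fs::canonicalize(&dev).unwrap_or_else(|_| dev.clone());
                println!("{} -> {}", dev.display(), resolved.display());
            }
        }
        Err(err) => eprintln!("cannot read {}: {}", root.display(), err),
    }
}
```

The printed paths are candidates for the sandbox path flag discussed above. Running the same binary inside and outside `bazel test` would also show whether the sandbox hides the device directory.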

j2kun commented 9 months ago

You might also try running findmnt to get the list of all mounted filesystems, and see if the FPGA is in there.
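If findmnt is not available inside the test environment, roughly the same information can be read from `/proc/self/mounts`. The sketch below is an illustration only: the `Mount` struct and `parse_mounts` function are hypothetical names, and it assumes the usual whitespace-separated `source target fstype options ...` line format.

```rust
use std::fs;

/// One line of the mount table: what is mounted, where, and as which type.
#[derive(Debug, PartialEq)]
struct Mount {
    source: String,
    target: String,
    fstype: String,
}

/// Hypothetical helper: parse mount-table text, keeping the first three
/// fields of every line and skipping lines with fewer than three fields.
fn parse_mounts(text: &str) -> Vec<Mount> {
    text.lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            Some(Mount {
                source: fields.next()?.to_string(),
                target: fields.next()?.to_string(),
                fstype: fields.next()?.to_string(),
            })
        })
        .collect()
}

fn main() {
    // An unreadable mount table is treated as empty in this sketch.
    let text = fs::read_to_string("/proc/self/mounts").unwrap_or_default();
    for m in parse_mounts(&text)
        .iter()
        .filter(|m| m.target.starts_with("/sys") || m.target.starts_with("/dev"))
    {
        println!("{:<12} {:<12} {}", m.fstype, m.source, m.target);
    }
}
```

Comparing the output from a plain `cargo run` with the output from inside `bazel test` would show which of the `/sys` and `/dev` mounts the sandbox exposes.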

j2kun commented 9 months ago

If that doesn't work, I wonder if we could find a minimal working example that I could set up with my machine, by plugging some other (cheap) FPGA into my desktop and tinkering with it. Would the XRT API work with every FPGA?

WoutLegiest commented 8 months ago

I already tried the xrtDeviceOpenByBDF function, with the same result. I also found the device location in the /sys/ folder and added it to the sandbox path, but that made no difference.

I found that it is possible to run the xbutil program from an MLIR file, so talking to the FPGA from within the bazel sandbox is possible and works correctly. Possibly the Rust -> C -> FPGA call chain introduces the problem.
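To narrow this down further, the same xbutil check could be run from the Rust test binary itself as a subprocess, which exercises the Rust side of the sandbox without going through the XRT C API. This is a minimal sketch under assumptions: `run_tool` and `ToolOutput` are hypothetical names, and the `examine` subcommand is assumed to be available in the installed XRT version.

```rust
use std::io;
use std::process::Command;

/// Captured result of running an external tool.
#[derive(Debug)]
struct ToolOutput {
    success: bool,
    stdout: String,
    stderr: String,
}

/// Hypothetical helper: run `program` with `args` and capture its output.
/// Returns Err if the program cannot be started (e.g. not on PATH).
fn run_tool(program: &str, args: &[&str]) -> io::Result<ToolOutput> {
    let out = Command::new(program).args(args).output()?;
    Ok(ToolOutput {
        success: out.status.success(),
        stdout: String::from_utf8_lossy(&out.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&out.stderr).into_owned(),
    })
}

fn main() {
    // Assumption: `xbutil examine` lists the devices visible to XRT.
    match run_tool("xbutil", &["examine"]) {
        Ok(out) if out.success => println!("xbutil succeeded:\n{}", out.stdout),
        Ok(out) => eprintln!("xbutil failed:\n{}", out.stderr),
        Err(err) => eprintln!("could not start xbutil: {}", err),
    }
}
```

If this succeeds from Rust inside `bazel test` while xrtDeviceOpen still aborts, that would point at the FFI boundary rather than at sandbox access to the device.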

WoutLegiest commented 8 months ago

If that doesn't work, I wonder if we could find a minimal working example that I could set up with my machine, by plugging some other (cheap) FPGA into my desktop and tinkering with it. Would the XRT API work with every FPGA?

The XRT library is designed for the AMD Alveo cards, which are all PCIe cards that can be plugged into any PC. More specifically, the U55C, U250, and U280 are cards with large FPGAs on them; sadly, none of them are cheap.