Closed dasistwo closed 11 months ago
Jaxite is not yet performant, and most of our efforts working on it have been to make it performant for TPU architectures. Even then, it does not have performance parity with CPU-parallel tfhe-rs, though we're working on improving it. As a result, you'll see very poor performance. And while improvements to the TPU side should come with improvements to GPU, we're not particularly focused on GPU optimizations at the moment.
Hi all! I was trying to compare the performance of tfhe-rs and Jaxite, expecting Jaxite to be much faster than tfhe-rs because it exploits the GPU, but I found that Jaxite was far slower than tfhe-rs. I want to know whether my configuration is wrong or Jaxite is not fully developed yet.
I tested with the Jaxite and tfhe-rs transpilers, using the hello_world example. I did not use bazel run when testing Jaxite, as bazel run cannot initialize CUDA. (It seems that the GPU / TPU test is not publicly available in bazel, as far as I checked here.) Instead, I ran it directly with Python. Jaxite takes about 10000 seconds per evaluation, and the run did not succeed after the first iteration, while tfhe-rs takes about 30 seconds.
Both programs were generated from the same netlist file, so they went through the same steps and differ only in the final stage, where different transpilers are used.
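For a comparison like this, it can help to separate the first call from later calls, since JAX compiles a jitted function on its first invocation and dispatches work asynchronously. The following is a minimal sketch of such a timing harness, not the testbench used here: `time_calls` is a hypothetical helper name, and the lambda workload is a stand-in for the actual Jaxite evaluation.

```python
import time
from statistics import mean


def time_calls(fn, n_iters=5):
    """Return (first_call_seconds, mean_seconds_of_later_calls).

    Hypothetical helper: `fn` is any zero-argument callable, for example a
    wrapper around one circuit evaluation.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    durations = []
    for _ in range(n_iters):
        start = time.perf_counter()
        result = fn()
        # JAX dispatches work asynchronously; if the result is a JAX array,
        # wait for it so the timer covers the actual computation.
        if hasattr(result, "block_until_ready"):
            result.block_until_ready()
        durations.append(time.perf_counter() - start)
    first = durations[0]
    # With a single iteration there are no later calls to average.
    rest = mean(durations[1:]) if len(durations) > 1 else float("nan")
    return first, rest


if __name__ == "__main__":
    # Stand-in workload (an assumption); replace with the real evaluation.
    first, rest = time_calls(lambda: sum(range(100_000)), n_iters=5)
    print(f"first call: {first:.6f} s, later calls (mean): {rest:.6f} s")
```

If the first call dominates, much of the measured time may be compilation rather than evaluation; if later calls are just as slow, the cost is in the computation itself.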
This is the code that I've used as a testbench for the Jaxite.
This is the modified BUILD file to create the py_library
I'm using Python 3.10.13, Nvidia V100 as GPU, and CUDA 11.8. Tell me if my testbench or configuration is wrong.
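As a sanity check on a configuration like the one above, one option is to ask JAX which backend and devices it actually selected. This is a minimal sketch that relies only on `jax.default_backend()` and `jax.devices()`; `describe_jax_backend` is a hypothetical helper name, and it returns `None` when JAX is not installed.

```python
def describe_jax_backend():
    """Report the backend and devices JAX would use, or None if JAX is missing."""
    try:
        import jax
    except ImportError:
        return None
    return {
        # Name of the default platform, e.g. "cpu", "gpu", or "tpu".
        "backend": jax.default_backend(),
        # Devices visible to the default backend.
        "devices": [str(d) for d in jax.devices()],
    }


if __name__ == "__main__":
    info = describe_jax_backend()
    if info is None:
        print("JAX is not installed in this environment.")
    else:
        print(f"backend: {info['backend']}")
        for device in info["devices"]:
            print(f"device: {device}")
```

If the reported backend is "cpu" on a machine with a V100, the CUDA-enabled jaxlib is probably not being picked up, and the evaluation would be running on the CPU.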