hpi-xnor / BNext

Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket

inference speed evaluation #7

Closed: prophett0527 closed this issue 5 months ago

prophett0527 commented 1 year ago

There is no inference speed evaluation in the paper. I'd like to ask on which platform this algorithm was evaluated: NVIDIA GPU, iPhone, or Android?

Also, I'd like to ask which inference platform we should use for BNN research. Please share some insight about inference. Thank you!

NicoNico6 commented 1 year ago

Thank you for your attention.

In this paper, we have only presented the theoretical reduction in computation and memory, as there is currently no efficient open-source inference engine available for BNNs. However, we recognize the demand for such an engine and are currently developing one tailored specifically to BNNs on both GPUs and CPUs. We will keep you updated on our progress.
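For a rough sense of where such theoretical numbers come from, here is a back-of-the-envelope sketch for a single binarized linear layer (the layer size is made up for illustration; this is not code from the paper):

```python
# Illustrative estimate of the theoretical savings from binarizing one
# linear layer (hypothetical sizes, not from the paper).
in_features, out_features = 1024, 1024

# Memory: 1-bit packed weights vs. 32-bit floats.
fp32_bytes = in_features * out_features * 4
bin_bytes = in_features * out_features / 8
print(f"memory reduction: {fp32_bytes / bin_bytes:.0f}x")  # 32x

# Compute: one 64-bit XNOR plus one popcount replaces 64 multiply-accumulate
# pairs, so 128 float ops collapse into roughly 2 bitwise instructions.
fp32_ops = in_features * out_features * 2
bin_ops = (in_features * out_features) // 64 * 2
print(f"theoretical op reduction: {fp32_ops / bin_ops:.0f}x")  # 64x (ideal)
```

Real engines fall short of the ideal ratio because of activation binarization, scaling factors, and memory traffic, which is exactly why a dedicated engine matters.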

Regarding your second question, it is worth noting that BNNs can significantly improve computation speed on CPUs, where compute resources are limited. On the other hand, optimizing computation speed on GPUs is more challenging, as the community has already heavily optimized higher-bit computation on the hardware side. However, based on my experience, we can still achieve up to 10-17x faster computation than PyTorch without any special optimization techniques, especially when dealing with larger matrices (e.g., 10,000x10,000).
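To make the mechanism behind those speedups concrete, here is a minimal NumPy sketch of the XNOR + popcount identity that binary GEMM kernels exploit; a real engine would operate on packed 64-bit words with hardware popcount instructions, and the `np.unpackbits` line below merely stands in for that. All names and sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 64, 256, 64

# Random {-1, +1} matrices, as produced by sign() binarization in BNNs.
A = rng.choice([-1, 1], size=(m, k)).astype(np.float32)
B = rng.choice([-1, 1], size=(k, n)).astype(np.float32)

# Pack rows of A and columns of B into bits (1 encodes +1, 0 encodes -1).
A_bits = np.packbits(A > 0, axis=1)    # (m, k/8) uint8
B_bits = np.packbits(B.T > 0, axis=1)  # (n, k/8) uint8

# For entries in {-1, +1}: dot(a, b) = (#agreements) - (#disagreements)
#                                    = k - 2 * hamming_distance(a_bits, b_bits).
xor = A_bits[:, None, :] ^ B_bits[None, :, :]                  # (m, n, k/8)
hamming = np.unpackbits(xor, axis=2).sum(axis=2, dtype=np.int64)
C_binary = k - 2 * hamming

# The bitwise result matches the ordinary float matmul exactly.
assert np.array_equal(C_binary, (A @ B).astype(np.int64))
print("XNOR-popcount GEMM matches float matmul")
```

Since a single 64-bit XOR and popcount cover 64 multiply-accumulates, the benefit grows with matrix size, which is consistent with the larger gains we observe on big matrices.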

Basically, the inference acceleration potential of BNNs largely depends on the hardware resources available.