google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Adding Tiled Scalar Packing Kernels for QB4W #7521

Open mcr229 opened 2 days ago

mcr229 commented 2 days ago

We see a ~2.2x speedup on packing.

mcr229 commented 2 days ago

@fbarchard, please review the scalar packing kernels.