Closed — coreylowman closed this issue 1 year ago
f16 matrix multiplication is now implemented in gemm v0.15, though integration into faer is unlikely for now. I would ideally like to wait until f16 gets proper SIMD support so I can figure out how to best integrate it into the library.
Is your feature request related to a problem? Please describe. Currently none of the matrix multiplication crates support the half-precision (fp16) datatype. There is the half crate (https://crates.io/crates/half) that provides a Rust type for it.
This means any crate that needs CPU matrix multiplication with the f16 datatype has to hand-roll its own matmul algorithm, which can be very slow.
Describe the solution you'd like It'd be great if faer included configurable support for this datatype.
Describe alternatives you've considered I'm currently using a super naive matmul implementation, which is extremely slow. I'm not sure what the other alternatives are.
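The original snippet was not preserved in this copy of the issue. For reference, a "super naive" matmul of the kind described is typically a triple loop over row-major slices; the sketch below uses f32 so it stays self-contained (the actual use case would substitute `half::f16` from the half crate, which is an external dependency):

```rust
// Naive triple-loop matrix multiplication: C = A * B.
// A is m x k, B is k x n, C is m x n, all stored row-major.
// Hypothetical sketch, not the issue author's actual code; f32
// stands in for half::f16 to avoid the external dependency.
fn naive_matmul(a: &[f32], b: &[f32], c: &mut [f32], m: usize, k: usize, n: usize) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

fn main() {
    // 2x2 example: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]]
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    let mut c = [0.0f32; 4];
    naive_matmul(&a, &b, &mut c, 2, 2, 2);
    println!("{:?}", c); // [19.0, 22.0, 43.0, 50.0]
}
```

With no cache blocking, SIMD, or multithreading, a loop like this runs orders of magnitude slower than a tuned BLAS-style kernel, which is exactly the gap this request is about.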
Additional context
I'm the author of dfdx. dfdx has both CUDA and CPU support, and CUDA does have hardware acceleration for f16. It'd just be nice if f16 matmul on CPU was a bit faster!