ikawrakow / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance
MIT License

Fused unary(x)*y #70

Closed by ikawrakow 1 month ago

ikawrakow commented 1 month ago

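This fuses a unary activation and the element-wise multiplication that follows it into a single op. It is useful for parallel FFNs, where the activated gate projection is multiplied by the up projection. unary can be silu, gelu or relu.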

Implemented for CPU, CUDA and Metal.

The speedup is disappointingly small: 1-3% for prompt processing (PP), depending on platform and model.

Let me think some more about whether I want to merge it.