Closed: ikawrakow closed this 1 month ago.
This is useful for parallel FFNs. `unary` can be `silu`, `gelu` or `relu`.
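The sketch below is a scalar reference for what such an operation computes. It assumes, based only on this description, that the op fuses `unary(x) * y`. That is the gated hidden activation of a parallel FFN, where `x` is the gate projection and `y` is the up projection. The names `fused_mul_unary`, `unfused` and `UNARY` are hypothetical. They are not the ggml API, and this is not the CPU/CUDA/Metal kernel code.

```python
import math

# Hypothetical scalar stand-ins for the `silu`, `gelu`, `relu` choices.
def silu(v: float) -> float:
    # x * sigmoid(x), written to avoid overflow in exp() for large |v|
    if v >= 0.0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)

def gelu(v: float) -> float:
    # tanh approximation of GELU (assumption: the real kernel may differ)
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def relu(v: float) -> float:
    return v if v > 0.0 else 0.0

UNARY = {"silu": silu, "gelu": gelu, "relu": relu}

def unfused(x, y, op):
    # Two passes: materialize unary(x) into a temporary, then multiply by y.
    tmp = [UNARY[op](a) for a in x]
    return [t * b for t, b in zip(tmp, y)]

def fused_mul_unary(x, y, op):
    # One pass: out[i] = unary(x[i]) * y[i], with no intermediate buffer.
    f = UNARY[op]
    return [f(a) * b for a, b in zip(x, y)]

if __name__ == "__main__":
    gate = [1.0, -2.0, 0.5]  # e.g. output of the gate projection
    up = [2.0, 3.0, -4.0]    # e.g. output of the up projection
    print(fused_mul_unary(gate, up, "relu"))  # [2.0, 0.0, -2.0]
```

Both versions produce the same values. The fused one skips writing and re-reading the intermediate `unary(x)` tensor, which is where any gain from this kind of fusion would come from.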
Implemented for CPU, CUDA and Metal.
The speedup is disappointingly small: 1-3% for PP (prompt processing), depending on platform and model.
Let me think some more about whether I want to merge it.