google / gemma.cpp

A lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0
5.9k stars 499 forks

Use a MatMul implementation over MatVec for Prefill Computations #171

Closed austinvhuang closed 1 month ago

austinvhuang commented 4 months ago

Call for contributions for anyone interested in taking this on (@jan-wassenberg feel free to tag anyone who might be interested). The Prefill() computation is set up to allow batched computation (the batch is currently statically sized as kPrefillBatchSize).

Some pointers:

jan-wassenberg commented 4 months ago

Thanks! @pculliton @samkaufman FYI. We'll soon have a basic MatMul to test with.

jan-wassenberg commented 4 months ago

Related reading: https://siboehm.com/articles/22/Fast-MMM-on-CPU, which links to https://marek.ai/matrix-multiplication-on-cpu.html and https://github.com/flame/how-to-optimize-gemm/ (from the BLIS group).

jan-wassenberg commented 1 month ago

This is now done :D