google / gemma.cpp

A lightweight, standalone C++ inference engine for Google's Gemma models.
Apache License 2.0
5.9k stars 499 forks

Use a MatMul implementation over MatVec for Prefill Computations #171

Closed austinvhuang closed 1 month ago

austinvhuang commented 4 months ago

Call for contributions for anyone interested in taking this on (@jan-wassenberg feel free to tag anyone who might be interested). The Prefill() computation is set up to allow batched computation (the batch is currently statically sized as kPrefillBatchSize).

Some pointers:

jan-wassenberg commented 4 months ago

Thanks! @pculliton @samkaufman FYI. We'll soon have a basic MatMul to test with.

jan-wassenberg commented 4 months ago

Related reading: https://siboehm.com/articles/22/Fast-MMM-on-CPU, which links to https://marek.ai/matrix-multiplication-on-cpu.html and https://github.com/flame/how-to-optimize-gemm/ (from the BLIS group).

jan-wassenberg commented 1 month ago

This is now done :D