Refactor common model code

yieldthought commented 1 month ago

Goal: Build up reusable modules and code in models.common.* and refactor existing models to use them.

This should reduce the total lines of code, make models dramatically simpler, and allow us to share optimized implementations between models.
Most importantly, it should make bringing up a performant new model dramatically quicker and easier.

Approach: Step-by-step replacement of common modules, then of supporting code such as tests.

- First between Mixtral and Grok, then Mistral, then Llama, then Falcon
- A PR every time a module is successfully replaced in a model and tests are clean
- When pinged for review, please inspect the PRs for your model with an eye for any problems I’ve overlooked in the refactoring (and also to see what the new code looks like)
- Library, not framework
  - Prefer helper functions that leave the flow of execution to be specified by their user
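
To illustrate the "library, not framework" idea, here is a minimal sketch in plain Python. Everything in it is hypothetical: `rms_norm` and `decoder_block` are illustrative names, not the actual models.common API, and plain lists of floats stand in for device tensors. The shared helper only computes RMSNorm, while the model code that calls it still decides what runs and in which order.

```python
import math
from typing import Callable, List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Hypothetical shared helper: scale x by 1/sqrt(mean(x^2) + eps), then by weight."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def decoder_block(
    hidden: Sequence[float],
    norm_weight: Sequence[float],
    attention: Callable[[List[float]], List[float]],
) -> List[float]:
    """Hypothetical model code: the model, not the library, owns the control flow."""
    normed = rms_norm(hidden, norm_weight)  # call into the shared helper
    attn_out = attention(normed)            # model-specific step, passed in by the caller
    return [h + a for h, a in zip(hidden, attn_out)]  # residual connection
```

A framework-style alternative would have the shared code own the decoder loop and call back into each model. The helper-style version keeps each model's flow readable in one place, which is the preference stated above.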

Expected challenges: different models doing different things around weight loading, cache naming schemes etc.

Initial targets: In order; later ones are expected to change as we learn from the earlier ones.

- [ ] RMSNorm
- [ ] MLP
- [ ] MoE
- [ ] MQA
- [ ] Decoder (maybe, just to see if replacing larger chunks makes sense or not)
  - [ ] Model, if so
- [ ] Supporting functions for tests

yieldthought commented 1 month ago

In progress: RMSNorm refactor.

Model	Starting size (lines of code)	Current size (lines of code)	Change (lines of code)
Mixtral	2987	2878	-109
Mistral	2289	2224	-65

davorchap commented 1 month ago

this is amazing!

tenstorrent / tt-metal