Goal: Build up reusable modules and code in models.common.* and refactor existing models to use them.
This should reduce the total lines of code, make models dramatically simpler, and let us share optimized implementations between models.
Most importantly, it should make bringing up a new, performant model dramatically quicker and easier.
Approach: step-by-step replacement of common modules, then of supporting code (e.g. test code).
Start by sharing modules between Mixtral and Grok, then extend to Mistral, then Llama, then Falcon.
Open a PR each time a module has been successfully replaced in a model and its tests are clean.
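"Tests are clean" for a swapped module could mean that the shared implementation tracks the model's original implementation on the same inputs. Below is a minimal sketch of such a check, assuming a correlation-based (PCC) comparison; the function names and the 0.99 threshold are hypothetical, not an existing test utility.

```python
import math
from typing import Callable, Iterable, List, Sequence

def pcc(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        # Correlation is undefined for constant outputs; fall back to exact equality.
        return 1.0 if list(a) == list(b) else 0.0
    return cov / math.sqrt(var_a * var_b)

def replacement_matches(
    old_fn: Callable[[List[float]], List[float]],
    new_fn: Callable[[List[float]], List[float]],
    inputs: Iterable[List[float]],
    threshold: float = 0.99,
) -> bool:
    """True if the shared module tracks the model's original module on every input."""
    return all(pcc(old_fn(x), new_fn(x)) >= threshold for x in inputs)

# Hypothetical stand-ins: the model-specific code and its common replacement.
def old_scale(xs: List[float]) -> List[float]:
    return [2.0 * x for x in xs]

def new_scale(xs: List[float]) -> List[float]:
    return [x + x for x in xs]

print(replacement_matches(old_scale, new_scale, [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]))  # True
```

Comparing against the model's own previous output, rather than only against a reference model, keeps each replacement PR small and isolates regressions to the module that was swapped.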
Please inspect the PRs for your model when pinged for review, with an eye to any problems I've overlooked in the refactoring (and also to see what the new code looks like).
Library, not framework:
Prefer helper functions that leave the flow of execution to their caller.
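To illustrate the library-not-framework principle: shared helpers each do one thing and return a value, and every model writes its own block by calling them in its own order, so no base class needs hooks for pre-norm, post-norm or parallel variants. A minimal sketch on plain Python lists; all names are hypothetical and not taken from models.common.

```python
from typing import Callable, List

Vector = List[float]
Fn = Callable[[Vector], Vector]

# Shared helpers: none of them decides what runs next.
def add(x: Vector, y: Vector) -> Vector:
    """Elementwise sum, used for residual connections."""
    return [a + b for a, b in zip(x, y)]

def scale(x: Vector, s: float) -> Vector:
    return [a * s for a in x]

# Each model owns its control flow and composes the helpers itself.
def pre_norm_block(x: Vector, norm: Fn, sublayer: Fn) -> Vector:
    # x + f(norm(x))
    return add(x, sublayer(norm(x)))

def post_norm_block(x: Vector, norm: Fn, sublayer: Fn) -> Vector:
    # norm(x + f(x))
    return norm(add(x, sublayer(x)))

def parallel_block(x: Vector, norm: Fn, attn: Fn, mlp: Fn) -> Vector:
    # x + attn(norm(x)) + mlp(norm(x)): both branches read the same normed input
    h = norm(x)
    return add(x, add(attn(h), mlp(h)))

def half(v: Vector) -> Vector:
    return scale(v, 0.5)

def double(v: Vector) -> Vector:
    return scale(v, 2.0)

print(pre_norm_block([1.0, 2.0], half, double))          # [2.0, 4.0]
print(post_norm_block([1.0, 2.0], half, double))         # [1.5, 3.0]
print(parallel_block([1.0, 2.0], half, double, double))  # [3.0, 6.0]
```

A framework version would instead own `forward` and expose override points for each variant; keeping the flow in the model means a new layout only needs a new five-line function.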
Expected challenges: different models handle weight loading, cache naming schemes, etc. in different ways.
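One way to absorb differences in weight loading and cache naming without branching inside the shared module is to have the module know only its logical weight names and take the mapping from the caller. A minimal sketch under that assumption; the class, the logical names w1/w2/w3 and both checkpoint layouts are hypothetical.

```python
from typing import Callable, Dict, List

StateDict = Dict[str, List[float]]

class SharedMLPWeights:
    """Hypothetical shared weight loader for a gated MLP.

    The module only knows its logical weight names; the calling model supplies
    how those map to checkpoint keys and what cached copies should be called.
    """

    LOGICAL_NAMES = ("w1", "w2", "w3")

    def __init__(
        self,
        state_dict: StateDict,
        checkpoint_key: Callable[[str], str],
        cache_name: Callable[[str], str],
    ) -> None:
        self.weights: Dict[str, List[float]] = {}
        self.cache_names: Dict[str, str] = {}
        for name in self.LOGICAL_NAMES:
            key = checkpoint_key(name)
            if key not in state_dict:
                raise KeyError(f"checkpoint has no {key!r} (logical weight {name!r})")
            self.weights[name] = state_dict[key]
            self.cache_names[name] = cache_name(name)

# Two hypothetical models with different checkpoint layouts and cache conventions.
ckpt_a = {f"layers.0.mlp.{n}.weight": [1.0] for n in ("w1", "w2", "w3")}
mlp_a = SharedMLPWeights(
    ckpt_a,
    checkpoint_key=lambda n: f"layers.0.mlp.{n}.weight",
    cache_name=lambda n: f"model_a_layer0_{n}",
)

ckpt_b = {f"blocks.0.ffn.{n}": [2.0] for n in ("w1", "w2", "w3")}
mlp_b = SharedMLPWeights(
    ckpt_b,
    checkpoint_key=lambda n: f"blocks.0.ffn.{n}",
    cache_name=lambda n: f"b.0.{n}.cache",
)
print(mlp_a.cache_names["w1"], mlp_b.weights["w2"])  # model_a_layer0_w1 [2.0]
```

Passing two small functions keeps each model's existing naming scheme intact, so existing weight caches would not need to be regenerated just because the module moved into models.common.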
Initial targets:
In order; later ones are expected to change as we learn from the earlier ones.
[ ] RMSNorm
[ ] MLP
[ ] MoE
[ ] MQA
[ ] Decoder (maybe; mainly to see whether replacing larger chunks makes sense)
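The first two targets are small enough to pin down as a pure-Python reference that tests could compare optimized versions against. A minimal sketch, assuming RMSNorm is y = x / sqrt(mean(x^2) + eps) * weight and that the shared MLP is a SwiGLU-style gated MLP, w2(silu(w1 x) * w3 x); models with an ungated MLP would need a different helper, and the w1/w2/w3 naming is an assumption.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output element

def rms_norm(x: Vector, weight: Vector, eps: float = 1e-5) -> Vector:
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def silu(v: float) -> float:
    # v * sigmoid(v), written to avoid overflow in exp for large |v|
    if v >= 0.0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)

def gated_mlp(x: Vector, w1: Matrix, w2: Matrix, w3: Matrix) -> Vector:
    """SwiGLU-style MLP: w2 @ (silu(w1 @ x) * (w3 @ x))."""
    gate = [silu(v) for v in matvec(w1, x)]
    up = matvec(w3, x)
    return matvec(w2, [g * u for g, u in zip(gate, up)])

y = rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0)
print(sum(v * v for v in y) / len(y))  # ~1.0: unit RMS before the learned scale
```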