google / maxtext

A simple, performant and scalable Jax LLM!
Apache License 2.0

Clean up MoE brute force implementation #741

Closed. RissyRan closed this 3 weeks ago

RissyRan commented 3 weeks ago

Description:

Clean up the MoE brute-force implementation so that we have 2 strategies: 1) dropless MoE and 2) dropping MoE (with expert parallelism to be added in the future).
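For readers unfamiliar with the two strategies, here is a minimal pure-Python sketch of how they differ at the routing step. It is not the MaxText code: the function names (`route_top_k`, `assign_dropless`, `assign_dropping`), the `capacity` parameter, and the first-come-first-served overflow rule are all assumptions made for illustration.

```python
from collections import defaultdict


def route_top_k(router_scores, k):
    """For each token, return the indices of its k highest-scoring experts."""
    routes = []
    for scores in router_scores:
        ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
        routes.append(ranked[:k])
    return routes


def assign_dropless(routes):
    """Dropless (hypothetical helper): keep every (token, expert) pair."""
    per_expert = defaultdict(list)
    for token, experts in enumerate(routes):
        for expert in experts:
            per_expert[expert].append(token)
    return dict(per_expert)


def assign_dropping(routes, capacity):
    """Dropping (hypothetical helper): each expert accepts at most `capacity`
    tokens in token order; pairs that overflow are dropped."""
    per_expert = defaultdict(list)
    dropped = []
    for token, experts in enumerate(routes):
        for expert in experts:
            if len(per_expert[expert]) < capacity:
                per_expert[expert].append(token)
            else:
                dropped.append((token, expert))
    return dict(per_expert), dropped


# Illustrative router scores: 4 tokens, 2 experts, top-1 routing.
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
routes = route_top_k(scores, k=1)
dropless = assign_dropless(routes)
kept, dropped = assign_dropping(routes, capacity=2)
```

In this toy input three tokens prefer expert 0. The dropless assignment sends all three to it, while the dropping assignment with `capacity=2` keeps the first two and drops the third. That fixed per-expert buffer size is what makes the dropping variant easier to shard across devices, at the cost of some tokens skipping the expert layer.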

Test:

Test script 1: link
Test script 2: link