google/maxtext
A simple, performant and scalable Jax LLM!
Apache License 2.0
Enable quantization for MoE Gating #757
Closed: RissyRan closed 1 week ago
RissyRan commented 2 weeks ago
Description
Enable quantization for MoE gating. Kernel quantization will follow as the next step.
Quantizing einsum() caused a perf regression, so it is not included in this PR. The matmul implementation needs a deeper dive later.
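For readers unfamiliar with what quantizing the gating step means, here is a minimal JAX sketch of an int8 router matmul. It is not the code in this PR and not MaxText's quantization path: `quantize_int8` and `int8_gating_logits` are hypothetical helpers, and symmetric per-token and per-expert scaling is an assumption chosen for simplicity.

```python
import jax
import jax.numpy as jnp


def quantize_int8(x, axis):
    """Symmetric int8 quantization with one scale per slice along `axis`.

    Hypothetical helper for illustration only.
    """
    max_abs = jnp.max(jnp.abs(x), axis=axis, keepdims=True)
    scale = jnp.maximum(max_abs, 1e-8) / 127.0  # guard against all-zero slices
    q = jnp.clip(jnp.round(x / scale), -127, 127).astype(jnp.int8)
    return q, scale


def int8_gating_logits(tokens, gate_kernel):
    """Router logits from an int8 x int8 matmul, rescaled back to float32.

    tokens:      (num_tokens, embed_dim) float array
    gate_kernel: (embed_dim, num_experts) float array
    """
    q_tokens, s_tokens = quantize_int8(tokens, axis=-1)      # scales: (T, 1)
    q_kernel, s_kernel = quantize_int8(gate_kernel, axis=0)  # scales: (1, E)
    # Accumulate in int32 so the int8 products do not overflow.
    acc = jax.lax.dot(q_tokens, q_kernel, preferred_element_type=jnp.int32)
    return acc.astype(jnp.float32) * s_tokens * s_kernel


# Compare against the full-precision gating matmul on random inputs.
key_t, key_k = jax.random.split(jax.random.PRNGKey(0))
tokens = jax.random.normal(key_t, (4, 16))
gate_kernel = jax.random.normal(key_k, (16, 8))

logits_q = int8_gating_logits(tokens, gate_kernel)
logits_fp = tokens @ gate_kernel
max_err = jnp.max(jnp.abs(logits_q - logits_fp))
```

The contraction axis is shared by both operands, so each scale is constant along it and can be applied after the integer accumulation. Whether this is faster than the float path depends on the hardware and on how the matmul is lowered, which is the kind of question the einsum() regression above leaves open.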
Test
Adding quantization=int8 flag:
Megablox without kernel quantization (test, mixed precision in xprof in gating)