Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
BSD 3-Clause "New" or "Revised" License
Mixtral MoE improvements: transposed w2 to have reduction dim be innermost dim #128
Closed by yanboliang 3 months ago.
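The PR title describes a weight-layout change: storing the MoE down-projection `w2` so that the dimension being summed over (the reduction dim) is the innermost, contiguous one. The sketch below only illustrates that indexing idea. It assumes `w2` maps an intermediate activation of length K to an output of length N. It uses plain Python lists rather than the repository's actual tensors, and all helper names are hypothetical. It shows that the transposed `[N][K]` layout gives the same result as a `[K][N]` layout, while each output's dot product reads one row in order. It does not measure any memory or speed effect.

```python
from typing import List

Matrix = List[List[float]]


def matvec_reduce_outer(x: List[float], w_kn: Matrix) -> List[float]:
    # Hypothetical layout w_kn[k][n]: the reduction dim k is the OUTER index,
    # so computing one output n strides across rows of w_kn.
    n_out = len(w_kn[0])
    y = [0.0] * n_out
    for n in range(n_out):
        for k in range(len(x)):
            y[n] += x[k] * w_kn[k][n]
    return y


def matvec_reduce_inner(x: List[float], w_nk: Matrix) -> List[float]:
    # Hypothetical transposed layout w_nk[n][k]: the reduction dim k is the
    # INNER index, so each output is a dot product over one row read in order.
    return [sum(xk * wk for xk, wk in zip(x, row)) for row in w_nk]


def transpose(w: Matrix) -> Matrix:
    # Swap the two indices: [K][N] -> [N][K].
    return [list(col) for col in zip(*w)]


x = [1.0, 2.0, 3.0]                            # intermediate activation, K = 3
w_kn = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # K x N weight, N = 2
print(matvec_reduce_outer(x, w_kn))            # [22.0, 28.0]
print(matvec_reduce_inner(x, transpose(w_kn))) # [22.0, 28.0]
```

Both layouts compute the same product. In a real tensor library, the practical motivation for the inner layout would be contiguous reads along the summed dimension. That motivation is an assumption based on the PR title, not something this sketch demonstrates.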