microsoft/torchscale
Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
fix BERT + moe #20
Closed
buaahsh closed this 1 year ago
buaahsh commented 1 year ago
- add `--use-moe` argument
- add `--pad-to-max-length` argument in the pretraining task
- set numpy version to 1.23.0
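As a rough sketch of the first two changes (not the actual torchscale/fairseq task code, whose wiring may differ), the new flags could be registered via `argparse` like this; the flag names come from the list above, while the helper name and help strings are illustrative assumptions:

```python
import argparse


def add_pretraining_args(parser: argparse.ArgumentParser) -> None:
    # Hypothetical registration of the two flags mentioned in the issue;
    # the real pretraining task may attach them differently.
    parser.add_argument(
        "--use-moe",
        action="store_true",
        help="enable Mixture-of-Experts layers in the model",
    )
    parser.add_argument(
        "--pad-to-max-length",
        action="store_true",
        help="pad every batch to the maximum sequence length",
    )


parser = argparse.ArgumentParser()
add_pretraining_args(parser)
args = parser.parse_args(["--use-moe", "--pad-to-max-length"])
print(args.use_moe, args.pad_to_max_length)
```

The third change, pinning numpy, would correspond to something like `pip install numpy==1.23.0` or a `numpy==1.23.0` line in the requirements file.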