microsoft/torchscale
Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
fix BERT + moe #20
Closed
buaahsh closed this 1 year ago
buaahsh commented 1 year ago
- add `--use-moe` argument
- add `--pad-to-max-length` argument in the pretraining task
- set numpy version to 1.23.0
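As a rough sketch of the first two changes (not the actual torchscale/fairseq task code, whose wiring may differ), the new flags could be registered via `argparse` like this; the flag names come from the list above, while the helper name and help strings are illustrative assumptions:

```python
import argparse


def add_pretraining_args(parser: argparse.ArgumentParser) -> None:
    # Hypothetical registration of the two flags mentioned in the issue;
    # the real pretraining task may attach them differently.
    parser.add_argument(
        "--use-moe",
        action="store_true",
        help="enable Mixture-of-Experts layers in the model",
    )
    parser.add_argument(
        "--pad-to-max-length",
        action="store_true",
        help="pad every batch to the maximum sequence length",
    )


parser = argparse.ArgumentParser()
add_pretraining_args(parser)
args = parser.parse_args(["--use-moe", "--pad-to-max-length"])
print(args.use_moe, args.pad_to_max_length)
```

The third change, pinning numpy, would correspond to something like `pip install numpy==1.23.0` or a `numpy==1.23.0` line in the requirements file.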