jasperzhong opened this issue 4 years ago
No paper has been released yet; waiting for a follow-up.
The blog post already gives quite a few details, though: https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/
reddit: https://www.reddit.com/r/MachineLearning/comments/f1tuv0/r_turingnlg_a_17billionparameter_language_model/
They used ZeRO #28.
DeepSpeed looks like just a PyTorch wrapper. The intro says it makes distributed training easier, which is interesting; worth digging into how exactly it gets "easy". https://github.com/microsoft/DeepSpeed
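From a quick skim of the repo, the "easy" part seems to be that training is driven through a single engine object plus a JSON config (batch size, fp16, ZeRO stage, etc.). Below is a minimal sketch along the lines of the README; `MyModel`, `args`, and `data_loader` are placeholders, not from the post.

```python
import deepspeed

model = MyModel()  # any torch.nn.Module (placeholder)

# deepspeed.initialize wraps the model and builds the optimizer from a JSON
# config passed via --deepspeed_config (train_batch_size, fp16, zero_optimization, ...).
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,                          # argparse args carrying the config path
    model=model,
    model_parameters=model.parameters())

for step, batch in enumerate(data_loader):
    loss = model_engine(batch)          # forward pass on the wrapped engine
    model_engine.backward(loss)         # engine handles loss scaling / gradient allreduce
    model_engine.step()                 # optimizer step + LR schedule
```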
https://github.com/microsoft/DeepSpeed/blob/master/csrc/fused_lamb_cuda_kernel.cu Whoa, they actually accelerated LAMB #3 with a fused CUDA kernel.
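For reference, this is the per-parameter LAMB update (from the LAMB paper) that such a fused kernel would compute in a single pass; the sketch below is plain PyTorch for readability, not the kernel itself.

```python
import torch

def lamb_update(p, grad, m, v, lr, beta1=0.9, beta2=0.999, eps=1e-6, wd=0.01):
    """One LAMB step for a single parameter tensor (algorithm sketch, not DeepSpeed's kernel)."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first moment (Adam-style)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second moment
    update = m / (v.sqrt() + eps) + wd * p               # Adam direction + decoupled weight decay
    w_norm, u_norm = p.norm(), update.norm()
    # Layer-wise trust ratio: scale the step by ||w|| / ||update||, falling back to 1.
    trust_ratio = torch.where((w_norm > 0) & (u_norm > 0),
                              w_norm / u_norm,
                              torch.ones_like(w_norm))
    p.sub_(lr * trust_ratio * update)
```

The fusion point is that the moments, the norms, and the weight update all touch the same tensors, so doing them in one CUDA kernel avoids several extra passes over GPU memory.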