jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

Source Code Reading - DeepSpeed #29

Open jasperzhong opened 4 years ago

jasperzhong commented 4 years ago

No paper has been released yet, so I'll wait for a follow-up.

The blog post already gives quite a few details, though: https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/

reddit: https://www.reddit.com/r/MachineLearning/comments/f1tuv0/r_turingnlg_a_17billionparameter_language_model/

It uses ZeRO #28

DeepSpeed looks like just a wrapper around PyTorch. The intro says it makes distributed training easier. Interesting, worth digging into how exactly it makes it easier. https://github.com/microsoft/DeepSpeed

jasperzhong commented 4 years ago

https://github.com/microsoft/DeepSpeed/blob/master/csrc/fused_lamb_cuda_kernel.cu Whoa, they actually accelerated LAMB #3 with a CUDA kernel
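For reference while reading the kernel, here is a rough sketch of the textbook per-layer LAMB update in plain Python. This is not DeepSpeed's code: the function name `lamb_step`, its parameters, and the defaults are my own assumptions for illustration, and the CUDA kernel may differ in details such as how the trust ratio is bounded or how bias correction is applied. The point is only to show the per-layer steps (Adam-style moments, then a weight-norm / update-norm trust ratio) that a fused kernel would presumably combine.

```python
import math


def lamb_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.0):
    """One LAMB update for a single layer (hypothetical helper).

    w, g, m, v are flat lists of floats: weights, gradients, first and
    second moment estimates. t is the 1-based step count.
    Returns (new_w, new_m, new_v).
    """
    # Adam-style exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]

    # Bias correction for the zero-initialized moments.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]

    # Adam direction plus decoupled weight decay.
    update = [mh / (math.sqrt(vh) + eps) + weight_decay * wi
              for mh, vh, wi in zip(m_hat, v_hat, w)]

    # Layer-wise trust ratio: ||w|| / ||update||, falling back to 1.0
    # when either norm is zero.
    w_norm = math.sqrt(sum(x * x for x in w))
    u_norm = math.sqrt(sum(x * x for x in update))
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    new_w = [wi - lr * trust * ui for wi, ui in zip(w, update)]
    return new_w, m, v


# Usage: a single step on a toy two-element "layer".
w, m, v = lamb_step([1.0, 2.0], [0.1, -0.2], [0.0, 0.0], [0.0, 0.0],
                    t=1, lr=0.1)
```

Each layer needs two norm reductions plus several elementwise passes, which is probably why doing it in one fused kernel instead of many small PyTorch ops pays off.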