volcengine / veScale

A PyTorch Native LLM Training Framework
http://vescale.xyz
Apache License 2.0
553 stars 26 forks

[DTensor] Open Source #8

Closed: leonardo0lyj closed 5 months ago

leonardo0lyj commented 5 months ago

In this PR, we open-source our DTensor and Dockerfile. Yo~

veScale is a PyTorch native framework rooted in PyTorch DTensor. veScale DTensor shares most of its code with PyTorch DTensor, but extends it with extra features for our production usage, as below (i.e., the major differences from PyTorch DTensor v2.2.0):

Credit to veScale DTensor Team

This endeavor would not have been possible without the contributions of our DTensor team, which includes but is not limited to: @Vremold @JsBlueCat @SerailHydra @wenlei-bao @Hao-Gong @jc-bytedance @yaochengji @Connor-XY @cheimu @MackZackA @leonardo0lyj.

Also thanks to the great guidance and leadership of: @liwenchangbdbz @pengyanghua @eric-haibin-lin @Meteorix

Credit to PyTorch DTensor Team

We would like to sincerely acknowledge the assistance of and collaboration with the PyTorch DTensor team, which includes but is not limited to: @wanchaol @XilunWu @wz337 @tianyu-l @fduwjj @awgu @yifuwang @wconstab @ezyang @mrshenli.

yiakwy-xpu-ml-framework-team commented 5 months ago

Integrating PyTorch distributed tensor and device mesh into a large-scale parallel framework like Megatron is awesome. Good job!