DLRover: An Automatic Distributed Deep Learning System
We are going to build an LLM training agent to help search for training strategies and babysit model training to maximize MFU and effective training time. #1005
So is this a proposal to the community? Where is your RFC?