intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System
Other

xpu timer python package #1159

Open · zxyyzx opened this issue 3 months ago

zxyyzx commented 3 months ago

I am trying to learn and use xpu_timer as described in the following article: "Troubleshooting is hard? xpu_timer leaves no blind spots in large model training!" https://mp.weixin.qq.com/s/OYkv4gXh_l_HpHXHqK6Ijw. The article references a Python package (shown in a screenshot attached to this issue), but I could not find any information about it. Is it open source? If so, how can I install it?

cos120 commented 3 months ago

We will release xpu_timer this month.

issaccv commented 3 weeks ago

Any updates on this issue?

dafu-wu commented 1 week ago

@cos120 Has it been released yet?