intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

xpu timer python package #1159

Open zxyyzx opened 4 months ago

zxyyzx commented 4 months ago

I am trying to learn and use xpu_timer as described in the following article: 故障排查难?xpu_timer 让大模型训练无死角! ("Troubleshooting is hard? xpu_timer leaves no blind spots in large-model training!") https://mp.weixin.qq.com/s/OYkv4gXh_l_HpHXHqK6Ijw The article references a Python package, shown in the image below, but I could not find any information about it. Is it open source? If so, how can I install it?

[image]

cos120 commented 4 months ago

We will release xpu_timer this month.

issaccv commented 2 months ago

Any updates on this issue?

dafu-wu commented 1 month ago

@cos120 has it been released yet?

aqwertaqwert commented 2 weeks ago

@cos120 has it been released yet?