pentium3 / sys_reading

system paper reading notes

Lancet: Accelerating Mixture-of-Experts Training by Overlapping Weight Gradient Computation and All-to-All Communication #356

Open. pentium3 opened this issue 8 months ago