microsoft/tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License

fix(fast_dispatch): saving input tensor using ctx.save_for_backward #238

Closed. KimmiShi closed this 2 months ago

KimmiShi commented 2 months ago

Fixes #237.

ghostplant commented 2 months ago

LGTM. Thanks!