microsoft / tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License
694 stars, 84 forks

removed logit_scale without device casting #191

Closed. Harsh-Sensei closed this 1 year ago

Harsh-Sensei commented 1 year ago

Fixed a small bug: `logit_scale` was declared without being cast to the device of `x`, which can throw an error when the two end up on different devices.
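For readers unfamiliar with this class of bug, here is a minimal sketch of the pattern, assuming PyTorch. It is not the actual diff from this PR: the function name `apply_logit_scale`, the parameter `log_scale`, and the `max_scale` bound are hypothetical and chosen only for illustration. The idea is simply to move the scale onto `x.device` before combining the two tensors.

```python
import math

import torch


def apply_logit_scale(x: torch.Tensor, log_scale: torch.Tensor,
                      max_scale: float = 100.0) -> torch.Tensor:
    """Hypothetical helper (not Tutel's code): scale x by exp(log_scale)."""
    # Move the scale parameter to the device of x first, so the
    # multiplication below never mixes tensors from different devices.
    log_scale = log_scale.to(x.device)
    # Clamp in log space with a plain Python float bound, then exponentiate.
    logit_scale = torch.clamp(log_scale, max=math.log(max_scale)).exp()
    return x * logit_scale


# Usage: runs on CPU, and on GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.ones(2, 3, device=device)
log_scale = torch.tensor(0.0)  # created on the CPU by default
out = apply_logit_scale(x, log_scale)
```

Passing `device=x.device` when the tensor is first created would work as well; `.to(x.device)` is used here because it also covers a scale that already exists on another device.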

ghostplant commented 1 year ago

Great. Thank you for your contribution!