microsoft/Tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

fix a few casts #209

Closed. vchiley closed this pull request 1 year ago.

vchiley commented 1 year ago

This PR:

  1. removes the to(*args) fn from FusedExpertsNetwork
    • The layer impl does not have self.fc1_weight, self.fc2_weight, self.fc1_bias, or self.fc2_bias.
    • This to(*args) fn is therefore a bug: at one point it would have been correct, but the attributes it touches no longer exist (see the first sketch after this list).
  2. autocasts x in the expert fn
    • p = list(experts.parameters())[0] still has dtype fp32 when generic torch autocast is active, because autocast changes the compute dtype of ops rather than the storage dtype of parameters; this change casts x to the autocast dtype when autocast is enabled (see the second sketch after this list).
  3. better casts x in the gate (see the third sketch after this list)
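
For item 1, here is a minimal sketch of why no to(*args) override is needed once the weights are registered as regular parameters. The class and attribute names here (TinyFusedExperts, batched_fc1_w, batched_fc2_w) are hypothetical stand-ins, not Tutel's actual code; the point is that the inherited nn.Module.to() already moves and casts every registered parameter, so an override that reaches for the old fc1_weight/fc2_weight/fc1_bias/fc2_bias attributes could only raise AttributeError.

```python
import torch
import torch.nn as nn

class TinyFusedExperts(nn.Module):
    """Hypothetical, heavily simplified stand-in for FusedExpertsNetwork."""
    def __init__(self, num_experts=2, model_dim=4, hidden=8):
        super().__init__()
        # Registered nn.Parameters are moved/cast by the inherited
        # nn.Module.to(), so no custom to(*args) override is required.
        self.batched_fc1_w = nn.Parameter(torch.zeros(num_experts, model_dim, hidden))
        self.batched_fc2_w = nn.Parameter(torch.zeros(num_experts, hidden, model_dim))

m = TinyFusedExperts().to(torch.float16)
print(m.batched_fc1_w.dtype)  # torch.float16: handled without any override
```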
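For item 2, a sketch of the casting behavior being fixed. The function and variable names are my own, not the PR's actual diff; the key fact is that torch.autocast changes the compute dtype of ops rather than the storage dtype of parameters, so keying x's cast off the first expert parameter would silently keep it fp32 inside an autocast region.

```python
import torch

def cast_expert_input(x: torch.Tensor, experts: torch.nn.Module) -> torch.Tensor:
    # Parameters keep their fp32 storage dtype even inside torch.autocast,
    # so matching x to the first parameter's dtype would silently stay fp32.
    param_dtype = next(experts.parameters()).dtype
    if torch.is_autocast_enabled():
        # Cast to the active autocast dtype (e.g. fp16 or bf16) instead.
        return x.to(torch.get_autocast_gpu_dtype())
    return x.to(param_dtype)

# Usage (assumes a CUDA device):
# experts = torch.nn.Linear(8, 8).cuda()          # params stored in fp32
# x = torch.randn(4, 8, device="cuda")
# with torch.autocast("cuda", dtype=torch.float16):
#     print(cast_expert_input(x, experts).dtype)  # torch.float16, not float32
```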
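And for item 3, a sketch of what "better cast x in gate" can look like: cast x to the gate weight's actual dtype rather than assuming fp32, so gating also behaves under mixed precision. The class and attribute names here are illustrative only, not Tutel's actual gate implementation.

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    """Illustrative gate module; not Tutel's actual implementation."""
    def __init__(self, model_dim: int, num_experts: int):
        super().__init__()
        self.wg = nn.Linear(model_dim, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Cast x to whatever dtype the gate weight actually has,
        # instead of hard-coding a float() cast.
        return self.wg(x.to(self.wg.weight.dtype))
```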
ghostplant commented 1 year ago

Nice fixes, thanks so much!