microsoft / tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License

add reset_parameters fn; update .to() fn; enable device and dtype pass-through #194

Closed vchiley closed 1 year ago

vchiley commented 1 year ago

This PR

  1. adds a reset_parameters fn to FusedExpertsNetwork.
  2. removes the to(*args) fn from FusedExpertsNetwork.
    • The layer impl does not have self.fc1_weight, self.fc2_weight, self.fc1_bias, or self.fc2_bias.
    • This to(*args) fn is therefore a bug: it refers to attributes the layer no longer defines.
    • At one point it would have been correct, but it no longer is.
  3. propagates device=device and dtype=dtype to parameter init.
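
To illustrate the three changes, here is a minimal sketch of the pattern rather than the actual Tutel code. The class name `ToyFusedExperts`, the parameter names `w1`, `w2`, `b1`, `b2`, the tensor shapes, and the init scheme (normal weights, zero biases) are all assumptions made for this example. It relies only on the standard PyTorch conventions of `device`/`dtype` factory kwargs and a `reset_parameters` method, and it defines no custom `to()` because the inherited `torch.nn.Module.to()` already moves every registered parameter.

```python
import torch


class ToyFusedExperts(torch.nn.Module):
    """Hypothetical stand-in for a fused expert FFN (not Tutel's implementation)."""

    def __init__(self, num_experts, model_dim, hidden_dim, device=None, dtype=None):
        super().__init__()
        # Pass device and dtype straight through to parameter allocation.
        factory_kwargs = {"device": device, "dtype": dtype}
        self.w1 = torch.nn.Parameter(
            torch.empty(num_experts, model_dim, hidden_dim, **factory_kwargs))
        self.b1 = torch.nn.Parameter(
            torch.empty(num_experts, hidden_dim, **factory_kwargs))
        self.w2 = torch.nn.Parameter(
            torch.empty(num_experts, hidden_dim, model_dim, **factory_kwargs))
        self.b2 = torch.nn.Parameter(
            torch.empty(num_experts, model_dim, **factory_kwargs))
        self.reset_parameters()

    def reset_parameters(self):
        # Assumed init scheme for illustration: small normal weights, zero biases.
        for weight in (self.w1, self.w2):
            torch.nn.init.normal_(weight, mean=0.0, std=0.02)
        for bias in (self.b1, self.b2):
            torch.nn.init.zeros_(bias)

    def forward(self, x):
        # x has shape (num_experts, tokens, model_dim).
        h = torch.relu(torch.bmm(x, self.w1) + self.b1.unsqueeze(1))
        return torch.bmm(h, self.w2) + self.b2.unsqueeze(1)

    # No custom to(): nn.Module.to() already handles all registered parameters.


if __name__ == "__main__":
    layer = ToyFusedExperts(2, 4, 8, dtype=torch.float64)
    out = layer(torch.randn(2, 3, 4, dtype=torch.float64))
    print(out.shape, layer.w1.dtype)
    layer = layer.to(torch.float32)  # inherited to() casts every parameter
    print(layer.w2.dtype)
```

Because `reset_parameters` is a separate method, callers that construct the module on a placeholder device can re-run initialization later without rebuilding the layer.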
vchiley commented 1 year ago

@microsoft-github-policy-service agree company="MosaicML"