huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Fix MoE tensor reshape #33738

Open sukjunhwang opened 1 month ago

sukjunhwang commented 1 month ago

The previous code transposes the tensor first, turning (B, L, E) into (B, E, L), and only then reshapes it, so the elements end up in the wrong positions.
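To illustrate why the order matters, here is a minimal sketch rather than the actual code touched by this PR. It uses NumPy, whose `transpose` and `reshape` follow the same row-major semantics as the PyTorch equivalents. The sizes and the `(B * L, E)` target shape are assumptions chosen for illustration, since the real reshape target is not shown here.

```python
import numpy as np

# Hypothetical sizes, chosen only to make the effect visible.
B, L, E = 2, 3, 4
x = np.arange(B * L * E).reshape(B, L, E)

# Transpose first, (B, L, E) -> (B, E, L), then reshape.
# The resulting shape is as expected, but the rows no longer
# correspond to x[b, l, :].
wrong = x.transpose(0, 2, 1).reshape(B * L, E)

# Reshape directly: row b * L + l holds x[b, l, :].
right = x.reshape(B * L, E)

print(wrong.shape == right.shape)    # True
print(np.array_equal(wrong, right))  # False
print(right[1])  # [4 5 6 7], i.e. x[0, 1, :]
print(wrong[1])  # [5 9 2 6], values mixed across positions
```

Because both results have the same shape, a shape check alone would not catch this kind of bug; comparing values against an indexed reference such as `x[b, l, :]` would.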

What does this PR do?

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

LysandreJik commented 1 month ago

Thank you! cc @ArthurZucker