google / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Apache License 2.0

Mixtral enablement. #120

Closed wang2yn84 closed 1 month ago

wang2yn84 commented 1 month ago

The Mixtral 8x7B model is working for both offline and online serving, in both bf16 and int8. Let's get this in first so we can parallelize the work; tests will follow in upcoming PRs.
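For readers unfamiliar with the int8 path mentioned above: int8 serving typically stores weights as 8-bit integers with a per-row (or per-channel) scale and dequantizes on the fly. The sketch below is a generic, dependency-free illustration of symmetric per-row int8 quantization; it is an assumption for illustration only, not the actual jetstream-pytorch implementation, and the function names (`quantize_int8`, `dequantize`) are hypothetical.

```python
# Hypothetical sketch of symmetric per-row int8 weight quantization --
# the general idea behind an "int8" checkpoint, NOT this repo's code.

def quantize_int8(row):
    """Quantize one weight row to int8 values plus a per-row scale."""
    # Map the largest-magnitude weight to 127; guard against an all-zero row.
    scale = max(abs(v) for v in row) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

if __name__ == "__main__":
    row = [0.5, -1.0, 0.25]
    q, s = quantize_int8(row)
    approx = dequantize(q, s)
    print(q, s, approx)
```

In practice a bf16 checkpoint skips this step entirely and keeps weights in 16-bit floating point, trading memory for fidelity; the int8 path halves weight memory again at the cost of small rounding error per row.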

qihqi commented 1 month ago

Please make sure the name is mixtral and not mistral. We might add Mistral 7B (the non-MoE version) later, so using the wrong name would be confusing.