Model description
AI21 Labs has published Jamba, a hybrid Mamba (SSM) / Transformer model at scale, with a mixture-of-experts (MoE) architecture: 12B active parameters out of 52B total. It is claimed to be on par with Mixtral on several evaluation tasks. Because of the SSM-based design it can process a 256K-token context window, and AI21 claims a 140K-token context fits on a single GPU (without specifying which kind of GPU). While it is currently only a foundation model, it might be interesting to look at how it could be implemented for efficient inference. The model is available on HF.
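For reference, a minimal loading sketch along the lines of the HF model card (untested here; it assumes the custom modeling code ships in the repo and that `trust_remote_code=True` is required until native support is merged):

```python
# Minimal sketch: load Jamba via the custom code in the HF repo.
# trust_remote_code / device_map usage is an assumption, not a verified recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,       # custom hybrid Mamba/Transformer modeling code
    torch_dtype=torch.bfloat16,   # 52B total params; full fp32 won't fit on one GPU
    device_map="auto",            # shard the checkpoint across available devices
)

inputs = tokenizer("The key idea behind state space models is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```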
Open source status
Provide useful links for the implementation
HF repo: https://huggingface.co/ai21labs/Jamba-v0.1