huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Add support for Mistral-Nemo #2252

Open shaltielshmid opened 2 months ago

shaltielshmid commented 2 months ago

Model description

This model was released by Mistral here, and is available on HuggingFace here. The model is meant to be a drop-in replacement for Mistral-7B, but requires some modifications to handle the Tekken tokenizer and the explicit definition of head-dim (see here). I tried copying the head-dim change into the code, but the model's output was then pure garbage, so I assume I'm missing something else.
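To illustrate why an explicit head-dim matters, here is a minimal sketch of how the attention projection shapes change once the head dimension is no longer derived from the hidden size. `AttnConfig`, `resolve_head_dim`, and `projection_shapes` are hypothetical names made up for this example and are not TGI code. The config values (hidden size 5120, 32 attention heads, 8 key/value heads, head dim 128) are what I recall from the published Mistral-Nemo config and should be checked against the model's `config.json`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AttnConfig:
    """Hypothetical subset of a model config, for illustration only."""
    hidden_size: int
    num_attention_heads: int
    num_key_value_heads: int
    head_dim: Optional[int] = None  # set explicitly in newer configs


def resolve_head_dim(cfg: AttnConfig) -> int:
    """Prefer the explicit head_dim; fall back to the derived value."""
    if cfg.head_dim is not None:
        return cfg.head_dim
    return cfg.hidden_size // cfg.num_attention_heads


def projection_shapes(cfg: AttnConfig) -> Dict[str, Tuple[int, int]]:
    """Return (out_features, in_features) for each attention projection."""
    d = resolve_head_dim(cfg)
    q_out = cfg.num_attention_heads * d
    kv_out = cfg.num_key_value_heads * d
    return {
        "q_proj": (q_out, cfg.hidden_size),
        "k_proj": (kv_out, cfg.hidden_size),
        "v_proj": (kv_out, cfg.hidden_size),
        "o_proj": (cfg.hidden_size, q_out),
    }


# Assumed values; verify against the model's config.json.
nemo = AttnConfig(hidden_size=5120, num_attention_heads=32,
                  num_key_value_heads=8, head_dim=128)
# Same config without the explicit field, as older code would see it.
derived = AttnConfig(hidden_size=5120, num_attention_heads=32,
                     num_key_value_heads=8)

print(resolve_head_dim(nemo), resolve_head_dim(derived))
print(projection_shapes(nemo)["q_proj"])
```

Under these assumptions the derived value (5120 // 32 = 160) disagrees with the explicit 128, so any code path that still computes the head size from the hidden size (for example reshapes, rotary embedding dimensions, or the softmax scale) would silently use the wrong number. That is one plausible source of garbage output, not a confirmed diagnosis.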

Open source status

Provide useful links for the implementation

No response

ErikKaum commented 2 months ago

Hi @shaltielshmid 👋

Thanks for the request! I agree that getting support for the Mistral-Nemo model would be nice 👍 At the moment we might not have bandwidth to jump on it directly.

If you have even a half-ready PR, please feel free to open it, e.g. as a draft. I think that would help us and speed things up a bit.

Btw, just as a side note: the recommended temperature for Mistral Nemo is 0.3, so it would be good to set that as a default when testing as well.
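For testing with that temperature, a request body for TGI's `/generate` endpoint could be built roughly as follows. This is a sketch: `build_generate_payload` is a hypothetical helper, and the prompt and `max_new_tokens` value are arbitrary choices for illustration.

```python
import json
from typing import Any, Dict


def build_generate_payload(prompt: str,
                           temperature: float = 0.3,
                           max_new_tokens: int = 64) -> Dict[str, Any]:
    """Build a JSON-serializable body for a TGI /generate request."""
    return {
        "inputs": prompt,
        "parameters": {
            "do_sample": True,          # sampling must be on for temperature to matter
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
        },
    }


payload = build_generate_payload("What is the capital of France?")
body = json.dumps(payload)
print(body)
```

The resulting `body` would be sent as a POST with `Content-Type: application/json` to the server's `/generate` route; the host and port depend on how the container was launched.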

shaltielshmid commented 2 months ago

Hi @ErikKaum

I opened the half-ready PR, as you suggested. The model loads, but the results are gibberish.

Let me know if I can be of assistance going further.

ErikKaum commented 2 months ago

Thank you 🙌 this already helps a lot.

At the moment I don't think there's much else to do other than continue debugging why the model gives gibberish, so if you have bandwidth, just go ahead 👍

tensimixt commented 2 months ago

@ErikKaum once Nemo is able to run through TGI, do you know if TGI will work with Nemo + a LoRA adapter, even though the vocab size is > 130k?

Thank you!

ErikKaum commented 1 month ago

Hi @tensimixt!

Sorry for the slow response. Off the top of my head, I can't come up with a reason why it wouldn't work 🤔

Btw, just to verify, was this issue resolved through the #2254 PR or is this still valid?