sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

Mistral model no longer loads following PR#101 #107

Closed johndun closed 9 months ago

johndun commented 9 months ago

The get_model_cls_by_arch_name function introduced in the "Dynamic model class loading" PR removes the hard-coded mapping from MistralForCausalLM to LlamaForCausalLM. As a result, a locally hosted Mistral-7B model no longer loads as of sglang version 0.1.9. I have tested that adding the following simple models/mistral.py file allows the Mistral-7B model to be hosted again.

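# models/mistral.py
# Expose the existing Llama implementation under the name MistralForCausalLM.
from sglang.srt.models.llama2 import LlamaForCausalLM


class MistralForCausalLM(LlamaForCausalLM):
    def __init__(self, *args, **kwargs):
        # No Mistral-specific setup; forward everything to the Llama constructor.
        super().__init__(*args, **kwargs)


# The dynamic model loader picks up the class exported as EntryClass.
EntryClass = MistralForCausalLM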
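
To illustrate why the extra file fixes the problem, here is a minimal sketch of name-based class loading. It is not sglang's actual implementation: build_registry and get_model_cls are hypothetical helpers, and the two modules are in-memory stand-ins for models/llama2.py and models/mistral.py. The only assumption taken from the workaround above is that each model file exports its class as EntryClass.

```python
import types


def build_registry(modules):
    """Map each module's EntryClass name to the class itself (hypothetical helper)."""
    registry = {}
    for module in modules:
        entry = getattr(module, "EntryClass", None)
        if entry is not None:
            registry[entry.__name__] = entry
    return registry


def get_model_cls(registry, arch_name):
    """Look up a model class by architecture name, failing loudly if it is missing."""
    try:
        return registry[arch_name]
    except KeyError:
        raise ValueError(f"Unsupported architecture: {arch_name}") from None


# Stand-ins for the real model classes (assumptions for illustration only).
class LlamaForCausalLM:
    pass


class MistralForCausalLM(LlamaForCausalLM):
    pass


# In-memory stand-ins for models/llama2.py and models/mistral.py.
llama_mod = types.ModuleType("llama2")
llama_mod.EntryClass = LlamaForCausalLM
mistral_mod = types.ModuleType("mistral")
mistral_mod.EntryClass = MistralForCausalLM

# Without mistral.py, the name "MistralForCausalLM" is simply not registered.
registry_before = build_registry([llama_mod])
# With mistral.py present, the lookup succeeds.
registry_after = build_registry([llama_mod, mistral_mod])
```

With only the Llama module registered, a lookup of "MistralForCausalLM" raises an error, which matches the failure described in this issue. Once a module exporting that class name exists, the lookup succeeds without any hard-coded mapping.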

comaniac commented 9 months ago

Thanks for pointing out this issue and the workaround. I'll take a look today.

comaniac commented 9 months ago

OK, it turns out that we should do exactly what you proposed. The Mistral config does specify MistralForCausalLM, so we should look up this class by name instead of relying on a hard-coded mapping. I'll file a PR for it now and make you a co-author. Thanks!
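
A rough sketch of the config-driven lookup described above, for readers following along. It assumes a Hugging Face style config with an "architectures" list. MODEL_REGISTRY and resolve_model_cls are hypothetical names, and the classes are empty stand-ins, not sglang code.

```python
import json


class LlamaForCausalLM:
    """Stand-in for the real Llama model class."""


class MistralForCausalLM(LlamaForCausalLM):
    """Stand-in for the class exported by models/mistral.py."""


# Hypothetical registry keyed by class name, as a dynamic loader might build it.
MODEL_REGISTRY = {cls.__name__: cls for cls in (LlamaForCausalLM, MistralForCausalLM)}


def resolve_model_cls(config_text, registry=MODEL_REGISTRY):
    """Return the first registered class named in the config's "architectures" list."""
    config = json.loads(config_text)
    architectures = config.get("architectures") or []
    for arch in architectures:
        if arch in registry:
            return registry[arch]
    raise ValueError(f"No registered model class for architectures {architectures}")


# Trimmed-down example config; a real config.json has many more fields.
mistral_config = '{"architectures": ["MistralForCausalLM"], "model_type": "mistral"}'
model_cls = resolve_model_cls(mistral_config)
print(model_cls.__name__)  # MistralForCausalLM
```

Because the name comes straight from the config, supporting a new architecture only requires registering a class with the matching name.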