Closed: Gintasz closed this issue 3 months ago.
I also cannot use Mixtral with GPTQ and AWQ. The errors I typically get are (respectively):
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
and
/home/runner/work/vllm/vllm/csrc/quantization/awq/gemm_kernels.cu:46: void vllm::awq::gemm_forward_4bit_cuda_m16nXk32(int, int, __half *, int *, __half *, int *, int, int, int, __half *) [with int N = 128]: block: [25,0,0], thread: [18,1,0] Assertion `false` failed.
Here are the quantized models I tried:
@KMouratidis can you write a full list of the commands you used to pull the weights of the Mixtral model and pass them to sglang?
I didn't; the server automatically downloaded the model for me when launched. The command I used was:
python3 -m sglang.launch_server --model-path $MODEL --tp-size 8 --mem-fraction-static 0.85
@KMouratidis yeah, and what was your $MODEL value? Because I tried it like this below and got the model path name error from the original post:
python3 -m sglang.launch_server --model-path TheBloke/Mistral-7B-v0.1-GPTQ:gptq-4bit-32g-actorder_True --port 42069 --host 0.0.0.0
Any one of the 3 I mentioned in my first comment.
Mixtral AWQ works with https://huggingface.co/casperhansen/mixtral-instruct-awq on an A100 80G.
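For anyone trying to reproduce this, a launch command along the same lines as the others in this thread should work; the --tp-size and --mem-fraction-static values below are illustrative guesses, not tested settings:

python3 -m sglang.launch_server --model-path casperhansen/mixtral-instruct-awq --tp-size 2 --mem-fraction-static 0.85 --port 30000 --host 0.0.0.0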
Sorry for a newbie question; I couldn't find an answer. I succeeded in launching the server with the unquantised Mistral 7B:
python3 -m sglang.launch_server --model-path mistralai/Mistral-7B-Instruct-v0.2 --port 42069 --host 0.0.0.0
I'm trying to launch a quantised model like this:
python3 -m sglang.launch_server --model-path TheBloke/Mistral-7B-v0.1-GPTQ:gptq-4bit-32g-actorder_True --port 42069 --host 0.0.0.0
I get this error:
root@C.10082074:~$ python3 -m sglang.launch_server --model-path TheBloke/Mistral-7B-v0.1-GPTQ:gptq-4bit-32g-actorder_True --port 42069 --host 0.0.0.0
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 164, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'TheBloke/Mistral-7B-v0.1-GPTQ:gptq-4bit-32g-actorder_True'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.10/site-packages/sglang/launch_server.py", line 11, in <module>
    launch_server(server_args, None)
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/server.py", line 430, in launch_server
    tokenizer_manager = TokenizerManager(server_args, port_args)
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 93, in __init__
    self.hf_config = get_config(
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/hf_transformers_utils.py", line 33, in get_config
    config = AutoConfig.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1111, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 633, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 688, in _get_config_dict
    resolved_config_file = cached_file(
  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 462, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: 'TheBloke/Mistral-7B-v0.1-GPTQ:gptq-4bit-32g-actorder_True'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

I tried to download the repository locally and then specify the directory as the path:
git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ
python3 -m sglang.launch_server --model-path Mistral-7B-v0.1-GPTQ --port 42069 --host 0.0.0.0
But then I get:
Rank 0: load weight begin. quant_config: GPTQConfig(weight_bits=4, group_size=32, desc_act=True)
Process Process-1:
Traceback (most recent call last):
router init state: Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/router/manager.py", line 68, in start_router_process
    model_client = ModelRpcClient(server_args, port_args)
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 606, in __init__
    self.model_server.exposed_init_model(0, server_args, port_args)
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 62, in exposed_init_model
    self.model_runner = ModelRunner(
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/router/model_runner.py", line 275, in __init__
    self.load_model()
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/managers/router/model_runner.py", line 308, in load_model
    model.load_weights(
  File "/opt/conda/lib/python3.10/site-packages/sglang/srt/models/llama2.py", line 290, in load_weights
    for name, loaded_weight in hf_model_weights_iterator(
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/weight_utils.py", line 251, in hf_model_weights_iterator
    with safe_open(st_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
detoken init state: init ok
Specifying the revision like this won't work. The revision is set to None in the model loading code of sglang.
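As a workaround, you can download the desired revision locally with huggingface_hub and point --model-path at the resulting directory. A minimal sketch (snapshot_download and its arguments come from huggingface_hub, not from sglang; the repo and revision are the ones from the original post):

from huggingface_hub import snapshot_download

# Download the specific GPTQ revision into the local HF cache and
# return the path of the snapshot directory.
local_dir = snapshot_download(
    repo_id="TheBloke/Mistral-7B-v0.1-GPTQ",
    revision="gptq-4bit-32g-actorder_True",
)
print(local_dir)  # pass this path to --model-path

Then launch the server with --model-path set to the printed directory.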
Cloning the model repo with the desired revision and loading the model from the local path works fine on my end with the exact same command you provided (sglang 0.1.13)
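A likely explanation for the HeaderTooLarge error when the same command works elsewhere: a git clone made without git-lfs leaves the *.safetensors files as small ASCII pointer stubs instead of the real weights, so safetensors misreads the first 8 bytes as an absurdly large header size. A quick check (plain Python, no sglang involved; the directory name matches the clone above):

from pathlib import Path

# A real safetensors file starts with an 8-byte little-endian header length;
# a git-lfs pointer stub is a tiny text file starting with "version https://git-lfs".
for f in Path("Mistral-7B-v0.1-GPTQ").glob("*.safetensors"):
    head = f.read_bytes()[:64]
    if head.startswith(b"version https://git-lfs"):
        print(f"{f}: LFS pointer stub -- run 'git lfs install' and 'git lfs pull' in the repo")
    else:
        print(f"{f}: looks like real weights ({f.stat().st_size} bytes)")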
sglang works with Mixtral-8x7B-Instruct-v0.1-GPTQ on A100 in my test.
python -m sglang.launch_server --model-path ./Mixtral-8x7B-Instruct-v0.1-GPTQ/ --mem-fraction-static 0.9 --port 8501 --host 0.0.0.0 --context-length 8192 --trust-remote-code
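Once a server like this is up, it can be smoke-tested over HTTP. A sketch assuming sglang's /generate endpoint and the port from the command above:

import requests

# Request a short deterministic completion from the running server.
resp = requests.post(
    "http://localhost:8501/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"temperature": 0, "max_new_tokens": 16},
    },
)
print(resp.json())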
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.