lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

using --device=xpu #3152

Open · tmatschewski opened this issue 8 months ago

tmatschewski commented 8 months ago

If I use cpu, I get an error that intel-extension-for-pytorch is missing. If I use xpu, I get this error:

2024-03-14 06:15:45 2024-03-14 05:15:45 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=21002, worker_address='http://fastchat-model-worker:21002', controller_address='http://fastchat-controller:21001', model_path='lmsys/vicuna-7b-v1.5', revision='main', device='xpu', gpus=None, num_gpus=1, max_gpu_memory=None, dtype=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=['vicuna-7b-v1.5', 'gpt-3.5-turbo', 'text-davinci-003', 'text-embedding-ada-002'], conv_template=None, embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-03-14 06:15:45 2024-03-14 05:15:45 | INFO | model_worker | Loading the model ['vicuna-7b-v1.5', 'gpt-3.5-turbo', 'text-davinci-003', 'text-embedding-ada-002'] on worker 68ce88f9 ...
2024-03-14 06:15:45 2024-03-14 05:15:45 | ERROR | stderr | /usr/local/lib/python3.9/dist-packages/fastchat/model/model_adapter.py:246: UserWarning: Intel Extension for PyTorch is not installed, but is required for xpu inference.
2024-03-14 06:15:45 2024-03-14 05:15:45 | ERROR | stderr |   warnings.warn(
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:06<00:06,  6.86s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00,  4.32s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00,  4.70s/it]
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr | 
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr | /usr/local/lib/python3.9/dist-packages/transformers/generation/configuration_utils.py:410: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   warnings.warn(
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr | /usr/local/lib/python3.9/dist-packages/transformers/generation/configuration_utils.py:415: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   warnings.warn(
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr | /usr/local/lib/python3.9/dist-packages/transformers/generation/configuration_utils.py:410: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   warnings.warn(
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr | /usr/local/lib/python3.9/dist-packages/transformers/generation/configuration_utils.py:415: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   warnings.warn(
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr | Traceback (most recent call last):
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     return _run_code(code, main_globals, None,
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     exec(code, run_globals)
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/fastchat/serve/model_worker.py", line 375, in <module>
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     args, worker = create_model_worker()
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/fastchat/serve/model_worker.py", line 346, in create_model_worker
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     worker = ModelWorker(
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/fastchat/serve/model_worker.py", line 77, in __init__
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     self.model, self.tokenizer = load_model(
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/fastchat/model/model_adapter.py", line 362, in load_model
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     model.to(device)
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/transformers/modeling_utils.py", line 2556, in to
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     return super().to(*args, **kwargs)
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1152, in to
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     return self._apply(convert)
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 802, in _apply
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     module._apply(fn)
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 802, in _apply
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     module._apply(fn)
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 825, in _apply
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     param_applied = fn(param)
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |   File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1150, in convert
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr |     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2024-03-14 06:15:56 2024-03-14 05:15:56 | ERROR | stderr | RuntimeError: PyTorch is not linked with support for xpu devices

What could I do?
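For reference, a minimal diagnostic sketch for this situation: it checks whether intel-extension-for-pytorch is importable and whether it can see an XPU device. It assumes an XPU-enabled ipex build, where importing the package registers the `torch.xpu` namespace:

```python
import torch

try:
    # Importing ipex registers the "xpu" device type with PyTorch;
    # without this import, model.to("xpu") raises
    # "RuntimeError: PyTorch is not linked with support for xpu devices".
    import intel_extension_for_pytorch as ipex

    print("ipex version:", ipex.__version__)
    print("xpu available:", torch.xpu.is_available())
    print("xpu device count:", torch.xpu.device_count())
except ImportError:
    print("intel-extension-for-pytorch is not installed; "
          "both CPU_ISA=amx and --device xpu need it")
```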

Steve-Tech commented 2 months ago

Did you mean to use intel-extension-for-pytorch? That is, did you come across this flag by accident, or do you actually have ipex installed and want to use it?

Using --device cpu on its own shouldn't require ipex. If you've prefixed the command with CPU_ISA=amx, it will use the AMX AI accelerator present on recent Intel CPUs, which requires ipex. --device xpu targets a discrete Intel GPU/AI accelerator and also requires ipex, as shown in the sketch below.
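If you do have an Intel discrete GPU and want --device xpu to work, installing the XPU build of intel-extension-for-pytorch is the missing piece. A minimal standalone sketch of the idea (not FastChat's exact code path; the dtype and loading options here are just illustrative):

```python
import torch
# The ipex import is the step that makes model.to("xpu") legal;
# it registers the xpu device type with PyTorch.
import intel_extension_for_pytorch as ipex

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "lmsys/vicuna-7b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True
)
model = model.to("xpu")  # fails with the RuntimeError above without ipex
model = ipex.optimize(model, dtype=torch.float16)  # optional ipex optimizations
```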