When launching the server, I got this error. I guess it's because I only have the CPU-only PyTorch build. So the sample you shared must be run on a GPU-enabled machine?
17:42 ~/dEV/chat_service [homt]$ python3 server/chat_server.py
Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/stock/dEV/chat_service/server/chat_server.py", line 34, in <module>
pipe = pipeline("text-generation", model=config.model_id, torch_dtype=torch.bfloat16,
File "/home/stock/.virtualenvs/speechless/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 870, in pipeline
framework, model = infer_framework_load_model(
File "/home/stock/.virtualenvs/speechless/lib/python3.10/site-packages/transformers/pipelines/base.py", line 269, in infer_framework_load_model
model = model_class.from_pretrained(model, **kwargs)
File "/home/stock/.virtualenvs/speechless/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
return model_class.from_pretrained(
File "/home/stock/.virtualenvs/speechless/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
) = cls._load_pretrained_model(
File "/home/stock/.virtualenvs/speechless/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4116, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/stock/.virtualenvs/speechless/lib/python3.10/site-packages/transformers/modeling_utils.py", line 778, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/stock/.virtualenvs/speechless/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
new_value = value.to(device)
File "/home/stock/.virtualenvs/speechless/lib/python3.10/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
(speechless) 17:42 ~/dEV/chat_service [homt]$
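For what it's worth, a quick way to confirm the diagnosis is to check whether this PyTorch build can see a GPU, and fall back to CPU if not. This is only a sketch: the commented pipeline call mirrors the one in the traceback, and `config.model_id` comes from the original chat_server.py, not from here.

```python
import torch

# CPU-only PyTorch wheels report False here; CUDA builds with a
# visible GPU report True.
has_gpu = torch.cuda.is_available()
print(f"CUDA available: {has_gpu}")

# Pick device and dtype accordingly. bfloat16 is a GPU-oriented
# dtype; on CPU, float32 is the safe default.
device = "cuda" if has_gpu else "cpu"
dtype = torch.bfloat16 if has_gpu else torch.float32

# Hypothetical adjustment to the pipeline call from chat_server.py
# (left commented out so this snippet does not download a model):
# pipe = pipeline("text-generation", model=config.model_id,
#                 torch_dtype=dtype, device=device)
```

On a CPU-only machine this prints `CUDA available: False`, which matches the `AssertionError: Torch not compiled with CUDA enabled` above.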