To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
Traceback (most recent call last):
  File "/home/user/./umitigate_citation.py", line 49, in <module>
    llm_lingua = PromptCompressor()
  File "/home/user/venv_openai3/lib/python3.10/site-packages/llmlingua/prompt_compressor.py", line 88, in __init__
    self.load_model(model_name, device_map, model_config)
  File "/home/user/venv_openai3/lib/python3.10/site-packages/llmlingua/prompt_compressor.py", line 139, in load_model
    model = MODEL_CLASS.from_pretrained(
  File "/home/user/venv_openai3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/home/user/venv_openai3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/user/venv_openai3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4116, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/user/venv_openai3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 778, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/user/venv_openai3/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
  File "/home/user/venv_openai3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Describe the issue
Instantiating PromptCompressor() on a CPU-only instance fails during model loading: from_pretrained tries to move the model weights onto a CUDA device, and torch raises the RuntimeError above because no NVIDIA driver (and no GPU) is present.
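A likely workaround, based on the traceback (load_model receives a device_map argument from PromptCompressor.__init__), is to select the device explicitly instead of letting the model default to CUDA. The sketch below picks a device map from torch's CUDA availability; the commented PromptCompressor call is an assumption about how that argument is passed through, not a verified API contract.

```python
import torch

# On a CPU-only machine torch.cuda.is_available() is False, so we avoid
# the torch._C._cuda_init() call that raised the RuntimeError above.
device_map = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical usage, assuming PromptCompressor forwards device_map to
# from_pretrained (as the traceback's load_model call suggests):
# from llmlingua import PromptCompressor
# llm_lingua = PromptCompressor(device_map=device_map)
print(device_map)
```

If this works, the compressor loads the model entirely in host memory; expect slower compression than on a GPU, but no CUDA initialization at all.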