microsoft / LLMLingua

To speed up LLM inference and enhance the model's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License
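For orientation, the compression API involved in this issue looks like the following minimal sketch (the prompt text and argument values here are illustrative, not taken from the report):

from llmlingua import PromptCompressor

# Build a compressor; by default this loads a small open LLM onto a CUDA device.
llm_lingua = PromptCompressor()

# Ask for the prompt to be compressed down to roughly 200 tokens.
long_prompt = "..."  # placeholder for the original, lengthy prompt
compressed = llm_lingua.compress_prompt(long_prompt, instruction="", question="", target_token=200)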

compress_prompt Reports Error: AttributeError: 'NoneType' object has no attribute 'device' #45

Open · xxSpencer opened this issue 6 months ago

xxSpencer commented 6 months ago

llm_lingua = PromptCompressor(model_name="Baichuan2-main/Baichuan2-7B-Chat", model_config=model_config)
compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question="", target_token=200)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'prompt' is not defined

compressed_prompt = llm_lingua.compress_prompt('你好', instruction="", question="", target_token=200)

/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/PDF-OCR/AIGC/EngineeringQA/Baichuan2-main/LLMLingua/llmlingua/prompt_compressor.py", line 252, in compress_prompt
    context = self.iterative_compress_prompt(
  File "/PDF-OCR/AIGC/EngineeringQA/Baichuan2-main/LLMLingua/llmlingua/prompt_compressor.py", line 734, in iterative_compress_prompt
    loss, past_key_values = self.get_ppl(
  File "/PDF-OCR/AIGC/EngineeringQA/Baichuan2-main/LLMLingua/llmlingua/prompt_compressor.py", line 105, in get_ppl
    response = self.model(
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Chat/modeling_baichuan.py", line 719, in forward
    outputs = self.model(
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Chat/modeling_baichuan.py", line 494, in forward
    layer_outputs = decoder_layer(
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Chat/modeling_baichuan.py", line 306, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Chat/modeling_baichuan.py", line 238, in forward
    proj = self.W_pack(hidden_states)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 441, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 563, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 344, in forward
    state.CxB, state.SB = F.transform(state.CB, to_order=formatB)
  File "/root/anaconda3/envs/baichuan2/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2080, in transform
    prev_device = pre_call(A.device)
AttributeError: 'NoneType' object has no attribute 'device'
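For context, the final frames point at bitsandbytes' 8-bit matmul: state.CB is None when F.transform tries to read its device, which fits the quantized weights never having been placed on a CUDA device. A quick environment check (a sketch, assuming torch is importable in the same environment):

import torch

# bitsandbytes' MatMul8bitLt kernels require a CUDA GPU; if this prints
# False, 8-bit quantized inference cannot run in this environment.
print(torch.cuda.is_available())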

iofu728 commented 6 months ago

Hi @xxSpencer, by default LLMLingua requires an NVIDIA GPU with CUDA enabled. You can switch to CPU mode with the following settings:

from llmlingua import PromptCompressor
llm_lingua = PromptCompressor(device_map="cpu")
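Applied to the setup above, a minimal sketch (reusing the reporter's checkpoint path, and dropping the 8-bit model_config since bitsandbytes' kernels are CUDA-only):

from llmlingua import PromptCompressor

# Load the same Baichuan2 checkpoint fully on CPU, without 8-bit quantization.
llm_lingua = PromptCompressor(
    model_name="Baichuan2-main/Baichuan2-7B-Chat",
    device_map="cpu",
)

compressed_prompt = llm_lingua.compress_prompt('你好', instruction="", question="", target_token=200)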