microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

AssertionError: Torch not compiled with CUDA enabled #55

Open JiHa-Kim opened 7 months ago

JiHa-Kim commented 7 months ago

Hi, I tried to run LLMLingua using dolphin-2.6-phi-2, but I got AssertionError: Torch not compiled with CUDA enabled:

PS C:\Users\DefaultUser> python "C:\Users\Public\Coding\LLMLingua\LLMLingua_test1.py"
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards:   0%|                                                                 | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Public\Coding\LLMLingua\LLMLingua_test1.py", line 12, in <module>
    llm_lingua = LocalPromptCompressor()
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 27, in __init__
    self.load_model(model_name, device_map, model_config)
  File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 57, in load_model
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\transformers\models\auto\auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\transformers\modeling_utils.py", line 3706, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\transformers\modeling_utils.py", line 4116, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\transformers\modeling_utils.py", line 778, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "C:\Program Files\Lib\site-packages\accelerate\utils\modeling.py", line 347, in set_module_tensor_to_device
    new_value = value.to(device)
                ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
samvanity commented 7 months ago

Uninstall your PyTorch first:

pip uninstall torch torchvision torchaudio -y

and then reinstall it using the command generated for your setup on the PyTorch official website (https://pytorch.org/get-started/locally/).
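For reference, the commands that page generates look roughly like the following (the cu121 suffix is just one possible CUDA version; pick whatever matches your driver):

# NVIDIA GPU build (Windows/Linux)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# CPU-only build, for machines without an NVIDIA GPU
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

You can then verify which build you have with:

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"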

Avkashhirpara commented 7 months ago

Facing the same issue; I tried uninstalling and reinstalling torch, but the error is the same.

I do not have an NVIDIA GPU or the CUDA platform on my PC. Is there a chance for me to run it without a GPU or CUDA?

JiHa-Kim commented 7 months ago

> Uninstall your PyTorch first:
>
> pip uninstall torch torchvision torchaudio -y
>
> and then reinstall it using the command from the PyTorch official website.

Thanks, I reinstalled it and it worked, but I encountered another error:

Enter your contexts:  Test
Enter your question: What is in the context^
Traceback (most recent call last):
  File "C:\Users\Public\Coding\LLMLingua\LLMLingua_test1.py", line 44, in <module>
    compressed_prompt = llm_lingua.compress_prompt(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 252, in compress_prompt
    context = self.iterative_compress_prompt(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 761, in iterative_compress_prompt
    self_loss, self_past_key_values = self.get_ppl(
                                      ^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 105, in get_ppl
    response = self.model(
               ^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DefaultUser\.cache\huggingface\modules\transformers_modules\cognitivecomputations\dolphin-2_6-phi-2\a084bb141f99f67e8ff56a654e29ddd53a0b4d7a\modeling_phi.py", line 960, in forward
    hidden_states = self.transformer(input_ids, past_key_values=past_key_values, attention_mask=attention_mask)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DefaultUser\.cache\huggingface\modules\transformers_modules\cognitivecomputations\dolphin-2_6-phi-2\a084bb141f99f67e8ff56a654e29ddd53a0b4d7a\modeling_phi.py", line 919, in forward
    hidden_states = self.embd(input_ids)
                    ^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DefaultUser\.cache\huggingface\modules\transformers_modules\cognitivecomputations\dolphin-2_6-phi-2\a084bb141f99f67e8ff56a654e29ddd53a0b4d7a\modeling_phi.py", line 78, in forward
    input_ids = input_ids.view(-1, input_shape[-1])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

I ran it another time and got another error:

Enter your contexts: Test
Enter your question: What is in the context?
Traceback (most recent call last):
  File "C:\Users\Public\Coding\LLMLingua\LLMLingua_test1.py", line 44, in <module>
    compressed_prompt = llm_lingua.compress_prompt(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 252, in compress_prompt
    context = self.iterative_compress_prompt(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 804, in iterative_compress_prompt
    threshold = self.get_estimate_threshold_base_distribution(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Lib\site-packages\llmlingua\local_prompt_compressor.py", line 624, in get_estimate_threshold_base_distribution
    ppl.sort(descending=not condition_flag)
IndexError: index 0 is out of bounds for dimension 0 with size 0
samvanity commented 7 months ago

JiHa, try a different LLM for the compressor, like the one below:

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})

You should clone the repo and run the notebook examples first, to make sure everything works before writing your own code.

Avkashhirpara, if you don't have a GPU, change the device_map to "cpu". From the documentation:

from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",  # Default model
    device_map="cuda",  # Device environment (e.g., 'cuda', 'cpu', 'mps')
    model_config={},  # Configuration for the Huggingface model
    open_api_config={},  # Configuration for OpenAI Embedding
)
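So for a machine without a GPU, the same constructor with the device swapped should work (a minimal sketch; expect 7B-class models to be slow and memory-hungry on CPU):

llm_lingua = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",
    device_map="cpu",  # run entirely on CPU, no CUDA required
)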

iofu728 commented 7 months ago

Thank you @samvanity for the clarification; that's correct. Hi @Avkashhirpara, you can switch the device by using a different 'device_map' setting, following @samvanity's suggestion.

Hi @JiHa-Kim, I think this error might be due to incorrect inputs. Could you provide more context about your case?
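For reference, a well-formed call passes a non-empty list of context strings as the first argument, along these lines (the keyword names follow the project README at the time; treat them as assumptions if your version differs):

compressed_prompt = llm_lingua.compress_prompt(
    ["The quick brown fox jumps over the lazy dog."],  # non-empty list of context strings
    instruction="",
    question="What is in the context?",
    target_token=50,  # rough compression budget in tokens
)

An empty context list would explain a reshape into shape [-1, 0] like the one in the traceback above.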

Avkashhirpara commented 7 months ago

> Avkashhirpara, if you don't have a GPU, change the device_map to "cpu". [...]

Thanks @samvanity, it works for me now.

msankhala commented 3 months ago

If you are on a MacBook, you have to change device_map to "mps" to make it work.

llm_lingua = PromptCompressor(device_map="mps")

Source: https://stackoverflow.com/a/60619616/902102
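More generally, if you want the same script to run on any of these machines, you can pick the device at runtime instead of hardcoding it (a small sketch; torch.backends.mps requires a fairly recent PyTorch):

import torch
from llmlingua import PromptCompressor

# Choose the best available backend: CUDA on NVIDIA, MPS on Apple Silicon, else CPU.
if torch.cuda.is_available():
    device_map = "cuda"
elif torch.backends.mps.is_available():
    device_map = "mps"
else:
    device_map = "cpu"

llm_lingua = PromptCompressor(device_map=device_map)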