microsoft / LLMLingua

To speed up LLMs' inference and enhance LLMs' perception of key information, LLMLingua compresses prompts and the KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

PromptCompressor -- Missing Package Accelerate but it is installed #85

Closed: shannonlal closed this issue 5 months ago

shannonlal commented 5 months ago

I am trying to run the Retrieval notebook and I get an error about accelerate not being installed; however, as you can see from my logs below, it is installed.

Notebook: https://github.com/microsoft/LLMLingua/blob/main/examples/Retrieval.ipynb

# Setup LLMLingua
from llmlingua import PromptCompressor
llm_lingua = PromptCompressor()

I get the following error

/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   2672                 )
   2673             elif not is_accelerate_available():
-> 2674                 raise ImportError(
   2675                     "Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`"
   2676                 )

ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

Machine: I have tried running this on Colab with an L4 GPU.

Library Versions: Here are the versions of my libraries, from !pip show:

!pip show transformers
!pip show sentence_transformers
!pip show llmlingua 
!pip show accelerate
!pip show cohere

Version Output

Name: transformers
Version: 4.35.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: llmlingua, sentence-transformers
Name: sentence-transformers
Version: 2.3.1
Summary: Multilingual text embeddings
Home-page: https://www.sbert.net/
Author: Nils Reimers
Author-email: info@nils-reimers.de
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: huggingface-hub, nltk, numpy, Pillow, scikit-learn, scipy, sentencepiece, torch, tqdm, transformers
Required-by: 
Name: llmlingua
Version: 0.1.5
Summary: To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.
Home-page: https://github.com/microsoft/LLMLingua
Author: The LLMLingua team
Author-email: hjiang@microsoft.com
License: MIT License
Location: /usr/local/lib/python3.10/dist-packages
Requires: nltk, numpy, tiktoken, torch, transformers
Required-by: 
Name: accelerate
Version: 0.27.2
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: sylvain@huggingface.co
License: Apache
Location: /usr/local/lib/python3.10/dist-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: 
Name: cohere
Version: 4.47
Summary: Python SDK for the Cohere API
Home-page: 
Author: Cohere
Author-email: 
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: aiohttp, backoff, fastavro, importlib_metadata, requests, urllib3
Required-by: llmx

I can also confirm that I have a GPU on the box:

import torch
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print("Device: ",device)
if use_cuda:
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    print('__CUDA Device Name:',torch.cuda.get_device_name(0))
    print('__CUDA Device Total Memory [GB]:',torch.cuda.get_device_properties(0).total_memory/1e9)

Output


__CUDNN VERSION: 8902
__Number CUDA Devices: 1
__CUDA Device Name: NVIDIA L4
__CUDA Device Total Memory [GB]: 23.58378496

Is anyone else seeing this? Does anyone know how to solve this?

iofu728 commented 5 months ago

Apologies for the delayed response. The issue stems from the `accelerate` library not being picked up by the already-running session.

To resolve it, you should restart your session after installing `accelerate`. For further details, please refer to this solution on Stack Overflow: https://stackoverflow.com/questions/76902752/importerror-using-low-cpu-mem-usage-true-or-a-device-map-requires-accelerat.
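
On Colab this comes down to installing the package and then restarting the runtime before constructing the compressor. A minimal sketch of that sequence follows; the cell boundaries and the os.kill restart trick are assumptions, and the menu's Runtime > Restart session works just as well:

# Cell 1: install the missing dependency
!pip install accelerate

# Cell 2: restart the Colab runtime so transformers re-checks its optional
# dependencies on the next import (same effect as Runtime > Restart session)
import os
os.kill(os.getpid(), 9)

# Cell 3 (run after the restart): this should now succeed
from llmlingua import PromptCompressor
llm_lingua = PromptCompressor()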