microsoft / LLMLingua

To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

[Question]: Why set "cache_dir" to "/tmp/cache" on macOS when passing mps as device_map? #98

Open danny-su opened 4 months ago

danny-su commented 4 months ago

Describe the issue

Why is "cache_dir" set to "/tmp/cache" on macOS when "mps" is passed as the device? This setting creates duplicate caches: most of my applications use $HOME/.cache/huggingface/hub as the cache directory.
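For context, the duplication arises because an explicit cache_dir argument overrides every user-level setting. A simplified sketch of the Hugging Face Hub's cache-resolution order (function name and structure are illustrative, not the library's actual code):

```python
import os

def resolve_hf_cache(cache_dir=None, env=None):
    # Sketch of how the Hugging Face hub picks a cache directory:
    # an explicit cache_dir argument wins over everything, then
    # HF_HUB_CACHE, then HF_HOME/hub, then the per-user default.
    # A hard-coded cache_dir="/tmp/cache" therefore bypasses the
    # user's $HOME/.cache/huggingface setup and produces a second
    # copy of every downloaded model.
    env = env if env is not None else os.environ
    if cache_dir:
        return cache_dir
    if "HF_HUB_CACHE" in env:
        return env["HF_HUB_CACHE"]
    if "HF_HOME" in env:
        return os.path.join(env["HF_HOME"], "hub")
    return os.path.expanduser("~/.cache/huggingface/hub")
```

So even with HF_HOME exported, a library that passes its own cache_dir will re-download models into the second location.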

iofu728 commented 4 months ago

Hi @danny-su, this design is intended for Linux machines. We plan to remove it soon.

danny-su commented 4 months ago

@iofu728 I want to run it on macOS. It seems that PyTorch doesn't support BF16 on macOS for now, even though macOS Sonoma supports BF16. Therefore, it would be better to allow users to pass "mps" as the device and torch.float32 as torch_dtype.
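The requested fallback could look like the sketch below (a hypothetical helper, not LLMLingua code; dtype names are returned as strings to keep the example dependency-free, but in practice these would map to torch.float32 / torch.bfloat16):

```python
def pick_dtype(device: str) -> str:
    # Hypothetical device-aware dtype fallback. PyTorch's MPS backend
    # does not support bfloat16, so fall back to float32 there; on
    # CUDA, bfloat16 is assumed available (real code should also check
    # torch.cuda.is_bf16_supported(), since older GPUs lack it).
    if device.startswith("mps"):
        return "float32"
    if device.startswith("cuda"):
        return "bfloat16"
    return "float32"
```

A user-supplied torch_dtype argument would simply take precedence over this default.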

iofu728 commented 4 months ago

Hi @danny-su, thank you for the suggestion. We will add support for a custom torch_dtype.