lyogavin / airllm

AirLLM 70B inference with single 4GB GPU
Apache License 2.0

Compression does not work with MLX / Apple Silicon #177

Open sammcj opened 2 months ago

sammcj commented 2 months ago

I can't find a way to get compression working with MLX / Apple Silicon.

When AirLLM enables compression via bitsandbytes, it tries to load the CUDA build rather than the installed CPU/MLX-compatible version.
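
For context, a minimal repro sketch. The model name is just the one from the AirLLM README examples, and I'm assuming the standard `AutoModel` entry point with its `compression` option, which is what routes through bitsandbytes:

```python
# Minimal repro on Apple Silicon (M-series); model name is illustrative.
from airllm import AutoModel

# Without compression the model loads fine via the MLX backend.
# With compression enabled, AirLLM imports bitsandbytes, which attempts
# to load its CUDA libraries instead of the installed CPU-only build.
model = AutoModel.from_pretrained(
    "garage-bAInd/Platypus2-70B-instruct",
    compression="4bit",  # same failure with "8bit"
)
```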

Looking into bitsandbytes, there is an ongoing rewrite effort to add Apple Silicon support (https://github.com/bitsandbytes-foundation/bitsandbytes/issues/252 and https://github.com/huggingface/transformers/pull/31098).

Is there a compression method other than bitsandbytes available?