mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

Metal Support #18

Closed jordo1138 closed 11 months ago

jordo1138 commented 11 months ago

```
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```

Any plans to get Metal support for us M2 users without CUDA? Thanks!

Guangxuan-Xiao commented 11 months ago

Can you give me your full stack trace? Did you install the correct PyTorch for M2 Mac?

jordo1138 commented 11 months ago

Sure, I can give it soon. It's true that the default command in the setup environment section of the README will not give a CUDA-enabled torch: the pip packages for Apple Silicon don't include CUDA (since M1/M2 doesn't support it), so you'd have to compile PyTorch from source to get it. Are you suggesting it doesn't need to use CUDA, but does need PyTorch compiled with it? Like in your run example, CUDA_VISIBLE_DEVICES=0, which I read as implying it's off anyhow?

edit: @Guangxuan-Xiao added the trace. As suspected, it's not compiled with CUDA (which none of the torch packages are by default for Apple Silicon, AFAIK). I can try to compile torch from source.

```
Traceback (most recent call last):
  File "examples/run_streaming_llama.py", line 122, in <module>
    main(args)
  File "examples/run_streaming_llama.py", line 103, in main
    streaming_inference(
  File "/Users/hamel/anaconda3/envs/streaming/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "examples/run_streaming_llama.py", line 67, in streaming_inference
    input_ids = input_ids.to("cuda")
  File "/Users/hamel/anaconda3/envs/streaming/lib/python3.8/site-packages/torch/cuda/__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```

jordo1138 commented 11 months ago

I think both of the example .py files import torch expecting CUDA device support. Perhaps in the future I can look into an example that would use Metal instead. Thanks!

```python
import torch
from tqdm import tqdm
import os
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from torch.nn import CrossEntropyLoss
from streaming_llm.kv_cache import StartRecentKVCache
from streaming_llm.utils import parse_args, load

device = "cuda"
```

jordo1138 commented 11 months ago

Oh, good news: I fixed it. In examples/run_streaming_llama.py, line 67 (in streaming_inference), change "cuda" to "mps" for Metal Performance Shaders.

Working now, thanks. I think it needs optimization to really work properly, both for memory management and to utilize all the cores, but it does run. It could also be that I need a smaller model like Llama 7B.
Thanks
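
For reference, the change described above (around line 67 of examples/run_streaming_llama.py, per the traceback) amounts to:

```python
# Before: move the tokenized inputs to the CUDA device (fails on Apple Silicon).
input_ids = input_ids.to("cuda")

# After: target Metal Performance Shaders (MPS) instead.
input_ids = input_ids.to("mps")
```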

jordo1138 commented 11 months ago

Leaving this here in case other M2 users come by to check for support: modify the example .py, changing "cuda" -> "mps".

[Screenshot: streaming-llm example running after changing "cuda" to "mps"]

tomaarsen commented 11 months ago

You can use input_ids.to(model.device) to make it work regardless of whether you have mps, cuda or even something else.
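
For example (a sketch; `prompt`, `tokenizer`, and `model` are illustrative names, not the exact variables in the example script):

```python
# Move the inputs to whatever device the model was loaded on (cuda, mps, or cpu),
# so the same script works on any backend without hardcoding a device string.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)
```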

jordo1138 commented 11 months ago

In this case I got a failure from torch expecting an explicit device type. Edit: thanks, that works. I had used just `device`, but `model.device` was required and works.
