thu-nics / MoA

The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression>
MIT License

AMD GPU #1

Closed DJ-Perico closed 2 months ago

DJ-Perico commented 3 months ago

Is there a way to either run this on an AMD GPU or run it on the CPU? I get this error now:

```
/convert.py", line 829, in multi_round_qa_to_multi_round_qa_model_by_batch
    device = f"cuda:{(idx or 0) % torch.cuda.device_count()}"
ZeroDivisionError: integer division or modulo by zero
```

fuvty commented 3 months ago

Could you please share your runtime configuration? That will help us understand and address the issue more effectively. Based on the information provided, the error arises because no Nvidia GPU is visible during the dataset conversion step, so `torch.cuda.device_count()` returns 0 and the modulo in `convert.py` divides by zero. You can work around this by explicitly specifying the device, e.g. setting `device = 'cpu'` when no GPU is available. Let us know if this doesn't fix the problem.
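A minimal sketch of the suggested guard (the helper name and structure are illustrative, not the actual code in `convert.py`; the device count would come from `torch.cuda.device_count()`):

```python
def pick_device(idx, cuda_device_count):
    """Return a torch-style device string, falling back to CPU.

    Hypothetical helper: avoids the ZeroDivisionError from
    `(idx or 0) % cuda_device_count` when no CUDA device is visible
    (e.g. on an AMD-only or CPU-only machine).
    """
    if cuda_device_count == 0:
        return "cpu"
    return f"cuda:{(idx or 0) % cuda_device_count}"
```

In the traceback above, the failing line could call something like `pick_device(idx, torch.cuda.device_count())` instead of computing the modulo directly.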

fuvty commented 3 months ago

@DJ-Perico We have also updated the repo to include pre-searched compression plans so that you can skip the search steps for the provided Vicuna model. Feel free to let us know if you have other problems. You are welcome to open PRs if minor modifications are needed to run on both Nvidia and AMD GPUs.