marella / ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.
MIT License

How do I make a model use mps? #131

Open · jmtayamada opened this issue 1 year ago

jmtayamada commented 1 year ago

I'm working on a Mac with an M2 chip. I've installed ctransformers with Metal support and am setting up the model as shown below. However, when I check which device the model is using, it reports cpu.

[Screenshot 2023-09-12: model setup code]
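The screenshot itself is not preserved; below is a minimal sketch of the setup described, with the model name and file as stand-ins:

```python
from ctransformers import AutoModelForCausalLM

# Stand-in model/file names; the actual values from the screenshot are not preserved.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GGUF",
    model_file="llama-2-7b.Q4_K_M.gguf",
    model_type="llama",
    hf=True,        # transformers-compatible wrapper, needed for the tokenizer
    gpu_layers=50,  # request Metal offload
)

print(model.device)  # reports device(type='cpu')
```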

Am I not setting up the model to use mps properly?

wheynelau commented 1 year ago

I did this on an M1, but I didn't use hf=True. Did you run any tests, and did you install PyTorch using the Metal instructions? I found that by default, installing ctransformers with MPS support does not install PyTorch.

jmtayamada commented 1 year ago

> I did this on an M1, but I didn't use hf=True. Did you run any tests, and did you install PyTorch using the Metal instructions? I found that by default, installing ctransformers with MPS support does not install PyTorch.

To use the tokenizer, hf has to be True. Also, I've installed PyTorch with MPS support and have verified it with print(torch.backends.mps.is_available()).
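For reference, a quick check that a local PyTorch build supports MPS (assuming PyTorch >= 1.12):

```python
import torch

# Both should print True on an MPS-enabled PyTorch build on Apple Silicon.
print(torch.backends.mps.is_available())  # MPS device is usable right now
print(torch.backends.mps.is_built())      # PyTorch was compiled with MPS support
```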

wheynelau commented 1 year ago

I might have made a mistake; you're right.

Anyway, I checked the loading code for Hugging Face models, and it doesn't seem to move anything to a device. Perhaps we can wait for an answer from one of the developers.
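For contrast, this is what an explicit device move looks like for an ordinary PyTorch transformers model (an illustrative sketch, not ctransformers code; ctransformers keeps the GGUF weights in the GGML C backend, so there may be nothing on the PyTorch side to move):

```python
import torch
from transformers import AutoModelForCausalLM

# Ordinary PyTorch path: the weights are torch tensors, so .to("mps") moves them.
model = AutoModelForCausalLM.from_pretrained("gpt2")
model = model.to("mps" if torch.backends.mps.is_available() else "cpu")
print(model.device)  # device(type='mps') on an MPS-enabled Mac
```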

netneko21 commented 11 months ago

Update on this?

pabvald commented 10 months ago

Same issue. Any updates?

giovanniOfficioso commented 10 months ago

Hi, I have the same issue. I can load the pipeline on 'mps', but I can't load the model on 'mps', only on the CPU. I followed these steps:

  1. I installed the ctransformers library like this: `CT_METAL=1 pip install ctransformers --no-binary ctransformers`
  2. I tried to load the model on 'mps', but with no result:

```python
MODEL_NAME = "TheBloke/Llama-2-7B-32K-Instruct-GGUF"
llama_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    model_file="llama-2-7b-32k-instruct.Q5_K_S.gguf",
    model_type="llama",
    hf=True,
    gpu_layers=50,
)
```

But if I check the device with `llama_model.device`, I get `device(type='cpu')`. Even if I try `llama_model = llama_model.to('mps')` and check again, I still get `device(type='cpu')`. Any suggestions on how to fix this issue, please? Thank you.
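One way to sanity-check whether Metal offload is actually happening, independent of the torch-reported device (a sketch under the assumption that the offload occurs in the GGML backend and shows up as a speed difference):

```python
import time
from ctransformers import AutoModelForCausalLM

def timed_generation(gpu_layers: int) -> float:
    """Load the model with the given offload setting and time one generation."""
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-32K-Instruct-GGUF",
        model_file="llama-2-7b-32k-instruct.Q5_K_S.gguf",
        model_type="llama",
        gpu_layers=gpu_layers,  # 0 = CPU only; >0 offloads layers to Metal
    )
    start = time.time()
    llm("Write one short sentence about llamas.", max_new_tokens=32)
    return time.time() - start

# If Metal offload works, gpu_layers=50 should be noticeably faster than 0.
print(f"CPU only:  {timed_generation(0):.1f}s")
print(f"50 layers: {timed_generation(50):.1f}s")
```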