hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
MIT License

Possible to run on M-series chips/MPS? #173

Closed by fakerybakery 1 year ago

fakerybakery commented 1 year ago

Hello, thank you for making this great repository! Is it possible to run this on M1/M2 chips using MPS? I've tried setting self.device to mps, but I get this error:

RuntimeError: Placeholder storage has not been allocated on MPS device!

Is there any way to run this using MPS optimization? Thank you!

peakji commented 1 year ago

I haven't used MPS before, but I will investigate it.

fardeon commented 1 year ago

After some attempts, I found that setting device="mps" alone is not enough: an additional model.to("mps") call is also needed before it will run.
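
For anyone landing here later, the fix described above can be sketched roughly as follows. This is a minimal illustration, not Basaran's actual code: it assumes a recent PyTorch build with the MPS backend, and it uses a tiny torch.nn.Linear layer as a hypothetical stand-in for the real Transformers model. As far as I can tell, the "Placeholder storage" error tends to show up when the inputs and the model weights end up on different devices, so the sketch moves both.

```python
import torch


def pick_device() -> torch.device:
    """Prefer MPS when PyTorch reports it as available, otherwise use CPU."""
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Hypothetical stand-in for the real model (assumption, for illustration only).
model = torch.nn.Linear(4, 2)
model.to(device)  # move the weights to the chosen device

# The inputs must live on the same device as the weights.
inputs = torch.randn(1, 4).to(device)

with torch.no_grad():
    outputs = model(inputs)

print(outputs.device.type)
```

With a real Transformers model the same pattern should apply: call model.to(device) once after loading, and move the tokenized input tensors to the same device before generating.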

What's even stranger is that inference (with distilgpt2) is actually significantly slower on MPS than on CPU (MacBook Pro M1).
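
A rough way to check this kind of CPU vs. MPS difference on your own machine is sketched below. It is only an assumption-laden micro-benchmark: the time_forward helper is hypothetical, the small linear layer stands in for a real model, and the numbers will vary by hardware and PyTorch version. One plausible explanation for the slowdown is that, for a small model, per-operation dispatch overhead on the GPU outweighs the compute savings, but I have not verified that here.

```python
import time

import torch


def time_forward(device_name: str, repeats: int = 20) -> float:
    """Return the mean seconds per forward pass of a small layer on a device."""
    device = torch.device(device_name)
    model = torch.nn.Linear(256, 256).to(device)  # stand-in model (assumption)
    x = torch.randn(8, 256, device=device)
    with torch.no_grad():
        model(x)  # warm-up pass, excluded from the measurement
        start = time.perf_counter()
        for _ in range(repeats):
            y = model(x)
        y.cpu()  # copying the result to the host waits for queued device work
    return (time.perf_counter() - start) / repeats


devices = ["cpu"]
mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
if mps is not None and mps.is_available():
    devices.append("mps")

timings = {name: time_forward(name) for name in devices}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per forward pass")
```

On a machine without MPS this only reports the CPU timing. Swapping in the actual model and a realistic prompt length would give a more meaningful comparison.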

fakerybakery commented 1 year ago

OK, thank you. I will try that out.