Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
Loading Basaran on multiple GPUs leads to error #280
I get the following error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
I am running Basaran with the default parameters and a Llama 2 model.
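One way to narrow this down might be to check which devices the model's weights actually ended up on. The following is only a minimal sketch: it assumes the loaded model is a PyTorch `nn.Module` (which exposes `named_parameters()`), and the helper name `parameters_by_device` is hypothetical, not part of Basaran.

```python
from collections import defaultdict


def parameters_by_device(model):
    """Group parameter names by the device they live on.

    `model` is assumed to expose `named_parameters()` the way
    torch.nn.Module does; this helper is hypothetical and is not
    part of Basaran.
    """
    groups = defaultdict(list)
    for name, param in model.named_parameters():
        # str() turns a torch.device into a label such as "cuda:0".
        groups[str(param.device)].append(name)
    return dict(groups)
```

If the result has more than one key, the weights are spread across several GPUs, so the error most likely comes from an input or intermediate tensor that is not on the same device as the layer consuming it.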