hyperonym / basaran

Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

How to run a model totally offline? #109

Closed gitknu closed 1 year ago

gitknu commented 1 year ago

Sorry for the stupid question, but I am a total newbie with Docker and with using Hugging Face locally (not via Colab or anything like that). This is the command to run a model for the first time, for example:

docker run -p 80:80 -e MODEL=bigscience/bloom-560m hyperonym/basaran:0.13.5

In this case everything is incredible! It all works: I turn off the connection and everything still works fine.

Now, when I want to use the previously downloaded model, I run into difficulties. Can you just give an example of an offline run?

Something like this, without using a Dockerfile, etc. Just one command: docker run -p 80:80 -e TRANSFORMERS_OFFLINE=1 -e MODEL='/home/my_model' hyperonym/basaran:0.13.5

And please don't just send me this link; I've tried a lot of different variations from it but still couldn't get it working: https://huggingface.co/docs/transformers/v4.15.0/installation#offline-mode

So, in short: how do I run Basaran in Docker fully locally? Please give an example command.

Thank you for your understanding and for your help and work!

peakji commented 1 year ago

There are several ways to run Basaran locally using Docker. The simplest and most portable is to create a bundled image: write a new Dockerfile that pre-downloads the model and packages it into the image, so the resulting bundled image can be run completely offline.
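Conceptually, such a Dockerfile is only a few lines. The following is a rough sketch, not the repository's pre-written bloomz-560m.Dockerfile; it assumes the base image's Python environment provides huggingface_hub (which Transformers itself depends on):

# Sketch of a bundling Dockerfile; model ID and offline flag are illustrative.
FROM hyperonym/basaran:0.13.5

# Download the model into the image's Hugging Face cache at build time.
RUN python -c "from huggingface_hub import snapshot_download; snapshot_download('bigscience/bloomz-560m')"

# Point Basaran at the bundled model and keep Transformers off the network at runtime.
ENV MODEL=bigscience/bloomz-560m
ENV TRANSFORMERS_OFFLINE=1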

Taking bloomz-560m as an example, you can download the pre-written Dockerfile and then run:

docker build --tag basaran:bloomz-560m -f ./bloomz-560m.Dockerfile .

The new image (basaran:bloomz-560m) is what you need, as it embeds the model.
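Once the build has finished, the bundled image can be started with no network connection at all, for example:

docker run -p 80:80 basaran:bloomz-560m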

gitknu commented 1 year ago

Thank you! Everything works great now!

If you can recommend any ChatGPT-like model for a low-end PC, thank you! (Everything I ran before was plain text completion: bloomz-1b1, codegen, etc., and Alpaca-native-7B doesn't ship a config.json, so it didn't run.) If not, you're still doing a great job!

peakji commented 1 year ago

ChatGLM-6B works pretty well (for English and Chinese) on commodity hardware, and LLaMA/Alpaca support will be added in the next minor release!

gitknu commented 1 year ago

Thanks! I will try it today))