turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Question about storing models in Container #214

Open JacobGoldenArt opened 11 months ago

JacobGoldenArt commented 11 months ago

Hi, sorry if this is obvious : ) but I'm trying to build the Docker container. The docs say: "First, set the MODEL_PATH and SESSIONS_PATH variables in the .env file to the actual directories on the host." What I want to do is build the container with one or a few models stored inside it, then run the container on a cloud GPU. In that case, what should I put as MODEL_PATH and SESSIONS_PATH? Can I just create a /models directory in the container, store the models there, and point the MODEL_PATH variable to /models/(my downloaded model)?

APPLICATION_STATE_PATH=/data  # path to the directory holding application state inside the container
MODEL_PATH=/models/my-downloaded-model  # replace with the actual model path on the host
SESSIONS_PATH=~/exllama_sessions  # replace with the actual directory on the host where chat sessions should be stored
turboderp commented 11 months ago

I'm sorry, I really don't know anything about Docker. @nopperl did the Docker stuff, maybe they can help?

nopperl commented 11 months ago

@JacobGoldenArt In the provided docker compose setup, the model is not stored in the container. Instead, a host directory is mounted into the container. Also, exllama expects the directory to contain a single model, not multiple models.

So, in your case, you could have a /models directory on the host which contains all your models. You would then start the container with a specific model (e.g. MODEL_PATH=/models/LLaMA-7B-4bit-128g). If you want to switch to a different model, restart the container with a different MODEL_PATH.
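The setup described above can be sketched as a compose service. This is a minimal illustration, not the repo's actual docker-compose.yml: the service name, image name, and in-container mount points here are assumptions.

```yaml
# Hypothetical docker-compose.yml sketch; names and paths are illustrative.
services:
  exllama:
    image: exllama  # assumed image name
    volumes:
      # Host model directory (from MODEL_PATH in .env) mounted into the container;
      # the directory must contain exactly one model.
      - ${MODEL_PATH}:/app/model
      # Host directory for persisted chat sessions (from SESSIONS_PATH in .env).
      - ${SESSIONS_PATH}:/app/sessions
```

With this layout, switching models is just a restart with a different value, e.g. `MODEL_PATH=/models/LLaMA-13B-4bit-128g docker compose up -d` after a `docker compose down`; the image itself never changes, only the mounted host directory.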