`example/llm`: Re-evaluate open-webui environment variables

juspay / services-flake

NixOS-like services for Nix flakes

https://community.flake.parts/services-flake

MIT License

`example/llm`: Re-evaluate open-webui environment variables #230

Closed: srid closed this issue 1 month ago

srid commented 1 month ago

https://github.com/juspay/services-flake/blob/f360c8f2bc7e23c858dcb4eb9f597e8a91bba6d2/example/llm/flake.nix#L48-L58

Keep only the environment variables (introduced in #227) that are strictly necessary, and leave the rest commented out.

Consider the implications of `DEVICE_TYPE = "cpu";`, especially when GPU is enabled.

Our examples should a) "just work", b) be simple and minimal, and c) be well-documented (with liberal use of comments, for example).

shivaraj-bh commented 1 month ago

We don't need to assign `DEVICE_TYPE = "cpu"`, since it defaults to `"cpu"` unless explicitly set otherwise.

Also, this environment variable doesn't affect whether ollama uses the CPU; that is still managed as documented. It only affects how the embedding models used to run RAG pipelines are invoked.

shivaraj-bh commented 1 month ago

`ENABLE_OLLAMA_API = "True";` is also redundant, as it is true by default. It could probably become a comment, so that users know how to disable it if they want to.

shivaraj-bh commented 1 month ago

`OLLAMA_BASE_URL = "http://${host}:${toString port}";` is also redundant, as it is derived from `OLLAMA_API_BASE_URL` by default.
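
Taken together, the three observations above suggest that these settings could become comments rather than live assignments. A minimal sketch, assuming they sit in the same attribute set as the other Open WebUI environment variables; the wording of the comments is illustrative, and `host` and `port` appear only inside a comment, so nothing here needs them to be in scope:

```nix
{
  # Defaults to "cpu". Only affects how the embedding models used for RAG
  # are invoked, not whether ollama itself runs on the CPU.
  # DEVICE_TYPE = "cpu";

  # Enabled by default. Uncomment and change the value to disable it.
  # ENABLE_OLLAMA_API = "True";

  # Derived from OLLAMA_API_BASE_URL by default.
  # OLLAMA_BASE_URL = "http://${host}:${toString port}";
}
```

With everything commented out this evaluates to an empty attribute set, so it changes no behaviour and only documents the available knobs.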

shivaraj-bh commented 1 month ago

{
  RAG_EMBEDDING_ENGINE = "ollama"; 
  RAG_EMBEDDING_MODEL = "mxbai-embed-large:latest"; 
}

should be fine, since otherwise Open WebUI will use sentence-transformers to fetch the embedding models, which would require DEVICE_TYPE to choose where the embedding happens. If we rely on ollama instead, we can make use of already documented configuration to use GPU acceleration.

srid commented 1 month ago

> should be fine, since otherwise Open WebUI will use sentence-transformers to fetch the embedding models, which would require DEVICE_TYPE to choose where the embedding happens. If we rely on ollama instead, we can make use of already documented configuration to use GPU acceleration.

This, verbatim, sounds like a good candidate for a comment on top of these env vars.

The rest can be either commented out or removed.
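
As a rough sketch of that suggestion, the quoted explanation could sit directly above the two embedding variables. The comment text is adapted from the quote, and where this attribute set is placed in `example/llm/flake.nix` is an assumption, not something this thread confirms:

```nix
{
  # Use ollama for embeddings. Otherwise Open WebUI will use
  # sentence-transformers to fetch the embedding models, which would require
  # DEVICE_TYPE to choose where the embedding happens. Relying on ollama
  # lets us reuse the already documented GPU acceleration configuration.
  RAG_EMBEDDING_ENGINE = "ollama";
  RAG_EMBEDDING_MODEL = "mxbai-embed-large:latest";
}
```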