chatUI requests the llamacpp server by a hardcoded address

I am using the chatui instance as pod in k8s cluster with following params:

envVars:
  MONGODB_URL: mongodb://chatui-mongodb:27017
  HF_TOKEN: -----------------------------------
  MODELS: '[
    {
      "name": "Meta-Llama-3-8B-Instruct-q5_k_m.gguf",
      "endpoints": [{
        "type" : "llamacpp",
        "baseURL": "http://llama-cpp-server:8000"
      }],
    },
  ]'

llamacpp is launched with following params:

containers:
            - name: llama-cpp-server
              image: ghcr.io/ggerganov/llama.cpp:server-cuda
              imagePullPolicy: IfNotPresent
              args: ["-m", "/models/Meta-Llama-3-8B-Instruct-q5_k_m.gguf", "--port", "8000", "--host", "0.0.0.0", "-n", "512", "--n-gpu-layers", "1"]

Even though baseURL ": "http://llama-cpp-server:8000", so chatUI should request this address, I have following error when trying to prompt something on frontend:

{"level":50,"time":1719306384776,"pid":22,"hostname":"chatui-7f75bfc479-zfqbg","err":{"type":"TypeError","message":"fetch failed: ","stack":"TypeError: fetch failed\n    at fetch (file:///app/build/shims.js:20346:13)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Promise.all (index 1)\n    at async Promise.all (index 0)\n    at async file:///app/build/server/chunks/index3-1a1c67bb.js:388:17\ncaused by: AggregateError [ECONNREFUSED]: \n    at internalConnectMultiple (node:net:1117:18)\n    at internalConnectMultiple (node:net:1185:5)\n    at afterConnectMultiple (node:net:1684:7)"},"msg":"Failed to initialize PlaywrightBlocker from prebuilt lists"}
{"level":50,"time":1719306387338,"pid":22,"hostname":"chatui-7f75bfc479-zfqbg","err":{"type":"TypeError","message":"fetch failed: connect ECONNREFUSED 127.0.0.1:8080","stack":"TypeError: fetch failed\n    at fetch (file:///app/build/shims.js:20346:13)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///app/build/server/chunks/models-fc8a6ecf.js:98900:15\n    at async generateFromDefaultEndpoint (file:///app/build/server/chunks/index3-1a1c67bb.js:213:23)\n    at async generateTitle (file:///app/build/server/chunks/_server.ts-3da96c1b.js:213:10)\n    at async generateTitleForConversation (file:///app/build/server/chunks/_server.ts-3da96c1b.js:177:19)\ncaused by: Error: connect ECONNREFUSED 127.0.0.1:8080\n    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1606:16)"},"msg":"fetch failed"}
TypeError: fetch failed
    at fetch (file:///app/build/shims.js:20346:13)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:///app/build/server/chunks/models-fc8a6ecf.js:98900:15
    at async generate (file:///app/build/server/chunks/_server.ts-3da96c1b.js:426:30)
    at async textGenerationWithoutTitle (file:///app/build/server/chunks/_server.ts-3da96c1b.js:487:3) {
  cause: Error: connect ECONNREFUSED 127.0.0.1:8080
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1606:16) {
    errno: -111,
    code: 'ECONNREFUSED',
    syscall: 'connect',
    address: '127.0.0.1',
    port: 8080
  }
}

AFAIK this means that chaui requests some predefined (or hardcoded) URL - 127.0.0.1:8080 to get predictions. My humble opinion is that chatUI ignores baseURL if type is llamacpp and uses 127.0.0.1:8080 as default one Is it predictable behaviour?

ps: My guess is that problem is here

huggingface / chat-ui

chatUI requests the llamacpp server by a hardcoded address #1303