predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0

Server fails to start with the prefix-caching option #599

Status: Open · opened by prd-tuong-nguyen 1 month ago

prd-tuong-nguyen commented 1 month ago

System Info

Information

Tasks

Reproduction

docker run --gpus 1 -v ./data:/data -p 8005:80 ghcr.io/predibase/lorax:a8ca5cb \
  --prefix-caching true \
  --port 80 \
  --model-id Open-Orca/Mistral-7B-OpenOrca \
  --cuda-memory-fraction 0.8 \
  --sharded false \
  --max-waiting-tokens 20 \
  --max-input-length 4096 \
  --max-total-tokens 8192 \
  --hostname 0.0.0.0 \
  --max-concurrent-requests 512 \
  --max-best-of 1  \
  --max-batch-prefill-tokens 4096 \
  --max-active-adapters 10 \
  --adapter-source local \
  --adapter-cycle-time-s 2 \
  --json-output \
  --disable-custom-kernels \
  --dtype float16
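For context on what the `--prefix-caching` flag is meant to enable, here is a toy Python sketch of the idea (this is an illustration only, not LoRAX's implementation): requests that share a common token prefix reuse previously computed state instead of re-running prefill over the shared portion. The class name, token strings, and the counter standing in for "expensive prefill work" are all hypothetical.

```python
# Toy prefix cache: caches per-prefix "state" so a second request sharing a
# prefix with an earlier one only pays for its new suffix tokens.

class PrefixCache:
    def __init__(self):
        self.cache = {}          # token-prefix tuple -> computed state
        self.compute_calls = 0   # counts simulated prefill work

    def _compute_state(self, tokens):
        self.compute_calls += 1
        return "state(" + ",".join(tokens) + ")"

    def prefill(self, tokens):
        # Find the longest already-cached prefix of this request.
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self.cache:
                break
        else:
            end = 0
        # Compute (and cache) state only for the uncached suffix.
        for i in range(end, len(tokens)):
            self.cache[tuple(tokens[:i + 1])] = self._compute_state(tokens[:i + 1])
        return self.cache[tuple(tokens)]

cache = PrefixCache()
cache.prefill(["sys", "prompt", "q1"])  # cold: 3 compute calls
cache.prefill(["sys", "prompt", "q2"])  # shared prefix reused: 1 new call
print(cache.compute_calls)              # prints 4
```

With a long shared system prompt, this is the saving the flag is supposed to deliver; the bug report above is that the server does not even start when it is enabled.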

Expected behavior

The server starts successfully and prefix caching works as expected.

prd-tuong-nguyen commented 6 days ago

@tgaddair Hi, any update on this?