Open prd-tuong-nguyen opened 1 month ago
docker run --gpus 1 -v ./data:/data -p 8005:80 ghcr.io/predibase/lorax:a8ca5cb \
    --prefix-caching true \
    --port 80 \
    --model-id Open-Orca/Mistral-7B-OpenOrca \
    --cuda-memory-fraction 0.8 \
    --sharded false \
    --max-waiting-tokens 20 \
    --max-input-length 4096 \
    --max-total-tokens 8192 \
    --hostname 0.0.0.0 \
    --max-concurrent-requests 512 \
    --max-best-of 1 \
    --max-batch-prefill-tokens 4096 \
    --max-active-adapters 10 \
    --adapter-source local \
    --adapter-cycle-time-s 2 \
    --json-output \
    --disable-custom-kernels \
    --dtype float16
@tgaddair Hi bro, any update on this?
Reproduction
Expected behavior
The server starts successfully and the prefix-caching works well
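One rough way to check whether prefix caching is taking effect once the server is up is to time two requests that share a long prompt prefix. The sketch below is only an illustration: it assumes the host port 8005 from the `-p 8005:80` mapping in the command, and that the `/generate` route and payload shape match LoRAX's REST API. `build_payload` and `timed_generate` are hypothetical helper names, and the prompts are made up.

```shell
# Assumption: the server is reachable on the host port from the -p 8005:80 mapping.
LORAX_URL="${LORAX_URL:-http://127.0.0.1:8005}"

# Build the JSON body for a /generate request.
# Hypothetical helper: the prompt must not contain double quotes or backslashes.
build_payload() {
  printf '{"inputs": "%s", "parameters": {"max_new_tokens": %d}}' "$1" "$2"
}

# Send one request, discard the response body, and print the total time curl measured.
timed_generate() {
  curl -s -o /dev/null -w '%{time_total}\n' \
    "$LORAX_URL/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "$2")"
}

# Example usage (requires the running server), with a made-up shared prefix:
#   PREFIX="You are a helpful assistant. Answer briefly and precisely."
#   timed_generate "$PREFIX What is Docker?" 16
#   timed_generate "$PREFIX What is a GPU?" 16
```

If the cache is working, the second request would typically be expected to spend less time in prefill than the first, although with a short prefix the difference may be too small to see in wall-clock time.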