vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Authorization ignored when root_path is set #10531

Closed: OskarLiew closed this issue 10 hours ago

OskarLiew commented 4 days ago

Your current environment

The output of `python collect_env.py`: not provided.

Model Input Dumps

No response

šŸ› Describe the bug

I was running vllm behind a route-based proxy (traefik) and noticed that I could use the API without providing any token.

The problem seems to be that the API is still served on the default path /v1/.... in addition to /root_path/v1/...., while the API key is only verified for paths starting with /root_path/v1/..... My proxy was stripping the prefix, so requests were hitting the /v1/... endpoints, and I had only set the root path so that the OpenAPI schema for Swagger could be fetched.

I was running the vllm/vllm-openai:v0.6.4 image

Here is a minimal example that reproduces the bug:

services:
  vllm:
    image: vllm/vllm-openai:v0.6.4
    ports:
      - 8000:8000
    environment:
      VLLM_API_KEY: ${VLLM_API_KEY:-secret-key}
    volumes:
      - $HOME/.cache/huggingface:/root/.cache/huggingface
    networks:
      - internal
    command: 
      - "--model=meta-llama/Llama-3.1-70B-Instruct"
      - "--tensor-parallel-size=2"
      - "--gpu-memory-utilization=0.95"
      - "--disable_log_requests"
      - "--root-path=/llm"
    restart: always
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

# Top-level definition of the network referenced by the service above;
# without it this compose file fails to validate.
networks:
  internal:

Then the following request, which does not include an API key, still succeeds instead of returning an authentication error:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.1-70B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant"},
            {"role": "user", "content": "Hi"}
        ],
        "max_tokens": 10
    }'

Sending the same request to localhost:8000/llm/v1/chat/completions behaves as expected: the API key is required there.
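For reference, a small script along these lines (just a sketch, assuming the compose file above is running locally with the default key secret-key and that the Python requests package is available) can be used to compare the two paths with and without the key:

import requests

# Assumes the docker-compose example above is up on localhost:8000
# with VLLM_API_KEY left at the default "secret-key".
BASE = "http://localhost:8000"
PAYLOAD = {
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "Hi"},
    ],
    "max_tokens": 10,
}

for path in ("/v1/chat/completions", "/llm/v1/chat/completions"):
    # Without an Authorization header: both requests should be rejected,
    # but the bare /v1 path is accepted (the bug reported here).
    resp = requests.post(BASE + path, json=PAYLOAD)
    print(f"{path} without key -> {resp.status_code}")

    # With the API key from the compose file.
    resp = requests.post(
        BASE + path,
        json=PAYLOAD,
        headers={"Authorization": "Bearer secret-key"},
    )
    print(f"{path} with key -> {resp.status_code}")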

I've checked the source code and the issue seems to stem from the authentication middleware in vllm/entrypoints/openai/api_server.py, line 481:

            if not request.url.path.startswith(f"{root_path}/v1"):
                return await call_next(request)
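One possible direction for a fix (a minimal standalone sketch, not necessarily what #10606 implements) is to strip the configured root path from the incoming path before the /v1 check, so that the prefixed and the stripped forms are authenticated the same way. The app, root_path, and token wiring below is illustrative only, not the actual vllm code:

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
root_path = "/llm"    # stands in for --root-path; illustrative wiring only
token = "secret-key"  # stands in for VLLM_API_KEY; illustrative wiring only


@app.middleware("http")
async def authentication(request: Request, call_next):
    # Normalize the path: drop the root_path prefix if the proxy forwarded
    # it unstripped, so /llm/v1/... and /v1/... are treated identically.
    path = request.url.path
    if root_path and path.startswith(root_path):
        path = path[len(root_path):]
    if not path.startswith("/v1"):
        return await call_next(request)
    if request.headers.get("Authorization") != f"Bearer {token}":
        return JSONResponse(content={"error": "Unauthorized"},
                            status_code=401)
    return await call_next(request)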


OskarLiew commented 4 days ago

I was able to work around the issue by removing the prefix stripper from my proxy configuration, but I was caught off guard that authorization had silently stopped working.

chaunceyjiang commented 3 days ago

Indeed, I can also reproduce this issue locally. @DarkLight1337 could you please confirm this issue as well?

I will try to fix this problem.

DarkLight1337 commented 3 days ago

Sorry, I don't have time to debug this. I can help review your PR though.

chaunceyjiang commented 1 day ago

@OskarLiew Hi, can you please help me test #10606? I was able to resolve this issue when testing it locally.