runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

BadRequestError on runsync route, or what is the correct method to hit handler.py's locally run API? #65

Closed. dpkirchner closed this issue 3 months ago.

dpkirchner commented 4 months ago

I'm getting a BadRequestError when I try to test the vLLM worker locally.

I'm running my handler locally for testing, in a Docker image built using the instructions at https://github.com/runpod-workers/worker-vllm?tab=readme-ov-file#option-2-build-docker-image-with-model-inside:

MODEL_NAME=/models/stablelm-3b-4e1t python3 -u /src/handler.py --rp_serve_api --rp_api_port 8000 --rp_api_host 0.0.0.0

I'm trying to send test requests to the runsync route based on what is described here:

https://blog.runpod.io/workers-local-api-server-introduced-with-runpod-python-0-9-13/

I've tried using the API test forms on the http://localhost:8000/docs page, and I've also tried curl:

curl -H 'content-type: application/json' -d '{"input":{"message":"blah de blah"}}' http://localhost:8000/runsync

However, I always get this response:

{
  "id": "test-1b8405d8-3e00-438e-b3cd-4bae73fc5e7a",
  "status": "COMPLETED",
  "output": [
    {
      "error": {
        "object": "error",
        "message": "",
        "type": "BadRequestError",
        "param": null,
        "code": 400
      }
    }
  ]
}

I also tried the {"input": {"number": 123}} body shown in the blog post, with the same result.

What am I doing wrong?

Here's the full output from handler.py:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 04-12 23:19:33 llm_engine.py:87] Initializing an LLM engine with config: model='/models/stablelm-3b-4e1t', tokenizer='/models/stablelm-3b-4e1t', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir='/models/huggingface-cache/hub', load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 04-12 23:19:35 weight_utils.py:257] Loading safetensors took 1.01s
INFO 04-12 23:19:37 llm_engine.py:357] # GPU blocks: 1111, # CPU blocks: 819
WARNING 04-12 23:19:37 cache_engine.py:103] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
INFO 04-12 23:19:37 model_runner.py:684] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 04-12 23:19:37 model_runner.py:688] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 04-12 23:19:43 model_runner.py:756] Graph capturing finished in 7 secs.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 04-12 23:19:44 serving_chat.py:306] No chat template provided. Chat API will not work.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
--- Starting Serverless Worker |  Version 1.6.2 ---
INFO   | Starting API server.
DEBUG  | Not deployed on RunPod serverless, pings will not be sent.
INFO:     Started server process [252]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
DEBUG  | test-1b8405d8-3e00-438e-b3cd-4bae73fc5e7a | Using Async Generator
DEBUG  | test-1b8405d8-3e00-438e-b3cd-4bae73fc5e7a | Async Generator output: {'error': {'object': 'error', 'message': '', 'type': 'BadRequestError', 'param': None, 'code': 400}}
INFO   | test-1b8405d8-3e00-438e-b3cd-4bae73fc5e7a | Finished running generator.
alpayariyak commented 3 months ago

For non-OpenAI-compatible usage, the input must include either messages or prompt: https://github.com/runpod-workers/worker-vllm/tree/0.3.2#request-input-parameters
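
For example, swapping the "message" key in the original curl call for "prompt" should get past the BadRequestError (a minimal sketch against the local server from the report above; the full set of accepted input fields and sampling parameters is listed in the linked README):

curl -H 'content-type: application/json' -d '{"input": {"prompt": "blah de blah"}}' http://localhost:8000/runsync

Note that the messages form requires a chat template, and the startup log above warns "No chat template provided. Chat API will not work.", so prompt is the appropriate choice for this model.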