issue with parsing model from json when using multiple / in the path

samos123 commented 1 month ago

KubeAI logs:

2024/09/29 05:37:45 url: /v1/completions      
2024/09/29 05:37:45 sending error response: 400: unable to parse model: unmarshal json: unexpected end of JSON input

Request json:

{
    "model": "meta-llama-3.2-11b-vision-instruct",
    "prompt": "Yes, I have access to the contents of the original book. Here is chapter 4, as requested:\n\nCHAPTER FOUR\nThe Spectrum of Pain: Four Patients\n\nFrom Jeremy\n\nYou\u2019ve read the story of my back pain. Maybe it sounded familiar (I hope not; it was a bear). In any event, read the next four case histories, because one of them is likely to remind you a lot of yourself. The idea is for you to see where you fit on the spectrum of back pain problems.\n\nLaura\n\nLaura, an active fifty-five-year-old mother of three, first came to me with a two-year history of low-back pain. She had taken every over-the-counter medication and undergone several types of physical therapy, none of which helped. Her pain was severe, constant, and centered in her lower back. In addition, she had pain radiating down the back of her left leg, which was quite debilitating. She had always been active, and her low-back pain was seriously interfering with her lifestyle. It was difficult for her to play with her children, go for walks, or even sit through a movie. She had tried to lose weight to ease the strain on her back but with little success. She had seen another physician, who had recommended surgery, but she was looking for a second opinion.\n\nMy first task was to find out what was causing her pain. In Laura\u2019s case, the pain was coming from a herniated disk. A disk is a cushion that sits between two vertebrae in the spine, providing support and flexibility. A herniated disk occurs when part of the disk slips out of place and presses on a nerve, causing pain. I sent Laura for a magnetic resonance imaging (MRI) scan, which showed the herniated disk clearly.\n\nLaura was a perfect candidate for a type of surgery called a microdiscectomy, in which the disk material that was pressing on the nerve was removed. The surgery is minimally invasive and usually takes about an hour. Patients can often go home the same day.\n\nLaura had the surgery, and within a week, she was able to sit through a movie without pain. Within two weeks, she was walking around the block with her children. Within six weeks, she was playing tennis. She was thrilled. The relief from her pain was immediate and permanent. The surgery had restored her active lifestyle.\n\nTed\n\nTed, a sixty-five-year-old retired accountant, came to me with a three-month history of back pain that was radiating down his right leg. He had not experienced any injury, and the pain had come on gradually. He described the pain as a \u201cdeep ache\u201d that was present most of the time. He had been taking ibuprofen, which had helped a little, but not enough. He was not able to play golf, which he did regularly, because the pain was too intense. He was worried that he would have to give up golf permanently.\n\nWhen I examined Ted, I found that he had a condition called spinal stenosis. Spinal stenosis occurs when the spinal canal, the channel that runs through the center of the spinal column, becomes narrow. This can happen when the ligaments in the spine thicken or when the bones and disks in the spine grow abnormally. The narrowing can put pressure on the spinal cord and the nerves that exit from the spinal cord, causing pain and discomfort.\n\nTed\u2019s condition was a little more difficult to treat than Laura\u2019s. Surgery is usually not recommended for spinal stenosis until other methods have been tried. The first step is usually physical therapy, which involves stretching and strengthening exercises to increase flexibility and alleviate pain. I also prescribed an anti-inflammatory medication that was stronger than the ibuprofen Ted had been taking.\n\nTed underwent six weeks of physical therapy,",
    "temperature": 0,
    "best_of": 1,
    "max_tokens": 199,
    "logprobs": null,
    "stream": true
}

Note: I'm encountering this while running the vLLM benchmark suite:

python3 benchmark_serving.py --backend openai \
    --base-url http://localhost:8000/openai/ \
    --dataset-name=sharegpt --dataset-path=ShareGPT_V3_unfiltered_cleaned_split.json \
    --model meta-llama-3.2-11b-vision-instruct \
    --seed 12345 --tokenizer neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic

nstogner commented 1 month ago

Can you get the benchmark to log HTTP requests?

samos123 commented 1 month ago

I captured the request info, and it turns out that we fail when the URL is:

DEBUG:aiohttp.client:Starting request <TraceRequestStartParams(method='POST', url=URL('http://localhost:8000/openai//v1/completions'), headers=<CIMultiDict('Authorization': 'Bearer None')>)>

Notice the extra slash after `openai`.

I will see if I can reproduce in a test case.

samos123 commented 1 month ago

I do think we need to fix this, by the way. We should simply strip the extra / from the request URL so others don't hit similar issues. It's very easy to get wrong, because some frameworks require the base URL to end with a /.

samos123 commented 1 month ago

I can reproduce with a simple curl command as well:

curl -v http://localhost:8000/openai//v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-500m-cpu", "prompt": "Who was the first president of the United States?", "max_tokens": 40}'
* Host localhost:8000 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8000...
* Immediate connect fail for ::1: Network is unreachable
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000
> POST /openai//v1/completions HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.9.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 108
>
* upload completely sent off: 108 bytes
< HTTP/1.1 301 Moved Permanently
< Location: /openai/v1/completions
< Date: Sun, 29 Sep 2024 21:52:13 GMT
< Content-Length: 0
<
* Connection #0 to host localhost left intact

nstogner commented 1 month ago

The error above is a 400, but the curl output shows a 301.

samos123 commented 1 month ago

Using -L with curl (to follow the redirect) doesn't work either; it returns the 400 error. I will update the integration test to use -L as well. Good catch, all!

Full output:

curl -v -L http://localhost:8000/openai//v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2-500m-cpu", "prompt": "Who was the first president of the United States?", "max_tokens": 40}'

* Host localhost:8000 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8000...
* Immediate connect fail for ::1: Network is unreachable
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000
> POST /openai//v1/completions HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.9.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 108
>
* upload completely sent off: 108 bytes
< HTTP/1.1 301 Moved Permanently
* Need to rewind upload for next request
< Location: /openai/v1/completions
< Date: Mon, 30 Sep 2024 13:49:10 GMT
< Content-Length: 0
* Ignoring the response-body
<
* Connection #0 to host localhost left intact
* Issue another request to this URL: 'http://localhost:8000/openai/v1/completions'
* Switch from POST to GET
* Found bundle for host: 0x61f011443460 [serially]
* Can not multiplex, even if we wanted to
* Re-using existing connection with host localhost
> GET /openai/v1/completions HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.9.1
> Accept: */*
> Content-Type: application/json
>
* Request completely sent off
< HTTP/1.1 400 Bad Request
< X-Proxy: lingo
< Date: Mon, 30 Sep 2024 13:49:10 GMT
< Content-Length: 80
< Content-Type: text/plain; charset=utf-8
<
{"error":"unable to parse model: unmarshal json: unexpected end of JSON input"}
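The 400 body matches the error Go's `encoding/json` returns for empty input: after the redirect, the GET carries no body (note the missing `Content-Length` on the second request). A minimal sketch, assuming the proxy decodes the request body with `json.Unmarshal`; `parseModel` is a hypothetical helper, not KubeAI's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseModel is a hypothetical stand-in for a proxy step that reads the
// "model" field from an OpenAI-style request body before routing it.
func parseModel(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("unmarshal json: %w", err)
	}
	return req.Model, nil
}

func main() {
	// The original POST carried a JSON body.
	m, err := parseModel([]byte(`{"model": "qwen2-500m-cpu", "max_tokens": 40}`))
	fmt.Println(m, err) // qwen2-500m-cpu <nil>

	// The GET issued after following the 301 carries no body at all.
	_, err = parseModel(nil)
	fmt.Println(err) // unmarshal json: unexpected end of JSON input
}
```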

samos123 commented 1 month ago

The root cause seems to be that a POST is automatically converted to a GET when the client follows a 301 redirect: https://datatracker.ietf.org/doc/html/rfc7231#section-6.4.2

We should use an HTTP 307 or 308 redirect so that it stays a POST request.


alpe commented 1 month ago

I think the redirect is done by http.ServeMux, and I only see 301 redirects there. I wonder if the benchmark suite could be fixed instead?

samos123 commented 4 weeks ago

It's an issue with curl as well. The weird thing is that my integration test doesn't reproduce it, but I can consistently reproduce the 400 error in my local environment using curl.

It seems all clients behave this way when they receive a 301 in response to a POST request:

* Issue another request to this URL: 'http://localhost:8000/openai/v1/completions'
* Switch from POST to GET

samos123 commented 4 weeks ago

I was able to reproduce in automated testing as well: https://github.com/substratusai/kubeai/actions/runs/11127386385/job/30919572183?pr=259#step:6:593

substratusai/kubeai, issue #257