tom-doerr opened 1 month ago
Doesn't happen when I switch the model to
astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit
python3 -m vllm.entrypoints.openai.api_server --model astronomer/Llama-3-8B-Instruct-GPTQ-8-Bit --quantization gptq --tensor-parallel-size 1 --port 38242 --gpu-memory-utilization 0.8 --dtype float16
Now I'm getting a BadRequestError again. Maybe the vLLM server just blocked me because I was sending so many bad requests earlier.
Creating basic bootstrap: 1/9
2%|▊ | 3/128 [00:00<00:04, 25.35it/s]
Creating basic bootstrap: 2/9
2%|▊ | 3/128 [00:00<00:01, 92.33it/s]
Creating basic bootstrap: 3/9
2%|▊ | 3/128 [00:00<00:01, 76.30it/s]
Creating basic bootstrap: 4/9
2%|▊ | 3/128 [00:00<00:01, 94.34it/s]
Creating basic bootstrap: 5/9
2%|▊ | 3/128 [00:00<00:01, 90.12it/s]
Creating basic bootstrap: 6/9
2%|▊ | 3/128 [00:00<00:01, 102.07it/s]
Creating basic bootstrap: 7/9
2%|▊ | 3/128 [00:00<00:01, 76.79it/s]
Creating basic bootstrap: 8/9
2%|▊ | 3/128 [00:00<00:01, 77.57it/s]
Creating basic bootstrap: 9/9
2%|▊ | 3/128 [00:00<00:01, 80.30it/s]
Failed to parse JSON response: {"object":"error","message":"[{'type': 'extra_forbidden', 'loc': ('body', 'do_sample'), 'msg': 'Extra inputs are not permitted', 'input': True, 'url': 'https://errors.pydantic.dev/2.5/v/extra_forbidden'}]","type":"BadRequestError","param":null,"code":400}
Traceback (most recent call last):
File "/home/tom/dspy/dsp/modules/hf_client.py", line 206, in _generate
completions = json_response["choices"]
KeyError: 'choices'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tom/web_ai/./generate_emails.py", line 175, in <module>
compiled_with_assertions_mailer = teleprompter.compile(emailer, trainset=trainset, num_trials=100, max_bootstrapped_demos=3, max_labeled_demos=5, eval_kwargs=kwargs, requires_permission_to_run=False)
File "/home/tom/dspy/dspy/teleprompt/mipro_optimizer.py", line 461, in compile
instruction_candidates, _ = self._generate_first_N_candidates(
File "/home/tom/dspy/dspy/teleprompt/mipro_optimizer.py", line 249, in _generate_first_N_candidates
self.observations = self._observe_data(devset).replace("Observations:", "").replace("Summary:", "")
File "/home/tom/dspy/dspy/teleprompt/mipro_optimizer.py", line 177, in _observe_data
observation = dspy.Predict(DatasetDescriptor, n=1, temperature=1.0)(examples=(trainset[0:upper_lim].__repr__()))
File "/home/tom/dspy/dspy/predict/predict.py", line 61, in __call__
return self.forward(**kwargs)
File "/home/tom/dspy/dspy/predict/predict.py", line 103, in forward
x, C = dsp.generate(template, **config)(x, stage=self.stage)
File "/home/tom/dspy/dsp/primitives/predict.py", line 112, in do_generate
completions: list[dict[str, Any]] = generator(prompt, **kwargs)
File "/home/tom/dspy/dsp/modules/hf.py", line 190, in __call__
response = self.request(prompt, **kwargs)
File "/home/tom/dspy/dsp/modules/lm.py", line 26, in request
return self.basic_request(prompt, **kwargs)
File "/home/tom/dspy/dsp/modules/hf.py", line 147, in basic_request
response = self._generate(prompt, **kwargs)
File "/home/tom/dspy/dsp/modules/hf_client.py", line 215, in _generate
raise Exception("Received invalid JSON response from server")
Exception: Received invalid JSON response from server
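The root cause appears to be the 'extra_forbidden' message in the response above: the vLLM OpenAI-compatible server validates request bodies with pydantic and rejects unknown fields, and do_sample is a Hugging Face generation flag rather than an OpenAI API parameter. A minimal sketch of a workaround on the client side would be to drop the unsupported keys before building the request. The helper name and the exact set of forbidden keys here are assumptions for illustration, not taken from the DSPy or vLLM source:

```python
# Sketch: drop generation kwargs that an OpenAI-compatible endpoint
# may reject with 'extra_forbidden'. The key list below is an
# assumption, not the authoritative list from vLLM.
UNSUPPORTED_KEYS = {"do_sample", "num_return_sequences"}

def sanitize_vllm_kwargs(kwargs):
    """Return a copy of kwargs without fields the server forbids."""
    return {k: v for k, v in kwargs.items() if k not in UNSUPPORTED_KEYS}

payload = sanitize_vllm_kwargs(
    {"temperature": 1.0, "max_tokens": 128, "do_sample": True}
)
# payload keeps temperature and max_tokens but not do_sample
```

Filtering in one place like this avoids having to patch every call site in dsp/modules/hf_client.py.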
Getting the same "Not found" error again, but only after the MIPRO bootstrapping phase.
Looks related to https://github.com/stanfordnlp/dspy/issues/1002
Same issue, with a slightly different setup. I use the Qwen2-72B-Instruct-GPTQ-Int4 model. I followed the gsm8k demo and get the same error: completions = json_response["choices"] KeyError: 'choices'. But when I switch to the dspy.Predict('sentence -> sentiment') demo, it works fine.
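The KeyError itself comes from hf_client.py indexing json_response["choices"] without first checking whether the server returned an error body (as in the "object": "error" payload shown earlier in this thread). A defensive parse, sketched here with a hypothetical extract_completions helper that is not part of DSPy, would surface the server's actual error message instead of the generic "invalid JSON response" exception:

```python
# Sketch: parse a vLLM/OpenAI-style response defensively instead of
# assuming "choices" is always present. extract_completions is a
# hypothetical helper, not part of the DSPy codebase.
def extract_completions(json_response):
    if "choices" in json_response:
        return [choice["text"] for choice in json_response["choices"]]
    # Error bodies look like {"object": "error", "message": ..., ...}
    if json_response.get("object") == "error":
        raise RuntimeError(f"Server error: {json_response.get('message')}")
    raise RuntimeError(f"Unexpected response shape: {json_response}")
```

With this shape of check, the do_sample rejection above would show up as a readable "Server error: ..." message rather than a KeyError.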
The code I'm running:
I start the server with:
Output: