stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

How to use dspy.OpenAI() like langchain.llms.OpenAI() #479

Closed · ujjawal-ti closed this 1 week ago

ujjawal-ti commented 7 months ago

Hey, I'm trying to use my LLM on a vLLM server that is exposed as an API. Usually I create an OpenAI LLM instance with LangChain as below, and it works fine.

import openai
from langchain.llms import OpenAI

openai.api_base = "https://our_url/api"  # use the IP or hostname of your instance
openai.api_key = "random_value"  

llm_mistral = OpenAI(api_base = openai.api_base, 
             api_key='random_value',
             model='Mistral-7B-Instruct-v0.1',
             )

I want to use the dspy.OpenAI module in a similar way:

import openai
openai.api_base = "https://our_url/api"  # use the IP or hostname of your instance
openai.api_key = "random_value"  

llm_mistral = dspy.OpenAI(api_base = openai.api_base, 
             api_key='random_value',
             model='Mistral-7B-Instruct-v0.1',
             )

# This sets the language model for DSPy.
dspy.settings.configure(lm=llm_mistral)

# This is not required but it helps to understand what is happening
my_example = {
    "question": "What game was Super Mario Bros. 2 based on?",
    "answer": "Doki Doki Panic",
}

# This is the signature for the predictor. It is a simple question and answer model.
class BasicQA(dspy.Signature):
    """Answer questions about classic video games."""

    question = dspy.InputField(desc="a question about classic video games")
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
pred = generate_answer(question=my_example['question'])

# Print the answer...profit :)
print(pred.answer)

I'm getting the following error:

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 4.2 seconds after 4 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 0.6 seconds after 5 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 3.3 seconds after 6 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 14.5 seconds after 7 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 23.4 seconds after 8 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 141.3 seconds after 9 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 343.6 seconds after 10 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
Backing off 466.7 seconds after 11 tries calling function <function GPT3.request at 0x7f06e97f6170> with kwargs {}
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
Cell In[3], line 30
     27 generate_answer = dspy.Predict(BasicQA)
     29 # Call the predictor on a particular input.
---> 30 pred = generate_answer(question=my_example['question'])
     32 # Print the answer...profit :)
     33 print(pred.answer)

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dspy/predict/predict.py:49, in Predict.__call__(self, **kwargs)
     48 def __call__(self, **kwargs):
---> 49     return self.forward(**kwargs)

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dspy/predict/predict.py:90, in Predict.forward(self, **kwargs)
     87 template = signature_to_template(signature)
     89 if self.lm is None:
---> 90     x, C = dsp.generate(template, **config)(x, stage=self.stage)
     91 else:
     92     # Note: query_only=True means the instructions and examples are not included.
     93     # I'm not really sure why we'd want to do that, but it's there.
     94     with dsp.settings.context(lm=self.lm, query_only=True):

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dsp/primitives/predict.py:78, in _generate.<locals>.do_generate(example, stage, max_depth, original_example)
     76 # Generate and extract the fields.
     77 prompt = template(example)
---> 78 completions: list[dict[str, Any]] = generator(prompt, **kwargs)
     79 completions: list[Example] = [template.extract(example, p) for p in completions]
     81 # Find the completions that are most complete.

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dsp/modules/gpt3.py:190, in GPT3.__call__(self, prompt, only_completed, return_sorted, **kwargs)
    182 assert return_sorted is False, "for now"
    184 # if kwargs.get("n", 1) > 1:
    185 #     if self.model_type == "chat":
    186 #         kwargs = {**kwargs}
    187 #     else:
    188 #         kwargs = {**kwargs, "logprobs": 5}
--> 190 response = self.request(prompt, **kwargs)
    192 if dsp.settings.log_openai_usage:
    193     self.log_usage(response)

File ~/.local/lib/python3.10/site-packages/backoff/_sync.py:105, in retry_exception.<locals>.retry(*args, **kwargs)
     96 details = {
     97     "target": target,
     98     "args": args,
   (...)
    101     "elapsed": elapsed,
    102 }
    104 try:
--> 105     ret = target(*args, **kwargs)
    106 except exception as e:
    107     max_tries_exceeded = (tries == max_tries_value)

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dsp/modules/gpt3.py:156, in GPT3.request(self, prompt, **kwargs)
    153 if "model_type" in kwargs:
    154     del kwargs["model_type"]
--> 156 return self.basic_request(prompt, **kwargs)

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dsp/modules/gpt3.py:133, in GPT3.basic_request(self, prompt, **kwargs)
    131 else:
    132     kwargs["prompt"] = prompt
--> 133     response = completions_request(**kwargs)
    135 history = {
    136     "prompt": prompt,
    137     "response": response,
    138     "kwargs": kwargs,
    139     "raw_kwargs": raw_kwargs,
    140 }
    141 self.history.append(history)

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dsp/modules/gpt3.py:278, in completions_request(**kwargs)
    275 if OPENAI_LEGACY:
    276     return cached_gpt3_request_v2_wrapped(**kwargs)
--> 278 return v1_cached_gpt3_request_v2_wrapped(**kwargs).model_dump()

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dsp/modules/cache_utils.py:17, in noop_decorator.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
     15 @wraps(func)
     16 def wrapper(*args, **kwargs):
---> 17     return func(*args, **kwargs)

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dsp/modules/gpt3.py:253, in v1_cached_gpt3_request_v2_wrapped(**kwargs)
    250 @functools.lru_cache(maxsize=None if cache_turn_on else 0)
    251 @NotebookCacheMemory.cache
    252 def v1_cached_gpt3_request_v2_wrapped(**kwargs):
--> 253     return v1_cached_gpt3_request_v2(**kwargs)

File ~/.local/lib/python3.10/site-packages/joblib/memory.py:655, in MemorizedFunc.__call__(self, *args, **kwargs)
    654 def __call__(self, *args, **kwargs):
--> 655     return self._cached_call(args, kwargs)[0]

File ~/.local/lib/python3.10/site-packages/joblib/memory.py:598, in MemorizedFunc._cached_call(self, args, kwargs, shelving)
    595     must_call = True
    597 if must_call:
--> 598     out, metadata = self.call(*args, **kwargs)
    599     if self.mmap_mode is not None:
    600         # Memmap the output at the first call to be consistent with
    601         # later calls
    602         if self._verbose:

File ~/.local/lib/python3.10/site-packages/joblib/memory.py:856, in MemorizedFunc.call(self, *args, **kwargs)
    854 if self._verbose > 0:
    855     print(format_call(self.func, args, kwargs))
--> 856 output = self.func(*args, **kwargs)
    857 self.store_backend.dump_item(
    858     [func_id, args_id], output, verbose=self._verbose)
    860 duration = time.time() - start_time

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/dsp/modules/gpt3.py:248, in v1_cached_gpt3_request_v2(**kwargs)
    246 @CacheMemory.cache
    247 def v1_cached_gpt3_request_v2(**kwargs):
--> 248     return openai.completions.create(**kwargs)

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/openai/_utils/_utils.py:301, in required_args.<locals>.inner.<locals>.wrapper(*args, **kwargs)
    299             msg = f"Missing required argument: {quote(missing[0])}"
    300     raise TypeError(msg)
--> 301 return func(*args, **kwargs)

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/openai/resources/completions.py:559, in Completions.create(self, model, prompt, best_of, echo, frequency_penalty, logit_bias, logprobs, max_tokens, n, presence_penalty, seed, stop, stream, suffix, temperature, top_p, user, extra_headers, extra_query, extra_body, timeout)
    517 @required_args(["model", "prompt"], ["model", "prompt", "stream"])
    518 def create(
    519     self,
   (...)
    557     timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
    558 ) -> Completion | Stream[Completion]:
--> 559     return self._post(
    560         "/completions",
    561         body=maybe_transform(
    562             {
    563                 "model": model,
    564                 "prompt": prompt,
    565                 "best_of": best_of,
    566                 "echo": echo,
    567                 "frequency_penalty": frequency_penalty,
    568                 "logit_bias": logit_bias,
    569                 "logprobs": logprobs,
    570                 "max_tokens": max_tokens,
    571                 "n": n,
    572                 "presence_penalty": presence_penalty,
    573                 "seed": seed,
    574                 "stop": stop,
    575                 "stream": stream,
    576                 "suffix": suffix,
    577                 "temperature": temperature,
    578                 "top_p": top_p,
    579                 "user": user,
    580             },
    581             completion_create_params.CompletionCreateParams,
    582         ),
    583         options=make_request_options(
    584             extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout
    585         ),
    586         cast_to=Completion,
    587         stream=stream or False,
    588         stream_cls=Stream[Completion],
    589     )

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/openai/_base_client.py:1063, in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
   1049 def post(
   1050     self,
   1051     path: str,
   (...)
   1058     stream_cls: type[_StreamT] | None = None,
   1059 ) -> ResponseT | _StreamT:
   1060     opts = FinalRequestOptions.construct(
   1061         method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1062     )
-> 1063     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/openai/_base_client.py:842, in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
    833 def request(
    834     self,
    835     cast_to: Type[ResponseT],
   (...)
    840     stream_cls: type[_StreamT] | None = None,
    841 ) -> ResponseT | _StreamT:
--> 842     return self._request(
    843         cast_to=cast_to,
    844         options=options,
    845         stream=stream,
    846         stream_cls=stream_cls,
    847         remaining_retries=remaining_retries,
    848     )

File /home/shared/.miniconda3/envs/py10cuda117/lib/python3.10/site-packages/openai/_base_client.py:885, in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
    882     # If the response is streamed then we need to explicitly read the response
    883     # to completion before attempting to access the response text.
    884     err.response.read()
--> 885     raise self._make_status_error_from_response(err.response) from None
    886 except httpx.TimeoutException as err:
    887     if retries > 0:

NotFoundError: Error code: 404 - {'detail': 'Not Found'}

Any idea how to use dspy.OpenAI like the above?

koshyviv commented 7 months ago

Shooting in the dark here:

  1. Does it work when you hit the endpoint directly, e.g. via cURL? (See the sketch below.)
  2. Can you try with streaming disabled?
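
For a quick sanity check of the raw endpoint outside of DSPy, a minimal sketch with requests might look like the following; the exact path and auth header depend on how your vLLM server is deployed, so treat them as placeholders:

import requests

# Hypothetical direct check of the OpenAI-compatible completions route.
# Adjust the URL path and credentials to match your actual deployment.
resp = requests.post(
    "https://our_url/api/completions",
    headers={"Authorization": "Bearer random_value"},
    json={
        "model": "Mistral-7B-Instruct-v0.1",
        "prompt": "Hello",
        "max_tokens": 16,
    },
)
print(resp.status_code, resp.text)
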
ujjawal-ti commented 7 months ago

Hey @koshyviv,

  1. I haven't tried cURL, but it works fine with the OpenAI SDK, as shown above.
  2. For this API we've disabled the stream feature; there is another API endpoint where we're using streaming.
adrianlyjak commented 6 months ago

It's helpful to add debug logging so you can see where the requests are going. You probably also want to set model_type to "chat", which will append /chat/completions to your base URI and use a chat-format payload. The default is "text", which makes a request to /completions with a prompt in the JSON body.

import logging
import sys

import dspy
import openai

# Route DEBUG logs to stdout so the outgoing HTTP requests are visible.
root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
root.addHandler(handler)

openai.api_base = "https://our_url/api"  # use the IP or hostname of your instance

llm_mistral = dspy.OpenAI(
    api_base=openai.api_base,
    api_key="random_value",
    model="Mistral-7B-Instruct-v0.1",
    model_type="chat",
)

I'm having some other issues where, when optimizing, the model seems to revert to a text model 😭

kuzcotopiallm commented 6 months ago

My issue, using an OpenAI-compatible API server, was the api_base missing a trailing /. Looking at azure_openai_usage.log made it more obvious:

HTTP Request: POST http://127.0.0.1:5000/v1chat/completions "HTTP/1.1 404 Not Found"

Something like this:

openai.api_base = "https://our_url/api/"  # use the IP or hostname of your instance
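
Putting that together with the model_type="chat" suggestion above, a minimal sketch of the configuration (URL and key are placeholders; the exact path depends on your server):

import dspy

# Sketch only: note the trailing slash on api_base; with model_type="chat",
# DSPy will request <api_base>chat/completions instead of <api_base>completions.
llm_mistral = dspy.OpenAI(
    api_base="https://our_url/api/",
    api_key="random_value",
    model="Mistral-7B-Instruct-v0.1",
    model_type="chat",
)
dspy.settings.configure(lm=llm_mistral)
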
MindsightsAI commented 5 months ago

@ujjawal-ti which API do you use for streaming?

mutong184 commented 4 months ago

Did you solve this problem? I have the same error.

SUSTYuxiao commented 3 months ago

I ran into a similar problem. My guess is that the cause is partly the URL suffix and partly an outdated openai version. By running the following I was able to get a result.

First, upgrade openai and dspy to the latest versions: pip install --upgrade dspy openai

Then get the result with the following demo:


import dspy
import openai

# host and key are placeholders for your endpoint and API key.
model = "gpt-3.5-turbo"
api_base = f"{host}/v1/"
api_key = f"{key}"

turbo = dspy.OpenAI(model=model, max_tokens=250, api_base=api_base, api_key=api_key, model_type="chat")
dspy.settings.configure(lm=turbo)

sentence = "it's a charming and often affecting journey."  # example from the SST-2 dataset.

classify = dspy.Predict('sentence -> sentiment')
classify(sentence=sentence).sentiment

The final output is: "Sentence: it's a charming and often affecting journey.\nSentiment: Positive"