stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy.ai
MIT License

Support for setting OpenAI project field in addition to api_key? (I would be happy to send PR; not requesting work from team) #1514

Open willy-b opened 1 month ago

willy-b commented 1 month ago

Hello, and thanks for a great project.

In https://platform.openai.com/settings/, OpenAI allows configuration (API keys, rate limits, usage, allowed models, etc.) to be grouped by Projects, which have IDs. (Screenshot of the OpenAI project settings page.)

The current DSPy OpenAI adapter dsp/modules/gpt3.py (named gpt3.py, but it supports many OpenAI models such as GPT-4, and even o1 as of https://github.com/stanfordnlp/dspy/commit/928068e866a9d13c42cefa851871841c42e6b5fb) already allows passing one's API key when instantiating it: https://github.com/stanfordnlp/dspy/blob/2cc029b03e18fe3bc05b5131ef3e02390af895b3/dsp/modules/gpt3.py#L59

However, DSPy's OpenAI adapter does not currently seem to allow specifying a particular project. One can work around this pretty easily with project-specific API keys, but it is also just a couple of lines of code to pick up a project argument from the kwargs, something like:

if "project" in self.kwargs:
            openai.project = self.kwargs["project"]
            del self.kwargs["project"]

added right after self.kwargs is defined in __init__ at https://github.com/stanfordnlp/dspy/blob/2cc029b03e18fe3bc05b5131ef3e02390af895b3/dsp/modules/gpt3.py#L98 will do the trick (tested and verified by me).
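
With that change in place, usage would look something like the sketch below (hypothetical: the project kwarg is not actually accepted today, and "proj_xxx" is a placeholder project ID):

model = 'gpt-3.5-turbo'  # just an example
# hypothetical usage: the snippet above would pick up the project kwarg,
# set it on the openai module, and drop it from the request kwargs
lm = dspy.OpenAI(model=model, api_key=openai_key, project="proj_xxx")
dspy.settings.configure(lm=lm)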

Any interest in a quick PR from me (or from your team, if preferred) to allow users who have a single multi-project API key to specify which project their DSPy activity in a particular code execution should be tracked against?

Thanks again for an awesome project!

(Credit to Stanford XCS224U for introducing me to DSPy)

okhat commented 1 month ago

Hey thanks for this! I haven't read in detail yet but I think this is probably a lot easier now thanks to dspy.LM via LiteLLM? Check it out and lmk!

willy-b commented 1 month ago

tl;dr: If one prefers a workaround without changing code, one can use the environment variable OPENAI_PROJECT_ID, as long as OpenAI's official Python library is the one used underneath to pick up the project ID. But then the API key and the project get set in different places, which is easy to miss. And if the implementation underneath moves away from the official library (especially with dspy -> LiteLLM -> OpenAI, since LiteLLM could change their implementation), that field suddenly drops off. Also, the environment variable needs to be set before OpenAI's client is created in Python, so it may be more brittle than allowing it to be overridden wherever the key can currently be set.
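
For example, a minimal sketch of that environment-variable workaround (assuming the official openai Python client is only constructed after these lines run; both values are placeholders):

import os

# the official openai client reads OPENAI_PROJECT_ID when a client is constructed,
# so these must be set before any OpenAI client is created in the process
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["OPENAI_PROJECT_ID"] = "proj_xxx"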

Checking whether, now that your recent change in https://github.com/stanfordnlp/dspy/pull/1486 has landed, one can pass the project as a keyword argument by replacing the earlier usage of dspy.OpenAI() with the LiteLLM-based LM (tl;dr: it does not allow passing this any differently than dspy.OpenAI, AFAICT). The earlier dspy.OpenAI usage:

model = 'gpt-3.5-turbo'  # just an example
lm = dspy.OpenAI(
    model=model,
    api_key=openai_key,
    # project="proj_xxx"  # not a supported argument here (note: this would be the project ID, not a project key;
    # it would need to be set on the openai client object so the OpenAI-Project header is sent, per
    # https://platform.openai.com/docs/api-reference/organizations-and-projects-optional)
)
dspy.settings.configure(lm=lm)

And with the week-old LiteLLM adapter path to OpenAI, trying to pass the project that way:

model = 'gpt-3.5-turbo'  # just an example
lm = dspy.LM(model=model, api_key=openai_key, project="proj_xxx")  # uses the LiteLLM all-purpose adapter
dspy.settings.configure(lm=lm)

Let us try using that lm:

lm = dspy.LM(model=model, api_key=openai_key, project="proj_xxx")  # uses the LiteLLM all-purpose adapter
lm("Can LiteLLM pass a project id to OpenAI APIs along with the api key?")
# we get an error, see below (project is not set on the openai client to send the header via e.g.
# https://github.com/openai/openai-python/blob/6172976b16821b24194a05e3e3fe5cb2342a2b4b/src/openai/_client.py#L169;
# it is passed in the request body instead)

It looks like they construct the OpenAI client without passing project (only organization):

https://github.com/BerriAI/litellm/blob/cd9080780714b4fc7b0d8bfee8fde80926164822/litellm/llms/OpenAI/openai.py

openai_client: OpenAI = self._get_openai_client(  # type: ignore
    is_async=False,
    api_key=api_key,
    api_base=api_base,
    timeout=timeout,
    max_retries=max_retries,
    organization=organization,
    client=client,
)

So they pass the project kwarg in the request body instead of mapping it to the OpenAI-Project header at https://github.com/openai/openai-python/blob/6172976b16821b24194a05e3e3fe5cb2342a2b4b/src/openai/_client.py#L169.

And we get an exception, OpenAIError: Error code: 400 - {'error': {'message': 'Unrecognized request argument supplied: project', 'type': 'invalid_request_error', 'param': None, 'code': None}}. (This is why the proposed code change deletes the key from kwargs after setting openai.project: to avoid exactly this error.)

20 frames (abridged)

/usr/local/lib/python3.10/dist-packages/litellm/llms/OpenAI/openai.py in completion(self, model_response, timeout, optional_params, logging_obj, model, messages, print_verbose, api_key, api_base, acompletion, litellm_params, logger_fn, headers, custom_prompt_dict, client, organization, custom_llm_provider, drop_params)
    824     headers, response = (
--> 825         self.make_sync_openai_chat_completion_request(
    826             openai_client=openai_client,

/usr/local/lib/python3.10/dist-packages/litellm/llms/OpenAI/openai.py in make_sync_openai_chat_completion_request(self, openai_client, data, timeout)
    682     except Exception as e:
--> 683         raise e
    684

/usr/local/lib/python3.10/dist-packages/litellm/llms/OpenAI/openai.py in make_sync_openai_chat_completion_request(self, openai_client, data, timeout)
    671     try:
--> 672         raw_response = openai_client.chat.completions.with_raw_response.create(
    673             **data, timeout=timeout

/usr/local/lib/python3.10/dist-packages/openai/_legacy_response.py in wrapped(*args, **kwargs)
    352
--> 353     return cast(LegacyAPIResponse[R], func(*args, **kwargs))
    354

...

BadRequestError: litellm.BadRequestError: OpenAIException - Error code: 400 - {'error': {'message': 'Unrecognized request argument supplied: project', 'type': 'invalid_request_error', 'param': None, 'code': None}}

(The project kwarg should be special-cased and set on the openai client object so it gets mapped to the OpenAI-Project header instead.)
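
For reference, a sketch of what that mapping looks like when constructing the official client directly (as far as I can tell, the openai Python package accepts a project argument and sends it as the OpenAI-Project header; both values below are placeholders):

from openai import OpenAI

openai_key = "sk-..."  # placeholder API key
# the official client accepts a project ID and maps it to the
# OpenAI-Project header on outgoing requests
client = OpenAI(api_key=openai_key, project="proj_xxx")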

Note that supporting the project field may eventually be unnecessary, as OpenAI is moving everybody to project-specific API keys to help people avoid forgetting the field. OpenAI has recently marked user API keys "legacy" and encourages project-specific API keys (not to be confused with the project ID itself). (Screenshot of the API keys settings page.)

But it is still possible to use user API keys, where a user can be part of multiple projects (and create new ones), so this is still an issue. See also their notes on authenticating to projects (via the OpenAI-Project header): https://platform.openai.com/docs/api-reference/organizations-and-projects-optional (Screenshot of the project authentication docs.)
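
As an aside, the project-specific-key workaround mentioned earlier is sketched below; since such a key is already scoped to a single project (I believe they start with "sk-proj-"), no project field needs to be passed at all:

project_scoped_key = "sk-proj-..."  # placeholder for a key created inside the desired project
lm = dspy.LM(model='gpt-3.5-turbo', api_key=project_scoped_key)
dspy.settings.configure(lm=lm)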

Thanks!

arnavsinghvi11 commented 1 month ago

hi @willy-b, litellm indeed doesn't support the project parameter currently either, but maybe we can get around this by setting the openai configurable parameters before the LM call, rather than within the request?

import dspy
import openai

# set the module-level openai configuration before constructing the LM
openai.api_key = '...'
openai.project = "..."

lm = dspy.LM(model=model, ...)

a bit of a patchy solution, but it'll probably be easier than adding any new parameters for future OpenAI API changes. lmk if that helps!