langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

LangChain classes share openai global values #4775

Closed · zioproto closed this 1 year ago

zioproto commented 1 year ago

System Info

langchain==0.0.169

Who can help?

@hwchase17 @ekzh

Reproduction

import os
import langchain
import openai
from langchain.llms import AzureOpenAI
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

llmconfig = {
    "openai_api_key": "<secret>",
    "openai_api_base": "https://myllm.openai.azure.com/",
    "deployment_name": "davinci",
}

chatconfig = {
    "model_name": "gpt-35-turbo",
    "openai_api_type": "azure",
    "openai_api_version": "chatVERSION",
    "openai_api_key": "<secret>",
    "openai_api_base": "https://mychat.openai.azure.com/",
    "deployment_name": "gpt-35-turbo",
}

embedderconfig = {
    "openai_api_key": "<secret>",
    "model": "ada",
    "openai_api_base": "https://myembedder.openai.azure.com/",
    "openai_api_version": "embedderVERSION",
    "deployment": "ada",
}

# First time
llm = AzureOpenAI(**llmconfig)
print(openai.api_version)
chat = AzureChatOpenAI(**chatconfig)
print(openai.api_version)
embedder = OpenAIEmbeddings(**embedderconfig)
print(openai.api_version)
print("\n")
# Second time
llm = AzureOpenAI(**llmconfig)
print(openai.api_version)
chat = AzureChatOpenAI(**chatconfig)
print(openai.api_version)
embedder = OpenAIEmbeddings(**embedderconfig)
print(openai.api_version)

Running this code prints the following:

None
chatVERSION
embedderVERSION

embedderVERSION
chatVERSION
embedderVERSION

Expected behavior

The LangChain classes should not mutate the global values of the openai module, because those globals are shared and mutating them causes conflicts when multiple classes rely on them.

For example, the Chat/Completion API and the Embeddings API may need different api_version values, or Chat/Completion may be served from Azure while Embeddings comes from OpenAI. Because the classes write to the same openai module globals, the result depends on the order in which the objects are constructed, leading to unexpected behaviours.
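The conflict can be illustrated with a toy stand-in for the openai module (all names below are hypothetical, not LangChain or openai code): each wrapper writes its settings into a shared module-level namespace, so the last constructor to run silently wins.

```python
from types import SimpleNamespace

# Toy stand-in for a module with global settings (hypothetical).
fake_openai = SimpleNamespace(api_base=None, api_version=None)

class ChatWrapper:
    def __init__(self, api_base, api_version):
        fake_openai.api_base = api_base          # mutates shared global state
        fake_openai.api_version = api_version

class EmbedWrapper:
    def __init__(self, api_base, api_version):
        fake_openai.api_base = api_base          # overwrites whatever ChatWrapper set
        fake_openai.api_version = api_version

chat = ChatWrapper("https://mychat.openai.azure.com/", "chatVERSION")
embedder = EmbedWrapper("https://myembedder.openai.azure.com/", "embedderVERSION")

# The chat wrapper now silently uses the embedder's settings: last writer wins.
print(fake_openai.api_version)  # embedderVERSION
```

This mirrors the output above: after the embedder is constructed, every subsequent call through the shared module sees embedderVERSION.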

Related issues:

#2683

#4352

Related PR: https://github.com/hwchase17/langchain/pull/4234 https://github.com/pieroit/cheshire-cat/pull/195

Related code: https://github.com/hwchase17/langchain/blob/a7af32c274860ee9174830804301491973aaee0a/langchain/chat_models/azure_openai.py#L79-L87

and

https://github.com/hwchase17/langchain/blob/a7af32c274860ee9174830804301491973aaee0a/langchain/embeddings/openai.py#L166-L178

zioproto commented 1 year ago

Related issue: https://github.com/openai/openai-python/issues/411

ekzhu commented 1 year ago

I see. I think it is possible to pass all the parameters as keyword arguments to the openai.Completion.create and openai.ChatCompletion.create methods.
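One way to sketch that suggestion (hypothetical helper, not LangChain code): keep each client's connection settings in a plain dict and merge them into every call as keyword arguments, so nothing is ever written to the openai module's globals.

```python
# Hypothetical helper: per-instance settings passed as per-call kwargs,
# never written to module globals.
def make_call_kwargs(config: dict, **request_kwargs) -> dict:
    """Merge stored connection settings with the per-request arguments."""
    merged = dict(config)
    merged.update(request_kwargs)
    return merged

chat_config = {
    "api_type": "azure",
    "api_base": "https://mychat.openai.azure.com/",
    "api_version": "2023-03-15-preview",
    "api_key": "<secret>",
}

# These kwargs could then be fed to e.g. openai.ChatCompletion.create(**kwargs)
# with the pre-1.0 SDK, which accepts api_key/api_base/api_type/api_version per call.
kwargs = make_call_kwargs(chat_config, engine="gpt-35-turbo",
                          messages=[{"role": "user", "content": "hi"}])
```

Each LangChain object would hold its own config dict, so two objects with different bases or versions would no longer interfere.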

ruckc commented 1 year ago

Ran into this also: somehow OpenAIEmbeddings receives all the variables passed in, but it doesn't forward them to the underlying openai.Embedding client.

Pilosite commented 1 year ago

I would like to help on this one because it's blocking me from going further; my ada and gpt models are in two different Azure regions. @ekzhu, could you put me on the right path to implement a workaround?

Pilosite commented 1 year ago

From what I understand, in embeddings.py we have the following block, which also appears in chat_models/azure_openai.py:

        try:
            import openai

            openai.api_key = openai_api_key
            if openai_organization:
                openai.organization = openai_organization
            if openai_api_base:
                openai.api_base = openai_api_base
            if openai_api_type:
                openai.api_version = openai_api_version
            if openai_api_type:
                openai.api_type = openai_api_type
            values["client"] = openai.Embedding
        except ImportError:
            raise ValueError(
                "Could not import openai python package. "
                "Please install it with `pip install openai`."
            )
        return values

We have two different call sites writing into the openai module, which loads its default settings from os.environ (in openai/__init__.py):

api_key = os.environ.get("OPENAI_API_KEY")
# Path of a file with an API key, whose contents can change. Supercedes
# `api_key` if set.  The main use case is volume-mounted Kubernetes secrets,
# which are updated automatically.
api_key_path: Optional[str] = os.environ.get("OPENAI_API_KEY_PATH")

organization = os.environ.get("OPENAI_ORGANIZATION")
api_base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
api_type = os.environ.get("OPENAI_API_TYPE", "open_ai")
api_version = (
    "2023-03-15-preview" if api_type in ("azure", "azure_ad", "azuread") else None
)

I tried passing the base/key, or creating the llm and embedding objects "on the fly" in the ConversationalRetrievalChain.from_llm() function, without luck.

Is the issue related to the fact that the env settings become singletons once the module is loaded? (not a Python expert)
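On the singleton question: yes, Python caches each imported module in sys.modules, so every `import openai` in a process returns the same module object, and any attribute set on it is visible to all importers. A quick check with a stdlib module shows the effect:

```python
import sys
import json            # any module shows the effect; json is just an example

import json as json_again
assert json is json_again              # the same cached module object
assert sys.modules["json"] is json     # cached in sys.modules

# An attribute set through one name is visible through the other
# (don't do this in real code; it's exactly the shared-state problem here):
json.my_flag = "set once"
print(json_again.my_flag)  # set once
```

This is why setting openai.api_version in one LangChain class leaks into every other class in the same process.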

What would be the best way to handle two different openai contexts? I looked at the openai forum but didn't find a similar setup (with dual bases/keys).

I also looked at the completion/create methods but didn't find a proper way to handle this in a single script.

zioproto commented 1 year ago

@Pilosite: Can you please fix the Markdown formatting of the above comment? Thanks!

zioproto commented 1 year ago

@Pilosite I am not a Python expert, but if I understand correctly, @ekzhu is suggesting the following:

Instead of doing this: https://github.com/hwchase17/langchain/blob/a7af32c274860ee9174830804301491973aaee0a/langchain/chat_models/azure_openai.py#L94

You need to do:

values["client"] = openai.ChatCompletion.create(<parameters>)

Look at this:

In [27]: import openai

In [32]: values = {}

In [33]: values["client"] = openai.ChatCompletion

In [34]: type(values["client"])
Out[34]: type

In [35]: values["client"]
Out[35]: openai.api_resources.chat_completion.ChatCompletion

Instead of that, you want to create an actual object (that will not be global). Example:

In [36]: messages = [{"role": "system", "content": ""},]

In [37]: from openai import ChatCompletion

In [38]: c = ChatCompletion.create(engine="gpt-35-turbo",api_key="SECRET",api_type="azure",api_version="2023-03-15-preview",api_base="https://dummy.openai.azure.com/",messages=messages)

In [39]: type(c)
Out[39]: openai.openai_object.OpenAIObject

In [40]: c
Out[40]:
<OpenAIObject chat.completion id=chatcmpl-7J7GLapWegBSPW3YXQcSmEj5HykOK at 0x10ccfb0b0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, I cannot provide an answer without a specific question. Please provide more details so I can assist you better.",
        "role": "assistant"
      }
    }
  ],
  "created": 1684790505,
  "id": "chatcmpl-7J7GLapWegBSPW3YXQcSmEj5HykOK",
  "model": "gpt-35-turbo",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 8,
    "total_tokens": 33
  }
}

It is a bit weird, because to create the object you must supply messages immediately, and it fires an API call right away.

It comes from here: https://github.com/openai/openai-python/blob/fe3abd16b582ae784d8a73fd249bcdfebd5752c9/openai/api_resources/chat_completion.py#L8-L30

I don't understand if there is a way to create an <OpenAIObject chat.completion> without immediately firing a POST request to the API.
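If the goal is just to avoid the immediate request, one option (a sketch, not something the openai SDK offers directly; the request function below is a toy stand-in) is to bind the connection settings without calling, e.g. with functools.partial, and fire the request later:

```python
from functools import partial

# Toy stand-in for ChatCompletion.create (hypothetical, makes no network call):
def create(engine, api_key, api_base, api_version, messages):
    return {"engine": engine, "api_base": api_base, "n_messages": len(messages)}

# Binding the connection settings does NOT fire a request...
bound_create = partial(create,
                       engine="gpt-35-turbo",
                       api_key="SECRET",
                       api_base="https://dummy.openai.azure.com/",
                       api_version="2023-03-15-preview")

# ...only calling it with messages does:
response = bound_create(messages=[{"role": "system", "content": ""}])
```

The bound callable carries its own credentials, so nothing needs to be stored in the openai module globals.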

Pilosite commented 1 year ago

@zioproto thanks a lot for your time, I will check !

ekzhu commented 1 year ago

> (quoting @zioproto's comment above in full)

This is exactly what I was saying. 👍

kumapo commented 1 year ago

A workaround is here: https://gist.github.com/kumapo/d32e0864ba81d94fb17e7d948f346e46. You can import OpenAIEmbeddings from the gist's embeddings module and configure it with EMBED_OPENAI_API_KEY-style variables in the environment.
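The prefixed-env-var idea can be sketched roughly like this (illustrative only; the real gist subclasses OpenAIEmbeddings, and the helper name here is hypothetical): the embeddings client reads EMBED_-prefixed variables, so its credentials never collide with the chat client's.

```python
import os

# Illustrative sketch of the prefixed-env-var lookup pattern (hypothetical helper).
def read_prefixed_config(prefix: str) -> dict:
    keys = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_API_VERSION")
    return {k.lower(): os.environ[f"{prefix}{k}"]
            for k in keys if f"{prefix}{k}" in os.environ}

os.environ["EMBED_OPENAI_API_KEY"] = "<embed-secret>"
os.environ["EMBED_OPENAI_API_BASE"] = "https://myembedder.openai.azure.com/"

embed_config = read_prefixed_config("EMBED_")
# embed_config now holds only the embedder's settings, independent of
# whatever OPENAI_API_KEY / OPENAI_API_BASE the chat client uses.
```

The config dict can then be passed per call instead of being written to the shared module globals.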

Pilosite commented 1 year ago

Thanks a lot @kumapo! I can confirm this works as expected, nice!

TSPereira commented 1 year ago

Could we use functools.partial in the validate_environment method to bind the env keys to the client, and then stop passing them further down the pipeline? That way the env keys are read only once and fixed for that class instance, which would allow different instances to have different keys/bases/etc.
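A rough sketch of that idea (toy client with hypothetical names, not LangChain code): freeze each instance's keys once with functools.partial, so two instances with different credentials coexist without touching any shared state.

```python
from functools import partial

# Toy request function; real code would wrap the SDK call (hypothetical).
def request(api_key, api_base, prompt):
    return f"{api_base} <- {prompt}"

class Client:
    def __init__(self, api_key, api_base):
        # Read the keys once and bind them to this instance forever.
        self._call = partial(request, api_key=api_key, api_base=api_base)

    def complete(self, prompt):
        return self._call(prompt=prompt)

a = Client("key-a", "https://mychat.openai.azure.com/")
b = Client("key-b", "https://myembedder.openai.azure.com/")
# Each instance keeps its own bound credentials; construction order no longer matters.
```

Constructing `b` after `a` does not disturb `a`'s settings, which is exactly the isolation the issue asks for.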

TSPereira commented 1 year ago

Not sure if everyone noticed it, but this should now be solved since #5792 🎉

zioproto commented 1 year ago

I confirm I can't reproduce this issue anymore with LangChain 0.0.199.