langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.51k stars 14.81k forks

Azure OpenAI Embeddings #14934

Closed Vivek-Kawathalkar closed 3 weeks ago

Vivek-Kawathalkar commented 9 months ago

System Info

C:\Users\vivek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\langchain\embeddings\azure_openai.py:101: UserWarning: As of openai>=1.0.0, Azure endpoints should be specified via the azure_endpoint param not openai_api_base (or alias base_url). Updating openai_api_base from to /openai.
  warnings.warn(
C:\Users\vivek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\langchain\embeddings\azure_openai.py:108: UserWarning: As of openai>=1.0.0, if deployment (or alias azure_deployment) is specified then openai_api_base (or alias base_url) should not be. Instead use deployment (or alias azure_deployment) and azure_endpoint.
  warnings.warn(
C:\Users\vivek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\langchain\embeddings\azure_openai.py:116: UserWarning: As of openai>=1.0.0, if openai_api_base (or alias base_url) is specified it is expected to be of the form https://example-resource.azure.openai.com/openai/deployments/example-deployment. Updating to /openai.
  warnings.warn(
Traceback (most recent call last):
  File "c:\Users\vivek\OneDrive\Desktop\Hackathon\doc.py", line 28, in <module>
    embeddings=AzureOpenAIEmbeddings(deployment=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for AzureOpenAIEmbeddings
__root__
  base_url and azure_endpoint are mutually exclusive (type=value_error)

Who can help?

No response

Information

Related Components

Reproduction

    from langchain.document_loaders import PyPDFLoader
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.embeddings import AzureOpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from dotenv import load_dotenv
    import openai
    import os

    # load environment variables
    load_dotenv()

    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("OPENAI_DEPLOYMENT_ENDPOINT")
    OPENAI_DEPLOYMENT_NAME = os.getenv("OPENAI_DEPLOYMENT_NAME")
    OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
    OPENAI_DEPLOYMENT_VERSION = os.getenv("OPENAI_DEPLOYMENT_VERSION")

    OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME")
    OPENAI_ADA_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_ADA_EMBEDDING_MODEL_NAME")

    # init Azure OpenAI
    openai.api_type = "azure"
    openai.api_version = OPENAI_DEPLOYMENT_VERSION
    openai.api_base = OPENAI_DEPLOYMENT_ENDPOINT
    openai.api_key = OPENAI_API_KEY

    if __name__ == "__main__":
        embeddings = AzureOpenAIEmbeddings(deployment=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME,
                                           model=OPENAI_ADA_EMBEDDING_MODEL_NAME,
                                           openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
                                           openai_api_type="azure",
                                           chunk_size=100)

        dataPath = "./data/documentation/"
        fileName = r'C:\Users\vivek\OneDrive\Desktop\Hackathon\data\FAQ For LTO Hotels.pdf'

        # use langchain PDF loader
        loader = PyPDFLoader(fileName)

        # split the document into chunks
        pages = loader.load_and_split()

        # use LangChain to create the embeddings using text-embedding-ada-002
        db = FAISS.from_documents(documents=pages, embedding=embeddings)

        # save the embeddings into the FAISS vector store
        db.save_local(r"C:\Users\vivek\OneDrive\Desktop\Hackathon\index")

    from dotenv import load_dotenv
    import os
    import openai
    from langchain.chains import ConversationalRetrievalChain
    from langchain.chat_models import AzureChatOpenAI
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.prompts import PromptTemplate

    # load environment variables
    load_dotenv()

    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("OPENAI_DEPLOYMENT_ENDPOINT")
    OPENAI_DEPLOYMENT_NAME = os.getenv("OPENAI_DEPLOYMENT_NAME")
    OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
    OPENAI_DEPLOYMENT_VERSION = os.getenv("OPENAI_DEPLOYMENT_VERSION")

    OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME")
    OPENAI_ADA_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_ADA_EMBEDDING_MODEL_NAME")

    def ask_question(qa, question):
        result = qa({"query": question})
        print("Question:", question)
        print("Answer:", result["result"])

    def ask_question_with_context(qa, question, chat_history):
        result = qa({"question": question, "chat_history": chat_history})
        print("answer:", result["answer"])
        # append the new turn to the history; overwriting it each turn
        # (as the original did, with a hardcoded query) discards context
        chat_history = chat_history + [(question, result["answer"])]
        return chat_history

    if __name__ == "__main__":
        # Configure OpenAI API
        openai.api_type = "azure"
        openai.api_base = os.getenv('OPENAI_API_BASE')
        openai.api_key = os.getenv("OPENAI_API_KEY")
        openai.api_version = os.getenv('OPENAI_API_VERSION')

        # Initialize gpt-35-turbo and our embedding model
        llm = AzureChatOpenAI(deployment_name=OPENAI_DEPLOYMENT_NAME,
                              model_name=OPENAI_MODEL_NAME,
                              openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
                              openai_api_version=OPENAI_DEPLOYMENT_VERSION,
                              openai_api_key=OPENAI_API_KEY,
                              openai_api_type="azure")

        embeddings = OpenAIEmbeddings(deployment=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME,
                                      model=OPENAI_ADA_EMBEDDING_MODEL_NAME,
                                      openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
                                      openai_api_type="azure",
                                      chunk_size=1)

        # load the FAISS vector store we saved into memory
        vectorStore = FAISS.load_local(r"C:\Users\vivek\OneDrive\Desktop\Hackathon\index", embeddings)

        # use the FAISS vector store to search the local document
        retriever = vectorStore.as_retriever(search_type="similarity", search_kwargs={"k": 2})

        QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

    Chat History:
    {chat_history}
    Follow Up Input: {question}
    Standalone question:""")

        qa = ConversationalRetrievalChain.from_llm(llm=llm,
                                                   retriever=retriever,
                                                   condense_question_prompt=QUESTION_PROMPT,
                                                   return_source_documents=True,
                                                   verbose=False)

        chat_history = []
        while True:
            query = input('you: ')
            if query == 'q':
                break
            chat_history = ask_question_with_context(qa, query, chat_history)
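The chat_history passed to ConversationalRetrievalChain is just a list of (question, answer) tuples. A minimal sketch of accumulating turns with a cap, so the condensed prompt stays bounded (the helper name and the cap are my own, not LangChain's):

```python
def update_history(chat_history, question, answer, max_turns=5):
    """Append the latest (question, answer) pair and keep only the most
    recent max_turns turns. Hypothetical helper, not part of LangChain."""
    return (chat_history + [(question, answer)])[-max_turns:]
```

Each REPL turn can then be written as `chat_history = update_history(chat_history, query, result["answer"])`.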

Expected behavior

QA

dosubot[bot] commented 9 months ago

🤖

Hello @Vivek-Kawathalkar! 🚀 I'm Dosu, a friendly bot ready to assist with your LangChain issues, answer your questions, and guide you on your contribution journey. While we're waiting for a human maintainer to join, I'm here to help. Let's get this issue sorted out! 😎🛠️

Based on the error message you provided, it seems like the issue is related to the usage of openai_api_base and azure_endpoint parameters in the AzureOpenAIEmbeddings class. These parameters are mutually exclusive, meaning you should only use one of them, not both.

As of openai version 1.0.0, the openai_api_base parameter should be replaced with azure_endpoint. If openai_api_base is used anyway, it is expected to be of the form https://example-resource.azure.openai.com/openai/deployments/example-deployment. If deployment is specified, openai_api_base should not be used at all; use deployment together with azure_endpoint instead.
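The mutual-exclusion rule behind the ValidationError can be sketched in isolation (a hypothetical function illustrating the check, not LangChain's actual validator):

```python
def check_azure_config(base_url=None, azure_endpoint=None):
    """Sketch of the check that raises the error in the traceback:
    base_url (the alias of openai_api_base) and azure_endpoint are
    mutually exclusive, so at most one of them may be set."""
    if base_url is not None and azure_endpoint is not None:
        raise ValueError("base_url and azure_endpoint are mutually exclusive")
    return azure_endpoint if azure_endpoint is not None else base_url
```

Note that base_url may also be picked up implicitly (e.g. from a global openai.api_base or an environment variable), which is why the error can appear even when only azure_endpoint is passed explicitly.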

Here's how you can modify your code:

embeddings=AzureOpenAIEmbeddings(deployment=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME,
                            model=OPENAI_ADA_EMBEDDING_MODEL_NAME,
                            azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
                            openai_api_type="azure",
                            chunk_size=100)

In this code, I replaced openai_api_base with azure_endpoint. Please try this modification and let me know if it resolves your issue.

For more information, you can refer to the AzureOpenAIEmbeddings class in the LangChain framework.

If you still encounter issues, please provide more details about the error and your environment, and I'll be glad to assist you further.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

muzian666 commented 7 months ago

For your reference as a temporary workaround:

In langchain_openai\embeddings\azure.py line 59, set validate_base_url: bool = False

Then in line 129, remove "base_url": values["openai_api_base"],

This forces the program to skip URL validation. It is only a temporary fix, but it works as long as you use AzureOpenAI exclusively.

rishhavv commented 7 months ago

I can pick this up, if no-one's working on it. A lot of organisations depend on Azure for embeddings and this is creating an issue.

CatsNipYummy commented 7 months ago

Facing the same issue on my end as well. Our codebase depends on Azure alone.

JAIS0N commented 7 months ago

Same issue for me as well; this happened all of a sudden.

jdeepak-4u commented 6 months ago

Use AzureOpenAIEmbeddings. For more details, refer to the docs below.

https://python.langchain.com/docs/integrations/text_embedding/azureopenai

alexfilothodoros commented 5 months ago

Hi. Is there any update on this one? The solution proposed by @muzian666 is great, but I was wondering if this issue is going to be fixed in a future version.

FredrikGordh commented 4 months ago

I got it working with the following setup:

    AzureOpenAIEmbeddings(
        azure_deployment="analytix-embedding",
        openai_api_version=API_VERSION,
        azure_endpoint=API_END_POINT,
        api_key=API_KEY,
    )

based on this link: https://python.langchain.com/v0.1/docs/integrations/vectorstores/azuresearch/
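For completeness, the four settings in that snippet can be gathered from the environment before constructing AzureOpenAIEmbeddings. A small sketch (the helper name, environment-variable names, and fallback api_version are my assumptions, not LangChain or Azure conventions):

```python
import os

def azure_embeddings_kwargs(env=None):
    """Collect the four settings used above from the environment.
    Hypothetical helper; the variable names and the default
    openai_api_version are assumptions."""
    env = os.environ if env is None else env
    return {
        "azure_deployment": env.get("AZURE_OPENAI_DEPLOYMENT", ""),
        "openai_api_version": env.get("OPENAI_API_VERSION", "2023-05-15"),
        "azure_endpoint": env.get("AZURE_OPENAI_ENDPOINT", ""),
        "api_key": env.get("AZURE_OPENAI_API_KEY", ""),
    }
```

The resulting dict can be splatted into the constructor, e.g. `AzureOpenAIEmbeddings(**azure_embeddings_kwargs())`, keeping azure_endpoint as the only endpoint parameter so the mutual-exclusion check never fires.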