langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.54k stars 14.82k forks

Must provide an 'engine' or 'deployment_id' parameter to create a <class 'openai.api_resources.embedding.Embedding'> #8190

Closed sarthak263 closed 9 months ago

sarthak263 commented 1 year ago

System Info

I am getting an error on FAISS.from_documents(): "openai.error.InvalidRequestError: Must provide an 'engine' or 'deployment_id' parameter to create a <class 'openai.api_resources.embedding.Embedding'>". Did something change recently? This code worked fine before and now it doesn't, and I'm not sure what has changed. I tried Chroma.from_documents as well and I still get the same error.

openai.api_type = "azure"
openai.api_version = os.environ["OPENAI_API_VERSION"]
openai.api_base = os.environ["OPENAI_API_BASE"]
openai.api_key = os.environ["OPENAI_API_KEY"]

bot.gpt_turbo = Model_LLM(OPENAI_DEPLOYMENT_NAME).model

embeddings = OpenAIEmbeddings(model=OPENAI_EMBEDDING_MODEL_NAME)

fileLoaded = FileLoader("Data/filename.pdf", TokenTextSplitter(chunk_size=1000, chunk_overlap=1))

text = fileLoaded.load_file()
#vectorStore = Chroma.from_documents(text,embedding=embeddings)
vectorStore = FAISS.from_documents(text,embedding=embeddings)

qa = RetrievalEngine(llm=bot.gpt_turbo, retriever=vectorStore.as_retriever(),chain_type="stuff")

#query = "Please give me the list of sampleID"

while True:
    askQuestion  = input("Ask me a question about the file?: ")
    print(qa.initialize_qa_engine().run(askQuestion))

#Class Code

import os
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.chat_models import AzureChatOpenAI
from langchain.document_loaders import Docx2txtLoader, PyPDFLoader, CSVLoader, UnstructuredFileLoader

class Model_LLM:
    def __init__(self, deployment_name):
        self.model = AzureChatOpenAI(deployment_name=deployment_name)

class FileLoader:
    def __init__(self, file, text_splitter):
        self.file = file
        self.text_splitter = text_splitter
        self._ext = os.path.splitext(self.file)[-1].lower()

    def load_file(self):
        if self._ext in ['.docx', '.doc']:
            return self._call_file_loader(Docx2txtLoader)
        elif self._ext == '.pdf':
            return self._call_file_loader(PyPDFLoader)
        elif self._ext in ['.csv']:
            return self._call_file_loader(CSVLoader)
        #elif self._ext in ['.json']:
            #return self._call_file_loader(JSONLoader)
        elif self._ext in ['.txt', '.json']:
            return self._call_file_loader(UnstructuredFileLoader)
        else:
            return []

    def _call_file_loader(self, loader_class):

        loader = loader_class(self.file)
        _text  = loader.load_and_split(text_splitter=self.text_splitter)
        #documents = loader.load()
        #_text = self.text_splitter.split_documents(documents)

        return _text

class RetrievalEngine:
    def __init__(self, llm, retriever, chain_type='stuff', max_tokens=500):
        self.llm = llm
        self.retriever = retriever
        self.chain_type = chain_type
        self.max_tokens = max_tokens

    def initialize_qa_engine(self):
        return RetrievalQA.from_chain_type(llm=self.llm,
                                           chain_type=self.chain_type,
                                           retriever=self.retriever,
                                           return_source_documents=False)

    def initialize_chat_engine(self):
        return ConversationalRetrievalChain.from_llm(self.llm,
                                                     retriever=self.retriever,
                                                     max_tokens_limit=self.max_tokens)

Who can help?

No response

Reproduction

Traceback (most recent call last):
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\main.py", line 13, in <module>
    vectorstore = Chroma.from_documents(documents=text,embedding=OpenAIEmbeddings())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\langchain\vectorstores\chroma.py", line 578, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\langchain\vectorstores\chroma.py", line 542, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\langchain\vectorstores\chroma.py", line 175, in add_texts
    embeddings = self._embedding_function.embed_documents(list(texts))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\langchain\embeddings\openai.py", line 508, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\langchain\embeddings\openai.py", line 358, in _get_len_safe_embeddings
    response = embed_with_retry(
               ^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\langchain\embeddings\openai.py", line 107, in embed_with_retry
    return _embed_with_retry(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\tenacity\__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\tenacity\__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\tenacity\__init__.py", line 314, in iter
    return fut.result()
           ^^^^^^^^^^^^
  File "C:\Users\spate246\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\tenacity\__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\langchain\embeddings\openai.py", line 104, in _embed_with_retry
    response = embeddings.client.create(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\openai\api_resources\embedding.py", line 33, in create
    response = super().create(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 149, in create
    ) = cls.__prepare_create_request(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\spate246\source\repos\Assay-Development\Azure_OpenAI\Lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 83, in __prepare_create_request
    raise error.InvalidRequestError(
openai.error.InvalidRequestError: Must provide an 'engine' or 'deployment_id' parameter to create a <class 'openai.api_resources.embedding.Embedding'>

Expected behavior

I would expect it not to have this issue, because it worked before: the while loop would run and the user could ask questions about the file.

sarthak263 commented 1 year ago

Any updates on this?

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

I understand that you're encountering an error stating "Must provide an 'engine' or 'deployment_id' parameter to create a <class 'openai.api_resources.embedding.Embedding'>". This error is occurring when you're trying to create an instance of the 'OpenAIEmbeddings' class in your code.

The error message suggests that the 'engine' or 'deployment_id' parameter is missing when creating an instance of the 'openai.api_resources.embedding.Embedding' class. In your code, you're creating an instance of 'OpenAIEmbeddings' with only the 'model' parameter:

embeddings = OpenAIEmbeddings(model=OPENAI_EMBEDDING_MODEL_NAME)

According to the documentation of the 'OpenAIEmbeddings' class, you should also provide the 'deployment' parameter when creating an instance of 'OpenAIEmbeddings'.

Here's how you can modify your code to include the 'deployment' parameter:

embeddings = OpenAIEmbeddings(model=OPENAI_EMBEDDING_MODEL_NAME, deployment="your-embeddings-deployment-name")

Please replace "your-embeddings-deployment-name" with the actual name of your deployment. This modification should help resolve the error you're facing.
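Putting the suggestion above into a fuller Azure setup, it might look like the sketch below. This is only an illustration: the endpoint, API version, and deployment names are hypothetical placeholders, and the import path assumes a pre-1.0 LangChain release like the one used in this issue.

```python
import os
from langchain.embeddings import OpenAIEmbeddings  # pre-1.0 LangChain import path

# Azure OpenAI configuration via environment variables (all values are placeholders).
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://my-resource.openai.azure.com/"  # hypothetical endpoint
os.environ["OPENAI_API_VERSION"] = "2023-05-15"  # hypothetical API version
os.environ["OPENAI_API_KEY"] = "..."  # your key

# On Azure, `deployment` is the name you gave the deployment in the Azure portal;
# it is distinct from the underlying `model` name. LangChain sends it to the
# OpenAI client as the `engine`/`deployment_id` the error message is asking for.
embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    deployment="my-ada-002-deployment",  # hypothetical deployment name
    chunk_size=1,  # Azure historically accepted only small embedding batches
)
```

Because constructing the client requires real Azure credentials, treat this purely as a configuration sketch.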

Please note that this is one potential solution based on the information provided. If this doesn't resolve your issue, or if your experience contradicts this solution, I would recommend checking if you're using the latest version of the LangChain framework, as the issue might have been resolved in a more recent update.

If you believe this is a bug that could impact other users and you're comfortable doing so, you're encouraged to make a pull request with the necessary changes. This would greatly benefit the LangChain community.

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

magallardo commented 1 year ago

Hello,

I am getting the same error when running a search with Milvus Vector Database. Everything was working fine when using LangChain 0.0.183. I have upgraded to the latest version 0.0.249 and I have started getting the error.

I have updated my code to pass the deployment name to the OpenAIEmbeddings call as indicated above and that did not fix the issue.

Thanks, Marcelo

dreysco commented 1 year ago

Hi,

I was having the same issue, but was able to resolve it by providing the engine argument to the OpenAIEmbeddings call. Maybe give this a try and see if it works for you.

yam1473 commented 1 year ago

I am still facing the same error even though I have defined model and deployment in the embeddings:

vectordb = Chroma.from_documents(documents=texts, embedding=embedding)

InvalidRequestError: Must provide an 'engine' or 'deployment_id' parameter to create a <class 'openai.api_resources.embedding.Embedding'>

dosubot[bot] commented 9 months ago

Hi, @sarthak263. I'm Dosu, and I'm here to help the LangChain team manage our backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding of the current state of the issue, you encountered an error when using FAISS.from_documents() in your code. In the comments, I suggested that you should provide the 'deployment' parameter when creating an instance of 'OpenAIEmbeddings'. Another user, magallardo, mentioned that they are also encountering the same error after upgrading to the latest version of LangChain. dreysco suggested providing the 'engine' argument to the 'OpenAIEmbeddings' call as a potential solution.

Now, I'd like to ask you if this issue is still relevant to the latest version of the LangChain repository. If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

If you have any further questions or need additional assistance, please don't hesitate to ask.

sarthak263 commented 9 months ago

Hi, the issue got resolved by adding openai_api_type="azure" to the embeddings call:

OpenAIEmbeddings(model=TEXT_ADA_EMBEDDING_NAME, chunk_size=1, openai_api_type="azure")
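For anyone landing here, a minimal sketch of the resolved setup as described in this thread — assuming a pre-1.0 LangChain release, with TEXT_ADA_EMBEDDING_NAME standing in for your own Azure deployment name:

```python
from langchain.embeddings import OpenAIEmbeddings  # pre-1.0 import path
from langchain.vectorstores import FAISS

# Placeholder for your Azure embedding deployment name.
TEXT_ADA_EMBEDDING_NAME = "text-embedding-ada-002"

embeddings = OpenAIEmbeddings(
    model=TEXT_ADA_EMBEDDING_NAME,
    chunk_size=1,                # Azure accepted only small embedding batches at the time
    openai_api_type="azure",     # the missing piece: routes requests through the Azure code path
)

# `text` is the list of Documents produced by the loader in the original report.
# vectorStore = FAISS.from_documents(text, embedding=embeddings)
```

As a configuration sketch this still assumes the usual Azure environment variables (OPENAI_API_BASE, OPENAI_API_VERSION, OPENAI_API_KEY) are set as in the original report.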