langflow-ai / langflow

Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
http://www.langflow.org
MIT License

Azure OpenAI Embeddings not accepted by Astra DB component #4462

Open: josefpolach opened this issue 2 hours ago

josefpolach commented 2 hours ago

Bug Description

I am trying to ingest data from a PDF file into Astra DB using Azure OpenAI embeddings. Whether I use Langflow online on the DataStax domain or run it locally via Docker Compose, the result is the same:

Docker: (screenshot not preserved)

DataStax online: (screenshot not preserved)

Reproduction

  1. Open the RAG flow from the Langflow template.
  2. Focus on the ingestion part at the bottom.
  3. Replace the OpenAI Embeddings component with Azure OpenAI Embeddings.
  4. Run the flow.
  5. The following error appears: Error building Component Astra DB: Error initializing AstraDBVectorStore: Unsupported data type

I checked the Docker logs, but there is nothing there except:

ValueError: Error initializing AstraDBVectorStore: Unsupported data type

Expected behavior

The flow builds successfully and the content is sent to Astra DB.

Who can help?

No response

Operating System

Docker Compose

Langflow Version

image: langflowai/langflow:latest

Python Version

None

Screenshot

No response

Flow File

https://astra.datastax.com/langflow/ebf1ade5-1471-4ecb-948c-1055de0f9e9a/flow/be5c72e1-802f-4be8-959c-404f5ccb3055/folder/469dd6bc-8ac4-4bc3-8cea-756c61cedb09

OpenAI Astra Ingestion (1).json

dosubot[bot] commented 2 hours ago

Hey @josefpolach! 👋 I'm here to help you with any bugs, questions, or contributions you have. Let's work together to resolve this issue.

I found a similar issue that was discussed and resolved. The issue was related to Astra DB Vectorize not allowing for Bring-Your-Own-Embedding (BYOE), which was addressed in a pull request that introduced changes to handle custom embeddings [1].

To resolve the issue you're facing with Azure OpenAI embeddings not being accepted by the Astra DB component, ensure that your configuration settings are correct. Specifically, you need to:

  1. Set the embedding_service to "Embedding Model".
  2. Set the provider to "Azure OpenAI".
  3. Choose an appropriate model name, such as "text-embedding-3-small" or "text-embedding-ada-002".
  4. Provide the necessary API key and authentication parameters.
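Before wiring the embeddings into the Astra DB component, it may help to confirm that the embeddings object behaves like a LangChain-style `Embeddings` implementation: `embed_documents` returns one numeric vector per input text, `embed_query` returns a single numeric vector, and all vectors share the same dimension. The sketch below is a hypothetical sanity check; `check_embeddings` and the `FakeEmbeddings` stand-in are illustrative names, not part of Langflow or LangChain. In practice you would pass your configured Azure OpenAI embeddings object instead of the stand-in.

```python
from typing import Any, List, Sequence


class FakeEmbeddings:
    """Hypothetical stand-in that mimics the embed_documents/embed_query interface."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Deterministic toy vectors: one per input text.
        return [[float(len(t))] * self.dim for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text))] * self.dim


def _is_float_vector(vec: Any) -> bool:
    # A vector must be a non-empty list/tuple of real numbers (bools excluded).
    return isinstance(vec, (list, tuple)) and len(vec) > 0 and all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec
    )


def check_embeddings(embeddings: Any, samples: Sequence[str] = ("hello", "world")) -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems: List[str] = []
    for name in ("embed_documents", "embed_query"):
        if not callable(getattr(embeddings, name, None)):
            problems.append(f"missing method: {name}")
    if problems:
        return problems

    doc_vectors = embeddings.embed_documents(list(samples))
    if len(doc_vectors) != len(samples):
        problems.append("embed_documents returned wrong number of vectors")
    if not all(_is_float_vector(v) for v in doc_vectors):
        problems.append("embed_documents returned non-numeric vectors")

    query_vector = embeddings.embed_query(samples[0])
    if not _is_float_vector(query_vector):
        problems.append("embed_query returned a non-numeric vector")

    # Collect every vector length seen; more than one means inconsistent dimensions.
    dims = {len(v) for v in doc_vectors if isinstance(v, (list, tuple))}
    if isinstance(query_vector, (list, tuple)):
        dims.add(len(query_vector))
    if len(dims) > 1:
        problems.append(f"inconsistent vector dimensions: {sorted(dims)}")
    return problems


if __name__ == "__main__":
    print(check_embeddings(FakeEmbeddings()))  # prints []
```

An empty list only shows that the object has the expected shape; it does not prove the Astra DB component will accept it.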

These settings are crucial for the AstraVectorStoreComponent to correctly handle Azure OpenAI embeddings [2][3]. If the error persists, it might be due to a data type mismatch, which should be logged using the loguru logger for further debugging [4].
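If the error does come from a data type mismatch, one way to narrow it down is to look for document or metadata values that a JSON-based client cannot encode, such as `bytes`, `datetime`, or custom objects produced while parsing the PDF. This is an assumption, not a diagnosis confirmed in this thread. The helper `find_unserializable` below is hypothetical and only uses the standard-library `json` module; Astra DB's own client may accept or reject a different set of types.

```python
import json
from datetime import datetime
from typing import Any, List, Tuple


def find_unserializable(value: Any, path: str = "$") -> List[Tuple[str, str]]:
    """Walk nested dicts/lists/tuples and return (path, type name) for values json cannot encode."""
    found: List[Tuple[str, str]] = []
    if isinstance(value, dict):
        for key, item in value.items():
            found.extend(find_unserializable(item, f"{path}.{key}"))
        return found
    if isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            found.extend(find_unserializable(item, f"{path}[{index}]"))
        return found
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return [(path, type(value).__name__)]
    return found


if __name__ == "__main__":
    # Hypothetical metadata for one PDF chunk.
    metadata = {
        "source": "report.pdf",
        "page": 3,
        "created": datetime(2024, 11, 7),
        "raw": b"\x00\x01",
        "tags": ["rag", {"score": 0.9}],
    }
    for path, type_name in find_unserializable(metadata):
        print(path, type_name)  # prints "$.created datetime" then "$.raw bytes"
```

Converting any reported values (for example, `datetime` to an ISO string) before ingestion is one way to test whether they are the cause.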
