langflow-ai / langflow

Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
http://www.langflow.org
MIT License

Azure OpenAI Embeddings not accepted by Astra DB component #4462

Open josefpolach opened 2 weeks ago

josefpolach commented 2 weeks ago

Bug Description

I am trying to ingest data from a PDF file into Astra DB using Azure OpenAI embeddings. Whether I use Langflow online on the DataStax domain or run it locally via docker-compose, the result is the same.

Docker: [image]

DataStax online: [image]

Reproduction

  1. Open the RAG flow from the Langflow template.
  2. Focus on the ingestion part at the bottom.
  3. Replace OpenAI Embeddings with Azure OpenAI Embeddings.
  4. Run the flow.
  5. The build fails with: Error building Component Astra DB: Error initializing AstraDBVectorStore: Unsupported data type

I checked the Docker logs, but there is nothing there except:

ValueError: Error initializing AstraDBVectorStore: Unsupported data type

Expected behavior

The flow builds successfully and the content is sent to Astra DB.

Who can help?

No response

Operating System

Docker Compose

Langflow Version

image: langflowai/langflow:latest

Python Version

None

Screenshot

No response

Flow File

https://astra.datastax.com/langflow/ebf1ade5-1471-4ecb-948c-1055de0f9e9a/flow/be5c72e1-802f-4be8-959c-404f5ccb3055/folder/469dd6bc-8ac4-4bc3-8cea-756c61cedb09

OpenAI Astra Ingestion (1).json

dosubot[bot] commented 2 weeks ago

Hey @josefpolach! 👋 I'm here to help you with any bugs, questions, or contributions you have. Let's work together to resolve this issue.

I found a similar issue that was discussed and resolved. The issue was related to Astra DB Vectorize not allowing for Bring-Your-Own-Embedding (BYOE), which was addressed in a pull request that introduced changes to handle custom embeddings [1].

To resolve the issue you're facing with Azure OpenAI embeddings not being accepted by the Astra DB component, ensure that your configuration settings are correct. Specifically, you need to:

  1. Set the embedding_service to "Embedding Model".
  2. Configure the provider to "Azure OpenAI".
  3. Choose an appropriate model name, such as "text-embedding-3-small" or "text-embedding-ada-002".
  4. Provide the necessary API key and authentication parameters.
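
As a rough illustration of step 4, the sketch below checks that the Azure OpenAI values are present before an embeddings object is built. `AzureEmbeddingSettings`, `settings_from_env`, and the environment variable names are hypothetical and not part of Langflow. `build_embeddings` assumes the `langchain-openai` package is installed and that its `AzureOpenAIEmbeddings` class accepts these keyword arguments, which should be verified against the installed version.

```python
import os
from dataclasses import dataclass, fields


@dataclass
class AzureEmbeddingSettings:
    """Hypothetical container for Azure OpenAI embedding settings (not a Langflow class)."""

    azure_endpoint: str = ""
    azure_deployment: str = ""
    api_version: str = ""
    api_key: str = ""

    def missing(self) -> list[str]:
        """Return the names of settings that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


def settings_from_env(env=os.environ) -> AzureEmbeddingSettings:
    """Read settings from environment variables (the variable names are assumptions)."""
    return AzureEmbeddingSettings(
        azure_endpoint=env.get("AZURE_OPENAI_ENDPOINT", ""),
        azure_deployment=env.get("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", ""),
        api_version=env.get("AZURE_OPENAI_API_VERSION", ""),
        api_key=env.get("AZURE_OPENAI_API_KEY", ""),
    )


def build_embeddings(settings: AzureEmbeddingSettings):
    """Build an embeddings object, failing early if any setting is missing."""
    missing = settings.missing()
    if missing:
        raise ValueError(f"Missing Azure OpenAI settings: {', '.join(missing)}")
    # Imported lazily so the validation above works without the package installed.
    from langchain_openai import AzureOpenAIEmbeddings

    return AzureOpenAIEmbeddings(
        azure_endpoint=settings.azure_endpoint,
        azure_deployment=settings.azure_deployment,
        api_version=settings.api_version,
        api_key=settings.api_key,
    )
```

Calling `settings_from_env().missing()` before running the flow gives a quick list of unset values, so a configuration gap can be ruled out before looking at the vector store itself.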

These settings are crucial for the AstraVectorStoreComponent to correctly handle Azure OpenAI embeddings [2][3]. If the error persists, it might be due to a data type mismatch, which should be logged using the loguru logger for further debugging [4].
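
To test the data type mismatch hypothesis, one option is to inspect what the embeddings object returns before it reaches the vector store. The sketch below is a generic check and not a diagnosis of this issue: `check_embedding` and `FakeEmbeddings` are hypothetical names, and the only assumption about the real object is that it exposes `embed_query(text)` returning a list of floats, as LangChain embeddings classes do.

```python
import math
from numbers import Real


def check_embedding(vector) -> list[str]:
    """Return a list of problems found in one embedding vector (empty if none)."""
    if not isinstance(vector, (list, tuple)):
        return [f"expected a list of floats, got {type(vector).__name__}"]
    if not vector:
        return ["embedding is empty"]
    problems = []
    for i, value in enumerate(vector):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            problems.append(f"index {i}: non-numeric value of type {type(value).__name__}")
        elif not math.isfinite(value):
            problems.append(f"index {i}: non-finite value {value!r}")
    return problems


class FakeEmbeddings:
    """Stand-in for a real embeddings object; it only mimics embed_query."""

    def embed_query(self, text):
        return [0.1, 0.2, 0.3]


if __name__ == "__main__":
    # Replace FakeEmbeddings() with the real embeddings object when debugging.
    print(check_embedding(FakeEmbeddings().embed_query("hello")))
```

The check is deliberately strict: it also flags a nested list or a non-list container, since either suggests the embeddings output has an unexpected shape for a single query.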


edwinjosechittilappilly commented 5 hours ago

Hi @josefpolach, thank you for the feedback. Can you check whether the issue persists in the latest version of Langflow, 1.1.1?

Happy to help if the issue still occurs.