langflow-ai / langflow

Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
http://www.langflow.org
MIT License
33.7k stars 4.08k forks

OpenAI Embedding Component: Error Building Component: RedisCache only accepts values that can be pickled. #3561

Open HanumanTFM opened 2 months ago

HanumanTFM commented 2 months ago

Bug Description

I must say Langflow is giving me a hard time, with various errors, crashes, and an installation that was very challenging. I'm running 1.0.16 on Docker.

Here is my docker-compose.yml:

version: "3.8"

services:
  langflow:
    build: .
    ports:
      - "7860:7860"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      - LANGFLOW_DATABASE_URL=postgresql://langflow:langflow@postgres:5432/langflow
      - LANGFLOW_REDIS_HOST=redis
      - LANGFLOW_REDIS_PORT=6379
      - LANGFLOW_CACHE_TYPE=redis
      - LANGFLOW_AUTO_LOGIN=false
      - LANGFLOW_SUPERUSER=admin
      - LANGFLOW_SUPERUSER_PASSWORD=123456
      - LANGFLOW_WORKERS=1
      - LANGFLOW_AUTO_MIGRATE=true
    volumes:
      - langflow-data:/app/backend/langflow
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7860/health"]
      interval: 30s
      timeout: 10s
      retries: 5

  postgres:
    image: postgres:14
    environment:
      POSTGRES_USER: langflow
      POSTGRES_PASSWORD: langflow
      POSTGRES_DB: langflow
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U langflow"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:alpine
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  langflow-data:
  postgres-data:
  redis-data:

Dockerfile:

FROM langflowai/langflow:latest

USER root

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    python3-dev \
    postgresql-client

# Install psycopg2 and redis
RUN pip install psycopg2-binary redis

USER 1000

CMD ["langflow", "run", "--port", "7860"]

Error when running the OpenAI embedding component:

2024-08-27 06:21:53 langflow-1  | TypeError: RedisCache only accepts values that can be pickled.

Reproduction

Install Langflow, then open any flow that uses the OpenAI embedding component.

Expected behavior

The component should build and run without errors.

Who can help?

No response

Operating System

docker

Langflow Version

1.0.16

Python Version

None

Screenshot

image

Flow File

No response

dosubot[bot] commented 2 months ago

@HanumanTFM Hello there! I'm here to assist you with any bugs, questions, or contributions you have. Let's tackle this issue together while we wait for a human maintainer.

I found a similar closed issue that might be relevant to your problem: Can't query SQL db, can't pickle 'module' object. The suggested solution was to update the LangChain version to be <0.1.0 to get the fixes ASAP. However, it was noted that the issue might not be related to LangChain and could involve pickling in the backend [1].

To resolve the "RedisCache only accepts values that can be pickled" error when running the OpenAI embedding component in Langflow 1.0.16 on Docker, ensure that all values being stored in the Redis cache are serializable (i.e., can be pickled). This typically involves ensuring that the objects you are trying to cache are of types that can be serialized by Python's pickle module.
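To illustrate where this kind of error comes from, here is a minimal sketch of a cache that pickles every value before storing it. It is not Langflow's implementation: `PickleOnlyCache` is a hypothetical class, and a plain dict stands in for the Redis server. The assumption is only that the cache serializes values with `pickle` and rejects anything that fails.

```python
import pickle
import threading


class PickleOnlyCache:
    """Hypothetical stand-in for a Redis-backed cache; a dict plays the role of Redis."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        try:
            # Values are serialized to bytes before they are stored.
            payload = pickle.dumps(value)
        except (pickle.PicklingError, TypeError, AttributeError) as exc:
            raise TypeError("cache only accepts values that can be pickled") from exc
        self._store[str(key)] = payload

    def get(self, key):
        payload = self._store.get(str(key))
        return None if payload is None else pickle.loads(payload)


cache = PickleOnlyCache()
cache.set("vector", [0.1, 0.2, 0.3])  # plain data pickles fine
print(cache.get("vector"))

try:
    cache.set("client", threading.Lock())  # locks cannot be pickled
except TypeError as exc:
    print(f"rejected: {exc}")
```

If a component's build result holds something like a lock, an open connection, or a module reference, a cache of this kind will reject it even though the component itself works.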

Here's a snippet of the build method from the RedisComponent class:

def build(
    self,
    embedding: Embeddings,
    redis_server_url: str,
    redis_index_name: str,
    schema: Optional[str] = None,
    inputs: Optional[Record] = None,
) -> Union[VectorStore, BaseRetriever]:
    # Convert any Record inputs to LangChain documents; pass other inputs through.
    documents = []
    for _input in inputs or []:
        if isinstance(_input, Record):
            documents.append(_input.to_lc_document())
        else:
            documents.append(_input)
    if not documents:
        # No documents: connect to an existing index, which requires a schema.
        if schema is None:
            raise ValueError("If no documents are provided, a schema must be provided.")
        redis_vs = Redis.from_existing_index(
            embedding=embedding,
            index_name=redis_index_name,
            schema=schema,
            key_prefix=None,
            redis_url=redis_server_url,
        )
    else:
        # Documents provided: build the index from them.
        redis_vs = Redis.from_documents(
            documents=documents,  # type: ignore
            embedding=embedding,
            redis_url=redis_server_url,
            index_name=redis_index_name,
        )
    return redis_vs

Ensure that the documents are pickle-compatible. You might need to inspect the Record class and the to_lc_document method to confirm that they produce serializable objects. If any custom objects are being used, make sure they implement the necessary methods to be pickled.
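For a custom object, "implementing the necessary methods" usually means defining `__getstate__` and `__setstate__` so that unpicklable members are dropped before serialization and rebuilt afterwards. The sketch below assumes a hypothetical `EmbeddingClient` class whose only unpicklable member is a lock; it is not a Langflow or LangChain class.

```python
import pickle
import threading


class EmbeddingClient:
    """Hypothetical object holding one unpicklable member (a lock)."""

    def __init__(self, model):
        self.model = model
        self._lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance state and leave out the member that cannot be pickled.
        state = self.__dict__.copy()
        state.pop("_lock", None)
        return state

    def __setstate__(self, state):
        # Restore the saved state and recreate the lock from scratch.
        self.__dict__.update(state)
        self._lock = threading.Lock()


restored = pickle.loads(pickle.dumps(EmbeddingClient("example-model")))
print(restored.model)
```

This only helps for classes you control; for third-party objects, not caching them is often the simpler route.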

You can manually test the serialization of your documents using the pickle module:

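import pickle

# Example document
document = {"key": "value"}  # Replace with your actual document

# Test pickling. Unpicklable objects often raise TypeError or AttributeError
# rather than pickle.PicklingError, so catch all three.
try:
    pickle.dumps(document)
    print("Document is pickle-compatible")
except (pickle.PicklingError, TypeError, AttributeError) as exc:
    print(f"Document is not pickle-compatible: {exc}")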

Ensure that all objects being passed to Redis can be serialized without errors. If any object fails to be pickled, you will need to modify it to be compatible or avoid caching it [2].
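When a whole object fails to pickle, it helps to know which attribute is responsible. The following is a rough diagnostic sketch, assuming the object keeps its state in `__dict__`; `find_unpicklable` and `Example` are hypothetical names introduced here for illustration.

```python
import pickle
import threading


def find_unpicklable(obj):
    """Return {attribute_name: error_message} for attributes that fail to pickle."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickling can fail with several exception types
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class Example:
    def __init__(self):
        self.model = "example-model"
        self.lock = threading.Lock()


report = find_unpicklable(Example())
print(report)
```

The check is shallow: it looks only at top-level attributes, so a nested offender shows up under the name of the attribute that contains it.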

Additionally, the Record class and its to_lc_document method should produce pickle-compatible objects. The Record class is a subclass of BaseModel from Pydantic, which supports serialization, and the to_lc_document method converts a Record to a Document [3][4].

If you continue to face issues, please ensure that your Docker setup and dependencies are correctly configured and up-to-date.


BCeZn commented 2 months ago

How about not using RedisCache? Try the default cache instead.