ucbepic / docetl

A system for agentic LLM-powered data processing and ETL
https://docetl.org
MIT License

DocETL: Powering Complex Document Processing Pipelines

DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers a low-code, declarative YAML interface to define LLM-powered operations on complex data.

When to Use DocETL

DocETL is the ideal choice when you're looking to maximize correctness and output quality for complex tasks over a collection of documents or unstructured datasets. You should consider using DocETL if:

Community Projects

Educational Resources

Installation

Prerequisites

Quick Start

  1. Install from PyPI:
    pip install docetl

To see examples of how to use DocETL, check out the tutorial.
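After the PyPI install above, it can be handy to confirm that the package actually landed in the active environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of DocETL.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "docetl" is the distribution name used in the pip command above.
    version = installed_version("docetl")
    if version is None:
        print("docetl is not installed in this environment; run: pip install docetl")
    else:
        print(f"docetl {version} is installed")
```

If the script reports that the package is missing, the most likely cause is that `pip` installed into a different interpreter or virtual environment than the one running the script.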

Running the UI Locally

We offer a simple UI for building pipelines. We recommend building up complex pipelines one operation at a time, so you can see the results of each operation as you go and iterate on your pipeline. To run it locally, follow these steps:

  1. Clone the repository:

    git clone https://github.com/ucbepic/docetl.git
    cd docetl
  2. Create a .env file in the repository's root directory with the following environment variables:

    OPENAI_API_KEY=your_api_key_here
    BACKEND_ALLOW_ORIGINS=
    BACKEND_HOST=localhost
    BACKEND_PORT=8000
    BACKEND_RELOAD=True
    FRONTEND_HOST=0.0.0.0
    FRONTEND_PORT=3000

Then create a .env.local file in the website directory with the following:

    OPENAI_API_KEY=sk-xxx
    OPENAI_API_BASE=https://api.openai.com/v1
    MODEL_NAME=gpt-4o-mini

    NEXT_PUBLIC_BACKEND_HOST=localhost
    NEXT_PUBLIC_BACKEND_PORT=8000
  3. Install dependencies:

    make install      # Install Python package
    make install-ui   # Install UI dependencies

Note that the OpenAI API key, base URL, and model name are used only by the UI assistant, not by the DocETL pipeline execution engine.

  4. Start the development server:

    make run-ui-dev
  5. Visit http://localhost:3000/playground
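A missing variable in either environment file is an easy mistake to make before starting the development server. The sketch below checks both files for the keys listed in the steps above. It assumes it is run from the repository root and that the files use plain KEY=VALUE lines; the function names `parse_env` and `missing_keys` are hypothetical helpers, not part of DocETL.

```python
from pathlib import Path

# Keys copied from the setup steps above; adjust if your checkout expects others.
ROOT_KEYS = [
    "OPENAI_API_KEY", "BACKEND_ALLOW_ORIGINS", "BACKEND_HOST", "BACKEND_PORT",
    "BACKEND_RELOAD", "FRONTEND_HOST", "FRONTEND_PORT",
]
WEBSITE_KEYS = [
    "OPENAI_API_KEY", "OPENAI_API_BASE", "MODEL_NAME",
    "NEXT_PUBLIC_BACKEND_HOST", "NEXT_PUBLIC_BACKEND_PORT",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required: list) -> list:
    """Return the required keys that do not appear in the env text.

    A key with an empty value (e.g. BACKEND_ALLOW_ORIGINS=) counts as present.
    """
    present = parse_env(text)
    return [key for key in required if key not in present]


if __name__ == "__main__":
    checks = [(Path(".env"), ROOT_KEYS), (Path("website/.env.local"), WEBSITE_KEYS)]
    for path, required in checks:
        if not path.exists():
            print(f"{path}: file not found")
            continue
        missing = missing_keys(path.read_text(), required)
        print(f"{path}: " + ("ok" if not missing else "missing " + ", ".join(missing)))
```

The check only looks at key names, so it will not catch a wrong port or an invalid API key; it is meant as a quick sanity pass before running `make run-ui-dev`.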

Development Setup

If you're planning to contribute or modify DocETL, you can verify your setup by running the test suite:

    make tests-basic  # Runs basic test suite (costs < $0.01 with OpenAI)

For detailed guides and tutorials, see the documentation.