Website (Includes Demo) | Documentation | Discord | NotebookLM Podcast (thanks Shabie from our Discord community!) | Paper (coming soon!)
DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers a low-code, declarative YAML interface to define LLM-powered operations on complex data.
DocETL is the ideal choice when you're looking to maximize correctness and output quality for complex tasks over a collection of documents or unstructured datasets. You should consider using DocETL if:
To install from PyPI, see the documentation. The steps below install DocETL from source.
Before installing DocETL, ensure you have Python 3.10 or later installed on your system. You can check your Python version by running:
python --version
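If you would rather check the requirement from inside Python (for example, in a setup script), a minimal sketch could look like the following. The helper name `meets_minimum` is hypothetical and not part of DocETL; only the 3.10 floor comes from the requirement stated above.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the (major, minor) version is at least `minimum`.

    `meets_minimum` is an illustrative helper, not a DocETL function.
    """
    # Compare only major and minor; tuples compare element by element.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is new enough for DocETL.")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python 3.10 or later is required; found {found}.")
```

Comparing tuples rather than version strings avoids the common mistake of treating "3.9" as greater than "3.10".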
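Clone the repository and change into its directory:
git clone https://github.com/shreyashankar/docetl.git
cd docetl
Install Poetry, then use it to install the project dependencies:
pip install poetry
poetry install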
Create a .env file in the project root and add your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
Alternatively, you can set the OPENAI_API_KEY environment variable in your shell.
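To confirm that one of the two options is in place before running anything, a small check like the one below may help. It is a sketch under stated assumptions, not how DocETL itself loads the key: the helper `read_api_key` is hypothetical, the parser handles only simple `NAME=value` lines, and giving the shell variable precedence over the `.env` file is an assumption made for this example.

```python
import os
from pathlib import Path


def read_api_key(env_path=".env", name="OPENAI_API_KEY"):
    """Return the key from the shell environment, else from a .env file, else None.

    Illustrative helper only; not part of DocETL.
    """
    # Assumption: a variable set in the shell wins over the .env file.
    value = os.environ.get(name)
    if value:
        return value

    path = Path(env_path)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not NAME=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return val.strip().strip('"').strip("'") or None
    return None


if __name__ == "__main__":
    if read_api_key() is None:
        print("OPENAI_API_KEY not found in the environment or in .env")
    else:
        print("OPENAI_API_KEY is set")
```

Run it from the project root so that the relative `.env` path resolves to the file created above.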
Run the basic test suite to verify the installation:
make tests-basic
That's it! You've successfully installed DocETL and are ready to start processing documents.
For more detailed information on usage and configuration, please refer to our documentation.