This is a project to generate meaningful titles for ingested paperless documents using AI. Sends the OCR text of the document to the OpenAI API and generates a title for the document. The title is then saved to the document's metadata.
git clone https://github.com/sjafferali/paperless-titles-from-ai.git
cp -av paperless-titles-from-ai/.env.example paperless-titles-from-ai/.env
# Update .env file with the correct values
Update docker compose file with the correct path to the project directory.
services:
# ...
paperless-webserver:
# ...
volumes:
- /path/to/paperless-titles-from-ai:/usr/src/paperless/scripts
- /path/to/paperless-titles-from-ai/init:/custom-cont-init.d:ro
environment:
# ...
PAPERLESS_POST_CONSUME_SCRIPT: /usr/src/paperless/scripts/app/main.py
The init folder (used to ensure open package is installed) must be owned by root.
To back-fill titles on existing documents, run the helper cli from the project directory:
docker run --rm -v ./app:/app python:3 /app/scripts/backfill.sh [args] [single|all]
Arguments
Option | Required | Default | Description |
---|---|---|---|
--paperlessurl [URL] | Yes | https://paperless.local:8080 | Sets the URL of the paperless API endpoint. |
--paperlesskey [KEY] | Yes | Sets the API key to use when authenticating to paperless. | |
--openaimodel [MODEL] | No | gpt-4-turbo | Sets the OpenAI model used to generate title. Full list of supported models available at models. |
--openaibaseurl [API Endpoint] | No | Sets the OpenAI compatible endpoint to generate the title from. | |
--openaikey [KEY] | Yes | Sets the OpenAI key used to generate title. | |
--dry | No | False | Enables dry run which only prints out the changes that would be made. |
--loglevel [LEVEL] | No | INFO | Loglevel sets the desired loglevel. |
docker run --rm -v ./app:/app python:3 /app/scripts/backfill.sh [args] all [filter_args]
Arguments
Option | Required | Default | Description |
---|---|---|---|
--exclude [ID] | No | Excludes the document ID specified from being updated. This argument may be specified multiple times. | |
--filterstr [FILTERSTRING] | No | Filters the documents to be updated based on the URL filter string. |
docker run --rm -v ./app:/app python:3 /app/scripts/backfill.sh [args] single (document_id)
Although the OpenAI API privacy document states that data sent to the OpenAI API is not used for training, other OpenAI compatible API endpoints are also supported by this post-consume script, which allows you to use a locally hosted LLM to generate titles.