ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
http://ocrmypdf.readthedocs.io/
Mozilla Public License 2.0

[Bug]: Example docker-compose.yml not working anymore #1415

Closed by ckagerer 1 week ago

ckagerer commented 1 week ago

What were you trying to do?

Error description

Use OCRmyPDF in batch mode to automatically convert all files in a directory. My setup corresponds to the example in https://github.com/ocrmypdf/OCRmyPDF/blob/d303b42c8610d3bec0be19b089008ece821cbe9e/misc/docker-compose.example.yml. Since updating to v16.6.0, however, this no longer works, because the path where “watcher.py” is located has apparently changed (presumably unintentionally).

Solution / Workaround

Change ...

# SPDX-FileCopyrightText: 2022 James R. Barlow
# SPDX-License-Identifier: MIT
---
version: "3.3"
services:
  ocrmypdf:
    restart: always
    container_name: ocrmypdf
    image: jbarlow83/ocrmypdf
    volumes:
      - "/media/scan:/input"
      - "/mnt/scan:/output"
    environment:
      - OCR_OUTPUT_DIRECTORY_YEAR_MONTH=0
    user: "<SET TO YOUR USER ID>:<SET TO YOUR GROUP ID>"
    entrypoint: python3
    command: watcher.py

... to

# SPDX-FileCopyrightText: 2022 James R. Barlow
# SPDX-License-Identifier: MIT
---
version: "3.3"
services:
  ocrmypdf:
    restart: always
    container_name: ocrmypdf
    image: jbarlow83/ocrmypdf
    volumes:
      - "/media/scan:/input"
      - "/mnt/scan:/output"
    environment:
      - OCR_OUTPUT_DIRECTORY_YEAR_MONTH=0
    user: "<SET TO YOUR USER ID>:<SET TO YOUR GROUP ID>"
    entrypoint: python3
    command: misc/watcher.py
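
To confirm which path applies inside a given image before editing the compose file, a small check along the following lines may help. This is only a sketch: the two candidate paths are assumptions inferred from the error message in this report (`/app/watcher.py`) and from the workaround above (`misc/watcher.py`, taken to be relative to `/app`), and `find_script` is a hypothetical helper, not part of OCRmyPDF.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_script(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate path that is an existing file, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    # Assumed locations: the old path from the error message and the
    # relocated path implied by the workaround. Adjust for your image.
    found = find_script(["/app/watcher.py", "/app/misc/watcher.py"])
    print(found if found else "watcher.py not found in the assumed locations")
```

If neither path exists in your image, the script has presumably moved again and the `command:` entry needs to be adjusted accordingly.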

Where are you installing/running from?

Docker container

OCRmyPDF version

16.6.0

What operating system are you working on?

Linux

Operating system details and version

No response

Simple sanity checks

Relevant log output

ocrmypdf    | python3: can't open file '/app/watcher.py': [Errno 2] No such file or directory

ckagerer commented 1 week ago

I noticed something else. The “.git” folder is now also copied into the Docker container.

jbarlow83 commented 1 week ago

Fixed in commit c8c53d38

ckagerer commented 1 week ago

@jbarlow83 Thanks for the quick fix.

From my point of view, however, it would be better to exclude the Git folder directly in the .dockerignore (https://github.com/ocrmypdf/OCRmyPDF/blob/116e2692d0fdb18fd74172b209a70758358cc9c0/.dockerignore#L47). This would save space in the container.

jbarlow83 commented 1 week ago

ocrmypdf determines the current version from .git (via hatch-vcs) so it needs .git to be present at installation. After the version is written into the virtual environment, .git can be removed.
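
Once installation has finished, the recorded version can be read back from the installed package metadata, which should not depend on .git still being present. A minimal sketch using the standard library's `importlib.metadata`; the distribution name `ocrmypdf` is an assumption, and `installed_version` is a hypothetical helper rather than anything OCRmyPDF provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the version recorded in the installed package metadata, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    # "ocrmypdf" is the assumed distribution name.
    print(installed_version("ocrmypdf"))
```

Running this inside the container before and after deleting .git would be one way to check that the version survives the removal.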