Closed by bbrowning 1 week ago
This will likely also mean we need to adjust our docling entry in requirements.txt to pull in docling[tesserocr] instead of docling. The tesserocr variant pulls in both tesserocr and easyocr, allowing us to swap between them with a single dependency.
Instead of exposing a new configuration knob here, we'll just prefer tesserocr when it's available and automatically fall back to easyocr when it isn't. If neither tesserocr nor easyocr loads, we'll log an error and disable optical character recognition.
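A minimal sketch of that preference order, assuming "available" simply means the Python module can be imported. The names select_ocr_engine, _can_import, and OCR_ENGINE_PREFERENCE are hypothetical and not part of Docling or our integration; the real check may need to be stricter (for example, verifying that Tesseract language data is installed).

```python
import importlib
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical preference order: tesserocr first, easyocr as the fallback.
OCR_ENGINE_PREFERENCE = ("tesserocr", "easyocr")


def _can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Assumption: any failure to load (missing package, missing native
        # library, and so on) counts as "not available".
        return False
    return True


def select_ocr_engine(
    can_load: Callable[[str], bool] = _can_import,
) -> Optional[str]:
    """Pick the first loadable OCR engine, or None to disable OCR."""
    for name in OCR_ENGINE_PREFERENCE:
        if can_load(name):
            return name
    logger.error("Neither tesserocr nor easyocr could be loaded; disabling OCR.")
    return None
```

The can_load parameter exists only so the selection logic can be exercised without either engine installed. A return value of None would signal the caller to turn OCR off; wiring the result into the Docling pipeline options is left out here.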
Docling defaults to using easyocr for optical character recognition (OCR), but we have some downstream consumers that will prefer to use Docling's tesserocr support instead. We need to expose a way for users to influence which engine we use, since swapping the OCR engine currently requires code changes in our Docling integration.