bbrowning closed this 1 week ago
Thanks for the approval! After following some discussion elsewhere about being careful when we import anything that imports all of torch, I'm going to add an additional test to this and defer some of the docling/easyocr imports so they don't import transformers or torch until they're actually needed. It's just a small change, but I realized it should go in as part of this because otherwise we're loading all of torch fairly early in our import chain.
Ok, removing the hold now that we're not importing all of PyTorch as soon as someone imports SDG. Instead, we defer that until Docling actually needs torch loaded, by moving some of our imports of docling bits further down into the code. And the added test ensures we don't accidentally regress on that as we do future docling work here.
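For anyone following along, here is a rough sketch of the kind of regression check described above. It is not the test added in this PR. The helper name `import_pulls_in` is hypothetical, and the standard-library modules in the example are stand-ins so the snippet runs anywhere. In practice you would pass your own top-level package and `torch`.

```python
import subprocess
import sys


def import_pulls_in(package: str, heavy: str) -> bool:
    """Return True if importing `package` in a fresh interpreter also loads `heavy`.

    Hypothetical helper: a subprocess is used so that modules already
    imported by the test runner cannot hide (or fake) the result.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"print({heavy!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the import itself fails
    )
    return result.stdout.strip() == "True"


def test_import_stays_lightweight():
    # Stand-in names: swap in your package and "torch" for the real check.
    assert not import_pulls_in("json", "wave")
```

The deferral itself is just a matter of moving the heavy `import` statement from module level into the function that first needs it, so the cost is paid on first use rather than at package import time.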
Thanks for taking care of this, Ben! 😁
@Mergifyio backport release-v0.5
When setting up our ingestion pipeline, explicitly check if tesserocr is available and Docling can load it. If so, prefer that. Otherwise, attempt the same for EasyOCR. If neither can load, log an error and disable optical character recognition.
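The fallback order described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `choose_ocr_engine` and `module_available` are hypothetical names, and the availability check only asks whether the Python module can be found. It does not verify that Docling can actually load the engine, which the real change also checks.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def module_available(name: str) -> bool:
    """Return True if a top-level module named `name` can be found (without importing it)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def choose_ocr_engine(candidates=("tesserocr", "easyocr"), is_available=module_available):
    """Return the first usable OCR engine name, or None if OCR should be disabled.

    `is_available` is injectable so a stricter check (for example, one that
    actually tries to load the engine) can replace the simple module lookup.
    """
    for name in candidates:
        if is_available(name):
            return name
    logger.error(
        "No OCR engine could be loaded (tried: %s); disabling OCR.",
        ", ".join(candidates),
    )
    return None
```

A caller would treat a `None` result as "run the pipeline with OCR turned off" rather than as a fatal error.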
Fixes #352