pilartomas opened this issue 2 weeks ago
Note that I will further modify the Unstructured PoC to support polymorphism over extraction outputs. That way, the Docling extraction backend can coexist with the others (WDU, Unstructured).
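One way to get polymorphism over extraction outputs is a common base type that every backend's output implements, so downstream code (storage, chunking) never branches on the backend. This is a minimal sketch, assuming a Python implementation in the worker. The names `ExtractionOutput`, `to_markdown`, `to_chunks`, `DoclingOutput`, `UnstructuredOutput` and `summarize` are hypothetical and not taken from the codebase. The backends are stubbed with plain strings rather than real Docling or Unstructured objects.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ExtractionOutput(ABC):
    """Hypothetical interface shared by every extraction backend's output."""

    backend: str

    @abstractmethod
    def to_markdown(self) -> str:
        """Return the extracted document as Markdown."""

    @abstractmethod
    def to_chunks(self) -> list[str]:
        """Return the extracted document split into text chunks."""


@dataclass
class DoclingOutput(ExtractionOutput):
    # Stand-in for a Docling result: just an already-exported Markdown string.
    markdown: str
    backend: str = "docling"

    def to_markdown(self) -> str:
        return self.markdown

    def to_chunks(self) -> list[str]:
        # Naive paragraph split; a real integration would use Docling's own chunker.
        return [p for p in self.markdown.split("\n\n") if p.strip()]


@dataclass
class UnstructuredOutput(ExtractionOutput):
    # Stand-in for Unstructured output: a list of element texts.
    elements: list[str]
    backend: str = "unstructured"

    def to_markdown(self) -> str:
        return "\n\n".join(self.elements)

    def to_chunks(self) -> list[str]:
        return list(self.elements)


def summarize(output: ExtractionOutput) -> str:
    # Depends only on the shared interface, not on which backend produced it.
    return f"{output.backend}: {len(output.to_chunks())} chunk(s)"


if __name__ == "__main__":
    for out in [DoclingOutput("# Title\n\nBody"), UnstructuredOutput(["Title", "Body"])]:
        print(summarize(out))
```

Because the base class is abstract, a backend that forgets to implement one of the methods fails when it is instantiated rather than later in the pipeline.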
@pilartomas As discussed, we will add Docling by updating:
src/files/entities/helpers.ts
workers/python/python/extraction
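On the Python worker side, adding a backend usually means registering a new backend identifier and routing it to its extractor. This is a minimal sketch of such routing, assuming a string-valued enum and a registry dict. `ExtractionBackend`, `register`, `extract` and `extract_with_docling` are hypothetical names, and the extractor body is a placeholder, not a real Docling call. The TypeScript side in helpers.ts is not shown.

```python
from enum import Enum
from typing import Callable


class ExtractionBackend(str, Enum):
    # Hypothetical backend identifiers; the real values live in the codebase.
    DOCLING = "docling"
    UNSTRUCTURED = "unstructured"
    WDU = "wdu"


Extractor = Callable[[bytes], str]
_REGISTRY: dict[ExtractionBackend, Extractor] = {}


def register(backend: ExtractionBackend) -> Callable[[Extractor], Extractor]:
    """Decorator that records the extractor function for a backend."""

    def decorator(fn: Extractor) -> Extractor:
        _REGISTRY[backend] = fn
        return fn

    return decorator


@register(ExtractionBackend.DOCLING)
def extract_with_docling(content: bytes) -> str:
    # Placeholder: a real implementation would run Docling's DocumentConverter here.
    return "docling:" + content.decode("utf-8")


def extract(backend: str, content: bytes) -> str:
    """Look up the extractor registered for a backend name and run it."""
    try:
        extractor = _REGISTRY[ExtractionBackend(backend)]
    except (ValueError, KeyError):
        # ValueError: unknown name; KeyError: known name with no extractor registered.
        raise ValueError(f"unsupported extraction backend: {backend!r}") from None
    return extractor(content)
```

A registry keeps the dispatch in one place, so supporting another backend is a single decorated function rather than another branch in an if/elif chain.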
When trying out Docling, I noticed that it downloads several resources on its first run after installation. If that is indeed the case, please make sure the download happens while building the Docker image.
Yes, we will do that. It is the same approach we use with other customers.
Use Docling to perform document extraction.
Store the DoclingDocument in its raw, Markdown, and chunked forms in S3.
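This is a minimal sketch of persisting the three forms. It assumes a hypothetical key layout (`<prefix>/document.json`, `<prefix>/document.md`, `<prefix>/chunks.json`) and an injected `put_object` callable, so the logic runs without AWS. `store_extraction`, `ExportableDocument` and the key names are assumptions, not the project's code. In the worker, `doc` would be the DoclingDocument from `DocumentConverter().convert(source).document`, whose `export_to_dict()` and `export_to_markdown()` methods the sketch relies on. The chunk texts could come from Docling's `HybridChunker`.

```python
import json
from typing import Any, Callable, Iterable, Protocol


class ExportableDocument(Protocol):
    # The two DoclingDocument export methods this sketch relies on.
    def export_to_dict(self) -> dict[str, Any]: ...

    def export_to_markdown(self) -> str: ...


# (key, body, content_type) -> None; hypothetical storage hook.
PutObject = Callable[[str, bytes, str], None]


def store_extraction(
    doc: ExportableDocument,
    chunks: Iterable[str],
    prefix: str,
    put_object: PutObject,
) -> list[str]:
    """Write the raw, Markdown and chunked forms under `prefix`; return the keys written."""
    objects = [
        # Raw form: the full document structure serialized as JSON.
        (f"{prefix}/document.json",
         json.dumps(doc.export_to_dict()).encode("utf-8"), "application/json"),
        # Markdown form.
        (f"{prefix}/document.md",
         doc.export_to_markdown().encode("utf-8"), "text/markdown"),
        # Chunked form: a JSON array of chunk texts.
        (f"{prefix}/chunks.json",
         json.dumps(list(chunks)).encode("utf-8"), "application/json"),
    ]
    for key, body, content_type in objects:
        put_object(key, body, content_type)
    return [key for key, _, _ in objects]
```

With boto3, `put_object` could wrap `s3.put_object(Bucket=..., Key=key, Body=body, ContentType=content_type)`. The bucket name and prefix scheme are left to the deployment.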