Hi!
I wanted to convert a PDF that uses Form XObjects (a kind of embedded document stored as a resource). They are rendered with the `Do` operator, which `pdf-extract` does not currently support.
This PR adds basic support for it, which seems to be enough for my use case (unfortunately, I don't have other PDFs with XObjects to test with).
The implementation just extracts the content and the resources of the embedded document and recursively calls `process_stream` on them.
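To illustrate the idea, here is a minimal, self-contained sketch of the recursion. It is not the code from this PR: `Op`, `Resources`, `FormXObject`, and `MAX_DEPTH` are hypothetical stand-ins rather than `lopdf` or `pdf-extract` types, and this `process_stream` has a simplified signature that only collects text. The fallback to the parent's resources and the depth limit are assumptions added for the sketch.

```rust
use std::collections::HashMap;

/// Simplified content-stream operations (hypothetical, not lopdf types).
#[derive(Debug, Clone)]
enum Op {
    /// Stands in for a text-showing operator such as `Tj`.
    ShowText(String),
    /// Stands in for `Do`: paint the XObject with the given resource name.
    Do(String),
}

/// Simplified resource dictionary holding only Form XObjects.
#[derive(Debug, Clone, Default)]
struct Resources {
    xobjects: HashMap<String, FormXObject>,
}

/// A Form XObject: its own content stream plus optional resources.
#[derive(Debug, Clone, Default)]
struct FormXObject {
    content: Vec<Op>,
    resources: Option<Resources>,
}

/// Assumed safety limit so a form that refers to itself cannot recurse forever.
const MAX_DEPTH: usize = 32;

/// Walk a content stream, appending shown text to `out`.
/// On `Do`, look up the form and recurse into its content and resources.
fn process_stream(content: &[Op], resources: &Resources, depth: usize, out: &mut String) {
    for op in content {
        match op {
            Op::ShowText(text) => out.push_str(text),
            Op::Do(name) => {
                if depth >= MAX_DEPTH {
                    continue; // too deeply nested, skip this form
                }
                // Unknown names are ignored in this sketch.
                if let Some(form) = resources.xobjects.get(name) {
                    // Assumption: a form without its own resources uses the parent's.
                    let inner = form.resources.as_ref().unwrap_or(resources);
                    process_stream(&form.content, inner, depth + 1, out);
                }
            }
        }
    }
}
```

Calling `process_stream(&page_ops, &page_resources, 0, &mut out)` on a page whose stream shows some text and then paints a form with `Do` appends the page text followed by the form's text to `out`.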
Let me know if this seems OK to you.