jrmuizel / pdf-extract

A Rust library for extracting content from PDFs

Add basic support for Do operation #12

Closed yuri91 closed 5 years ago

yuri91 commented 5 years ago

Hi! I wanted to convert a PDF that uses Form XObjects (a kind of embedded document stored as a resource). They are rendered with the Do operation, which pdf-extract does not currently support.

With this PR I added basic support for it, which seems to be enough for my use case (unfortunately, I don't have other PDFs with XObjects to test against).

The implementation just extracts the content and the resources of the embedded document and recursively calls process_stream on them.
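To illustrate the idea, here is a minimal, self-contained sketch of the recursion. It is not the code from this PR: the types Op, Resources and XObject, the extract_text helper, the depth limit, and the signature of process_stream shown here are all simplified, hypothetical stand-ins for the real PDF structures, assumed only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical, simplified content-stream operations (not the real pdf-extract types).
#[derive(Debug, Clone)]
enum Op {
    // Stands in for a text-showing operator such as Tj.
    ShowText(String),
    // Stands in for `Do`: paint the XObject with this resource name.
    Do(String),
}

// Hypothetical resource dictionary, holding only the XObject entries.
#[derive(Debug, Clone, Default)]
struct Resources {
    xobjects: HashMap<String, XObject>,
}

// Hypothetical XObject model: a form carries its own content and, optionally, resources.
#[derive(Debug, Clone)]
enum XObject {
    Form {
        content: Vec<Op>,
        resources: Option<Resources>,
    },
    Image,
}

// Assumed safety limit for this sketch, so deeply nested forms cannot
// recurse without bound.
const MAX_DEPTH: usize = 16;

// Walk a content stream, appending text to `out`. On `Do`, look up the
// named XObject and, if it is a form, recurse into its content.
fn process_stream(content: &[Op], resources: &Resources, depth: usize, out: &mut String) {
    if depth > MAX_DEPTH {
        return;
    }
    for op in content {
        match op {
            Op::ShowText(s) => out.push_str(s),
            Op::Do(name) => match resources.xobjects.get(name) {
                Some(XObject::Form { content, resources: inner }) => {
                    // Assumption: a form without its own resources reuses the caller's.
                    let res = inner.as_ref().unwrap_or(resources);
                    process_stream(content, res, depth + 1, out);
                }
                // Images and unknown names contribute no text in this sketch.
                Some(XObject::Image) | None => {}
            },
        }
    }
}

// Convenience wrapper: extract all text reachable from a top-level stream.
fn extract_text(content: &[Op], resources: &Resources) -> String {
    let mut out = String::new();
    process_stream(content, resources, 0, &mut out);
    out
}
```

Calling extract_text on a page whose content invokes a form via Do yields the form's text inline, at the position of the Do operation. Image XObjects and missing resource names are skipped silently in this sketch; a real implementation may want to report them instead.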

Let me know if this seems ok to you.