jrmuizel / pdf-extract

A rust library for extracting content from pdfs

StringOutput and extract_text #6

Closed by nebularnoise 6 years ago

nebularnoise commented 6 years ago

First of all, thank you for this crate; it solves an encoding problem I had with lopdf's doc.extract_text() function. I could not get that function to extract accented characters such as à, or even Cyrillic, which your crate captures nicely and easily.

However, I just needed to get the text in a String, and I ended up writing to a temporary text file with PlainTextOutput then reading said file.

As I am very much a beginner in Rust, this implementation may be naïve, but it works for my needs anyway.
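For readers with the same need, here is a minimal sketch of the general technique of capturing writer-based output in memory instead of going through a temporary file. It uses only the standard library and assumes the text producer accepts any `std::io::Write`. The names `emit_text` and `text_to_string` are hypothetical stand-ins for illustration, not part of pdf-extract's or lopdf's API.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for any routine that writes extracted text to a writer.
// A real extractor would emit the document's text here instead.
fn emit_text(out: &mut dyn Write) -> io::Result<()> {
    writeln!(out, "déjà vu")?;
    writeln!(out, "Привет")?;
    Ok(())
}

// Collect everything written into an in-memory buffer and return it as a String.
fn text_to_string() -> io::Result<String> {
    // Vec<u8> implements std::io::Write, so it can replace a temporary file.
    let mut buf: Vec<u8> = Vec::new();
    emit_text(&mut buf)?;
    // Convert the bytes to a String, reporting invalid UTF-8 as an I/O error.
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() -> io::Result<()> {
    let text = text_to_string()?;
    print!("{}", text);
    Ok(())
}
```

Because the buffer lives in memory, there is no file to create, read back, or clean up, and non-ASCII text survives as long as the producer writes valid UTF-8.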

jrmuizel commented 6 years ago

I modified your approach to avoid having to duplicate the output device. It ended up a lot more complicated than I wanted, but it does work. I published a new version of the crate that has an extract_text() function.

nebularnoise commented 6 years ago

Thanks a lot :) Works like a charm!