Feature request: extract plaintext / markdown

rockyzhengwu / rspdf

PDF library in Rust

Apache License 2.0

Hi there! Thanks for creating and sharing this :)

One quite common use case for PDF libraries is getting the text from a PDF. This is often used for things like indexing documents in a search engine. There is a Rust project that does this, called pdf-extract, but I'd love to see an alternative to it (for a couple of reasons).

I noticed rspdf has a way to extract XML text from a PDF. I was wondering whether it would also be possible to extract content as plaintext? Or even better: extract it as markdown!

Perhaps this is completely out of scope for the project. Maybe I could help out with this someday (I have some plans in this regard), if you think it would be a good fit.

Cheers!

Feature request: extract plaintext / markdown #1