joepio opened this issue 9 months ago
Thanks, I'm glad you're interested in this project.
This project is currently centered on extracting text and images and converting them to other formats; at the moment, that means plain text. Markdown support is a potential future addition.
However, there are numerous bugs to address, particularly related to fonts. Consequently, the timeline for completion is uncertain.
Hi there! Thanks for creating and sharing this :)
One quite common use case for PDF libraries is getting the text from a PDF. This is often used for things like indexing documents in a search engine. There is a Rust project that does this, called pdf-extract, but I'd love to see an alternative to it (for a couple of reasons).

I noticed rspdf has a way to extract XML text from a PDF. Would it also be possible to extract content as plain text? Or even better: extract it as Markdown!

Perhaps this is completely out of scope for the project. Maybe I could help out with this someday (I have some plans in this regard) if you think it would be a good fit.
Cheers!