metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0

Work with other targets than only PDF (e.g. HTML, text, etc.) #7

Closed: metachris closed this issue 8 years ago

metachris commented 8 years ago

At least think about extracting PDFs from websites etc.

ghost-hacked commented 8 years ago

Could you explain further about what you want to be done?

metachris commented 8 years ago

I was thinking that if you point pdfx to a website instead of a PDF, it should also try to extract links/PDFs.

On Oct 29, 2015 17:06, "Connor Kendrick" notifications@github.com wrote:

Could you explain further about what you want to be done?
