robsonsmartins / php-projects

PHP and JavaScript Projects
GNU General Public License v3.0

[FPubD] Searchable PDFs #7

Open · marcelopm opened this issue 7 years ago

marcelopm commented 7 years ago

Hi,

I was wondering if there is any way of adding search support to the generated PDFs.

This one is a good example:

https://issuu.com/guani/docs/designpatternscard

If downloaded through issuu.com, the PDF is searchable, but not when it's downloaded with the tool.

Congrats on the project! Cheers

robsonsmartins commented 7 years ago

Hi, Thank you for using my tool!

I appreciate your suggestion. In the case you mentioned, the publication's author uploaded an original PDF file with a text layer (or one that had been OCRed), created, for example, in Adobe Acrobat Pro. The author also allowed the publication to be downloaded directly from issuu.com.

My difficulty is finding an open source, pure JavaScript OCR library, because the PDF generation library I use has no OCR capability. My tool generates a PDF file from the source page images (JPG) that issuu.com makes publicly available for every publication hosted on the site.

If you know of any open source, pure JavaScript OCR library that works on static JPG images, please let me know.

For now, to convert a non-searchable PDF into a searchable one, use the OCR tool in Adobe Acrobat Pro.

marcelopm commented 7 years ago

I see, thanks for the explanation. Regarding libraries, I recently came across these:

I've created a CodePen as a Tesseract demo: http://codepen.io/anon/pen/pNmEMm. It currently uses a data URI for the image because of cross-domain limitations.

Ocrad demo: https://github.com/kdzwinel/JS-OCR-demo

Not sure how realistic it is to use them. Cheers

marcelocecin commented 7 years ago

@marcelopm and @robsonsmartins, I monitored the iPhone app and it requests the URL https://publication.issuu.com/guani/designpatternscard/ios_1.json. With that, it's easy to get the PDFs in high quality:
http://page-pdf.issuu.com/090210092600-dbd6d6b6b79e4db994b7146b6856bd0d/1.pdf
http://page-pdf.issuu.com/090210092600-dbd6d6b6b79e4db994b7146b6856bd0d/2.pdf
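The two example links follow an apparent pattern: a document ID followed by a 1-based page number. A minimal sketch of generating one URL per page, assuming that pattern holds for every page and every publication (inferred only from these two links). The function name `buildPagePdfUrls` is hypothetical, and the page count is a placeholder; in practice it would presumably come from the ios_1.json response, whose structure isn't described here.

```javascript
// Hypothetical helper: builds per-page PDF URLs from the pattern seen in the
// two example links. ASSUMPTION: pages are numbered from 1 and served at
//   http://page-pdf.issuu.com/<documentId>/<pageNumber>.pdf
function buildPagePdfUrls(documentId, pageCount) {
  // The page count is supplied by the caller (placeholder); it is not read
  // from issuu.com here.
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError("pageCount must be a positive integer");
  }
  const urls = [];
  for (let page = 1; page <= pageCount; page++) {
    // encodeURIComponent leaves alphanumerics and '-' unchanged, so IDs like
    // the one in the example links pass through as-is.
    urls.push(`http://page-pdf.issuu.com/${encodeURIComponent(documentId)}/${page}.pdf`);
  }
  return urls;
}

// Usage with the document ID from the example links (2 pages assumed).
const urls = buildPagePdfUrls("090210092600-dbd6d6b6b79e4db994b7146b6856bd0d", 2);
console.log(urls.join("\n"));
```

The resulting per-page files would still need to be merged into a single PDF, which is a separate step.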

robsonsmartins commented 7 years ago

Thanks, @marcelocecin. This URL pattern gives searchable PDFs for every page of a publication. Now I'm looking for a free/open source JavaScript library to merge many PDF files into one.

marcelopm commented 7 years ago

This might help with the merging:
http://stackoverflow.com/questions/9809001/is-there-a-way-to-combine-pdfs-in-pdf-js/40984782#answer-40984782
http://stackoverflow.com/questions/21478738/how-can-we-do-pdf-merging-using-javascript

robsonsmartins commented 7 years ago

Thanks a lot! Now I just need some free time to work on this...