mysociety / alaveteli

Provide a Freedom of Information request system for your jurisdiction
https://alaveteli.org

Replace pdftohtml with pdf.js #833

Open henare opened 11 years ago

henare commented 11 years ago

pdf.js is pretty awesome but I wonder what the downsides of doing this might be? Compatibility? Mobile?

crowbot commented 11 years ago

See also https://github.com/mysociety/alaveteli/wiki/Improved-document-conversion. Indexability by search engines was originally one reason for doing the conversion ourselves, but Google does index PDFs, so I'm not sure this should really be a deciding factor.

frabcus commented 11 years ago

This all started with conversion to text - so the site search could find things in documents.

The "View as HTML" feature was added later on user demand. Mainly, I think, for Word documents, where more people didn't have a good local viewer. There doesn't seem to be a pdf.js equivalent for Word :( http://stackoverflow.com/questions/14144069/pdf-js-analog-for-word-documents

Anyway, if everyone's browsers / local viewers were good enough, there'd be no need for the "View as HTML" feature at all.
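
To illustrate the point about text conversion and site search, here is a minimal sketch in Python. It assumes each attachment has already been converted to plain text by some converter. The filenames, sample text, and the `build_index` / `search` helpers are all hypothetical and are not taken from Alaveteli's code; a real site would hand the extracted text to a proper search engine rather than an in-memory dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(extracted):
    """Map each word to the set of attachment names containing it.

    `extracted` is a dict of {attachment name: plain text}, i.e. the
    output of whatever document-to-text conversion step is in use.
    """
    index = defaultdict(set)
    for name, text in extracted.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the attachments that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample data standing in for converted attachments.
extracted = {
    "response.pdf": "Annual budget figures for the council",
    "letter.doc": "We hold no budget information",
}
index = build_index(extracted)
```

With this sample data, `search(index, "council budget")` matches only `response.pdf`. Without the text conversion step, neither attachment would be findable at all, regardless of how it is displayed in the browser.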

WilliamWDTK commented 1 month ago

This is linked to a problem where a table in a PDF became difficult to read after conversion to HTML (the user's preferred format), and the user asked that the table be converted properly.