okfde / froide

Freedom Of Information Portal
MIT License

Redaction of non-pdfs #181

Closed torfsen closed 5 years ago

torfsen commented 8 years ago

I'm trying to redact a GIF attachment (a screenshot that contains personal information), but the attachment does not load. The page only says "Lade PDF..." ("Loading PDF..."). The web console shows the following error messages:

Error: Invalid XRef stream header pdf.worker.js:237:5
XRef_readXRef@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:4339:13
XRef_parse@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:3922:23
PDFDocument_setup@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:3078:7
PDFDocument_parse@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:2959:7
ensureHelper@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:2582:22
NetworkPdfManager_ensure/<@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:2597:7
NetworkPdfManager_ensure@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:2576:1
BasePdfManager_ensureDoc@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:2442:14
loadDocument/</<@https://fragdenstaat.de/static/js/libs/pdfviewer/pdf.worker.js:34455:11
 pdf.worker.js:238:1
Warning: Unsupported feature "unknown" pdf.worker.js:224:5
Warning: Unsupported feature "unknown" ba6a91f70382.js:4:73
Warning: Indexing all PDF objects pdf.worker.js:224:5
InvalidPDFException: Invalid PDF structure ba6a91f70382.js:25:31

It seems that froide tries to load the attachment as a PDF, which fails because the file is a GIF. Are non-PDF attachments supported? If not, there should probably at least be a check for supported file formats.
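For illustration, a minimal sketch of such a format check, assuming the attachment's raw bytes are available before the viewer is opened. The function name `looks_like_pdf` is hypothetical and not part of froide; the check relies only on the fact that PDF files begin with the `%PDF-` marker, and it is stricter than PDF viewers that tolerate leading junk bytes.

```python
PDF_MAGIC = b"%PDF-"  # header marker that starts a PDF file


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker.

    Hypothetical helper: a strict check that only inspects the first bytes.
    """
    return data.startswith(PDF_MAGIC)


# A GIF starts with "GIF87a" or "GIF89a", so it is rejected.
print(looks_like_pdf(b"%PDF-1.4\n..."))
print(looks_like_pdf(b"GIF89a\x01\x00\x01\x00"))
```

A check like this could run before the PDF viewer is initialised, so that a GIF never reaches `pdf.worker.js` in the first place.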

stefanw commented 8 years ago

Non-PDF attachments are currently not supported for redaction. However, there's already code that converts DOC files to PDFs so that they can be redacted and published.

So either converting images to PDF (slightly awkward) or adding functionality to redact images directly might be in order. I'm also occasionally exploring a new approach to redaction in this repo: https://github.com/stefanw/froide-redact

torfsen commented 8 years ago

I already suspected that only PDFs are currently supported, thanks for the confirmation.

Redacting images would be a useful feature, but until it is available, redaction should be disabled for non-PDF files (with an appropriate message).
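A rough sketch of what that gating could look like, assuming the decision is made from the attachment's filename. `REDACTABLE_TYPES` and `redaction_status` are hypothetical names, not froide's API, and the message wording is only an example.

```python
import mimetypes
from typing import Tuple

# Assumption: only PDFs can be redacted for now.
REDACTABLE_TYPES = {"application/pdf"}


def redaction_status(filename: str) -> Tuple[bool, str]:
    """Return (allowed, message) for a hypothetical redaction button."""
    content_type, _ = mimetypes.guess_type(filename)
    if content_type in REDACTABLE_TYPES:
        return True, ""
    label = content_type or "unknown type"
    return False, (
        "Redaction is currently only available for PDF files; "
        f"this attachment is {label}."
    )


allowed, message = redaction_status("screenshot.gif")
print(allowed, message)
```

Guessing from the filename is cheap but can be fooled by a wrong extension, so a real implementation would probably also look at the stored content type or the file contents.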

arnese commented 5 years ago

There is a conversion for images now that also includes OCR.