unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

[BUG] partially corrupt PDF causes blank pages #480

Open samuel opened 2 years ago

samuel commented 2 years ago

I have a partially corrupt PDF that opens fine everywhere else I've tried it (e.g. Preview.app on macOS, MyPDF, Acrobat), but when I read it with UniPDF the first 3 of 4 pages are blank. Running it through mutool clean repairs it, so the file seems to contain enough information to be read properly. Unfortunately, because it contains sensitive medical information (PHI), I can't share the specific file, but perhaps something can be gleaned from the errors UniPDF logs. I will see whether I can get a PDF generated that shows the same issue without PHI.

[DEBUG]  core.go:336 Pdf version 1.7
[DEBUG]  core.go:587 Warning: Unable to find xref table or stream. Repair attempted: Looking for earliest xref from bottom.
[DEBUG]  core.go:1409 ERROR: Unable to find object signature (��1Ҡ��-�����YI)
[DEBUG]  core.go:1262 ERROR Failed reading xref (unable to detect indirect object signature)
[DEBUG]  core.go:1263 Attempting to repair xrefs (top down)

Also, I don't see a way to connect the logged ERRORs to the specific PDF being read, since the library can be used concurrently on multiple files. Is there a way to read a PDF in a stricter mode that returns an error from the read functions rather than logging and attempting a repair? That would make it possible to tell the user that their PDF is partially corrupt.

github-actions[bot] commented 2 years ago

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

samuel commented 2 years ago

Note that we have a commercial license (under Spruce Health Inc), so if there's a more appropriate avenue for getting support, let me know.

Thanks.

gunnsth commented 2 years ago

@samuel Please use our service desk https://unidoc.atlassian.net/servicedesk/customer/portal/8 and contact support@unidoc.io if there are any problems with access.