Open dbuenzli opened 1 year ago
Also having read_debug
, error_on_malformed
and debug_always_treat_malformed
as global mutable is not such a great idea in a concurrent world. These parameters should likely be made part of the Pdfio.input structure which I suspect you always have one under hand when you need to consult these parameters.
See also https://github.com/johnwhitington/cpdf-source/issues/59 in cpdf.
I suggested adding this to Pdfio.input
but I suspect you rather want this in Pdf.t
. I don't think it should be more complicated than adding a:
log : 'a. ('a, Format.formatter, unit) format -> 'a
field to Pdf.t
, go over all these Printf.eprintf
and replace them with the pdf.log
value at hand and allow to specify this value as an optional argument to those function that return a Pdf.t
value. This also allows clients to have per document error channels by printf
ing to a per document allocated Buffer.t
value which is good for concurrency and/or parallelism.
Taking a step back it's a bit unclear from the docs what the general error strategy of the library is. It would be nice to have a few words on this either in the module docs and/or more docs in a .mld
file.
If I do something simple like reading a file that is not a PDF I find it hard to indicate to end-users that the file is likely not a PDF file. Instead I get is a mix of reports on stderr and an exception with a result totally obscure to any user not acquainted with the PDF file format.
For example here reading a (plain text) CHANGES.md
file:
Unable to parse object:
Because of error Pdf.PDFError("Could not find EOF marker whilst reading file channel at position -1"), will read as malformed.
Attempting to reconstruct the malformed pdf channel...
Unable to parse object:
Read 0 objects
CHANGES.md: Failed to read PDF - initial error was
Pdf.PDFError("Could not find EOF marker whilst reading file channel at position -1")
final error was
Pdf.PDFError("Malformed /Root entry whilst reading file channel at position 53")
whilst reading file channel at position 53
Having an error type rather than just a string for communicating errors would make it easier for client of the library to communicate what they find appropriate on different type of errors and devise more user friendly for subset of them. In this particular case knowing that a recovery was attempted and did not succeed I could simply output something like
CHANGES.md: not a PDF file (or damaged beyond repair).
rather than the above giberish (and possibly have a verbose logging mode that outputs all the hardcore details).
I have just committed changes to allow redirection of CamlPDF's stderr output via any function. This is done by the new module Pdfe
, and by altering the whole of CamlPDF to use the new logging function.
A proper treatment of this will have to wait to beyond v2.6, but this change at least allows users of the library to prevent output on stderr.
The library outputs messages on stderr when there are errors which is quite annoying for applications: in general it's better to let applications choose what they want to do with their standard outputs.
Could you maybe output these messages through a log function that can be redefined by the application instead. e.g. in the
Pdfio.input
structure ?