johnwhitington / camlpdf

OCaml library for reading, writing and modifying PDF files
Other
200 stars 29 forks source link

The library outputs on stderr #56

Open dbuenzli opened 1 year ago

dbuenzli commented 1 year ago

The library outputs messages on stderr when there are errors which is quite annoying for applications: in general it's better to let applications choose what they want to do with their standard outputs.

Could you maybe output these messages through a log function that can be redefined by the application instead. e.g. in the Pdfio.input structure ?

dbuenzli commented 1 year ago

Also having read_debug, error_on_malformed and debug_always_treat_malformed as global mutable is not such a great idea in a concurrent world. These parameters should likely be made part of the Pdfio.input structure which I suspect you always have one under hand when you need to consult these parameters.

johnwhitington commented 1 year ago

See also https://github.com/johnwhitington/cpdf-source/issues/59 in cpdf.

dbuenzli commented 1 year ago

I suggested adding this to Pdfio.input but I suspect you rather want this in Pdf.t. I don't think it should be more complicated than adding a:

log : 'a. ('a, Format.formatter, unit) format -> 'a

field to Pdf.t, go over all these Printf.eprintf and replace them with the pdf.log value at hand and allow to specify this value as an optional argument to those function that return a Pdf.t value. This also allows clients to have per document error channels by printfing to a per document allocated Buffer.t value which is good for concurrency and/or parallelism.

Taking a step back it's a bit unclear from the docs what the general error strategy of the library is. It would be nice to have a few words on this either in the module docs and/or more docs in a .mld file.

If I do something simple like reading a file that is not a PDF I find it hard to indicate to end-users that the file is likely not a PDF file. Instead I get is a mix of reports on stderr and an exception with a result totally obscure to any user not acquainted with the PDF file format.

For example here reading a (plain text) CHANGES.md file:

Unable to parse object:
Because of error Pdf.PDFError("Could not find EOF marker whilst reading file channel at position -1"), will read as malformed.
Attempting to reconstruct the malformed pdf channel...
Unable to parse object:
Read 0 objects

CHANGES.md: Failed to read PDF - initial error was
Pdf.PDFError("Could not find EOF marker whilst reading file channel at position -1")

final error was 
Pdf.PDFError("Malformed /Root entry whilst reading file channel at position 53")

 whilst reading file channel at position 53

Having an error type rather than just a string for communicating errors would make it easier for client of the library to communicate what they find appropriate on different type of errors and devise more user friendly for subset of them. In this particular case knowing that a recovery was attempted and did not succeed I could simply output something like

CHANGES.md: not a PDF file (or damaged beyond repair).

rather than the above giberish (and possibly have a verbose logging mode that outputs all the hardcore details).

johnwhitington commented 1 year ago

I have just committed changes to allow redirection of CamlPDF's stderr output via any function. This is done by the new module Pdfe, and by altering the whole of CamlPDF to use the new logging function.

A proper treatment of this will have to wait to beyond v2.6, but this change at least allows users of the library to prevent output on stderr.