michaelrsweet / pdfio

PDFio is a simple C library for reading and writing PDF files.
https://www.msweet.org/pdfio
Apache License 2.0

Generate streaming output (vs. file output) #21

Closed (tillkamppeter closed 3 years ago)

tillkamppeter commented 3 years ago

Follow-up from OpenPrinting micro-conference on Linux Plumbers 2021

cups-filters uses QPDF for most of its non-rendering/rasterizing PDF handling tasks. The disadvantage is that QPDF is C++ (ugly, harder to understand/maintain/port). The filters (filter functions) using it are pdftopdf(), pclmtoraster(), rastertopdf(), pdftops(), ghostscript(), and bannertopdf(). If one could replace QPDF with pdfio here, one could get rid of C++ altogether in cups-filters.

Unfortunately, pdfio does not support all the functionality needed for cups-filters (QPDF only has it because QPDF author Jay Berkenbilt implemented my feature requests, together with some GSoC students). So after freeing cups-filters from the use of undocumented Poppler APIs with the help of QPDF, the next step is eliminating C++ with the help of pdfio.

Another missing feature is generating streaming PCLm output from raster (or perhaps even general PDF) input. PCLm is a raster-only subset of PDF and was probably created primarily as a standardized job format for cheaper raster-only printers without the computing power for a full-fledged PDF renderer. PCLm is also displayed by standard PDF viewers and printed by standard PDF printers. As it is a subset of PDF, we use QPDF for it in cups-filters. Jay Berkenbilt and a GSoC student have added support for it to QPDF.

At first, one could think that PCLm is not that important, as practically all PCLm-supporting printers also support Apple Raster (manufacturers want their printers to work with iPhones), but PCLm also has another use. It is streaming-capable (in contrast to general PDF, so that cheap printers which cannot hold a whole job in memory can print it), so we could use it when we create a Printer Application supporting PDF printers with PAPPL. For raster input we convert the raster into a PCLm stream, so the Printer Application itself is streaming and could even run on a low-resource system, and if subsequent filters or the printer allow streaming (for example a PDF printer which also supports PCLm explicitly) we can stream the whole job through. Note that the PCLm output of QPDF is not streaming: data only comes out when the job is completed.

So my feature request is to add support for streaming PCLm output to pdfio, to allow for a streaming, C++-free rastertopdf() filter function (this one also outputs PCLm) in cups-filters, and also to allow streaming raster-to-PDF/PCLm processing in PAPPL-based Printer Applications.

michaelrsweet commented 3 years ago

So I’m assigning this to the 1.0 milestone so that it is available in the first release. The proposed API is as follows:

typedef ssize_t (*pdfio_output_cb_t)(void *ctx, const void *data, size_t datalen);

extern pdfio_file_t *pdfioFileCreateOutput(pdfio_output_cb_t output_cb, void *output_ctx, const char *version, pdfio_rect_t *media_box, pdfio_rect_t *crop_box, pdfio_error_cb_t error_cb, void *error_data);

Thus you can output to memory, to a file descriptor, to an HTTP connection, etc. The output will be slightly larger due to the use of indirect objects for the stream lengths, but that usually is only about 50 bytes per object total.
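For illustration only, here is a minimal sketch of what an output callback for the file descriptor case might look like. The typedef is copied from the proposal above so the snippet compiles without pdfio. The fd_output_t context struct and the fd_output_cb name are hypothetical, and the return convention (number of bytes consumed, or -1 on error) is an assumption, since the proposal does not spell it out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Callback type as proposed in this thread (repeated here so the sketch
 * builds on its own; normally it would come from the pdfio header). */
typedef ssize_t (*pdfio_output_cb_t)(void *ctx, const void *data, size_t datalen);

/* Hypothetical context: the destination descriptor plus a byte counter. */
typedef struct fd_output_s
{
  int    fd;     /* File descriptor to write to */
  size_t total;  /* Total number of bytes written so far */
} fd_output_t;

/* Write all of "data" to the descriptor, retrying on short writes and EINTR.
 * ASSUMPTION: returns datalen on success and -1 on error. */
static ssize_t
fd_output_cb(void *ctx, const void *data, size_t datalen)
{
  fd_output_t *out       = (fd_output_t *)ctx;
  const char  *ptr       = (const char *)data;
  size_t      remaining  = datalen;

  while (remaining > 0)
  {
    ssize_t n = write(out->fd, ptr, remaining);

    if (n < 0)
    {
      if (errno == EINTR)
        continue;                       /* Interrupted, try again */

      return (-1);                      /* Real error, give up */
    }

    ptr        += n;
    remaining  -= (size_t)n;
    out->total += (size_t)n;
  }

  return ((ssize_t)datalen);
}
```

With the proposed signature, fd_output_cb and a pointer to an fd_output_t would be passed as the output_cb and output_ctx arguments of pdfioFileCreateOutput(). Because the callback only ever appends, the same pattern should also work for pipes and sockets, which is what makes streaming output possible.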

tillkamppeter commented 3 years ago

Looks OK to me.

michaelrsweet commented 3 years ago

OK, done! :)