ropensci / qpdf

Split, Combine and Compress PDF files
https://docs.ropensci.org/qpdf

Error: too many open files #21

Open captcoma opened 1 year ago

captcoma commented 1 year ago

Usually qpdf::pdf_combine works fine for a few PDFs, but when I try to combine 500 PDFs I get the error: Too many open files.

I found that this error is related to qpdf keeping the input files open during the process (see https://qpdf.readthedocs.io/en/stable/cli.html). The qpdf command line offers a workaround, --keep-files-open=[y|n], but I think this option is not exposed in the R package.
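To make the failure mode concrete, here is a POSIX-only C++ sketch. It is not part of qpdf or the R package, and the names `open_until_failure` and `OpenResult` are hypothetical. It temporarily lowers the process's soft `RLIMIT_NOFILE` limit and opens `/dev/null` until `open()` fails with `EMFILE`, the errno behind the "Too many open files" message. Combining many PDFs while every input stays open runs into the same per-process limit, whose default varies by system.

```cpp
// Sketch (POSIX only): every open file holds one file descriptor, and a
// process cannot hold more than its soft RLIMIT_NOFILE limit at once.
// Past that point open() fails with EMFILE ("Too many open files").
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>
#include <vector>

struct OpenResult {
    int opened;  // descriptors successfully opened before the first failure
    int error;   // errno from the first failing call (0 if none)
};

// Hypothetical helper: lowers the soft limit to `soft_limit`, opens
// /dev/null until open() fails, then closes everything and restores
// the original limit.
OpenResult open_until_failure(rlim_t soft_limit) {
    OpenResult result{0, 0};
    rlimit original{};
    if (getrlimit(RLIMIT_NOFILE, &original) != 0) {
        result.error = errno;
        return result;
    }
    rlimit lowered = original;
    lowered.rlim_cur = soft_limit;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) {
        result.error = errno;
        return result;
    }

    std::vector<int> fds;
    for (;;) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            result.error = errno;  // expected: EMFILE
            break;
        }
        fds.push_back(fd);
    }
    for (int fd : fds) close(fd);
    setrlimit(RLIMIT_NOFILE, &original);  // restore the caller's limit
    result.opened = static_cast<int>(fds.size());
    return result;
}
```

The count of successfully opened descriptors comes out a little below the limit, because stdin, stdout, and stderr already occupy descriptors 0 to 2.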

Could I modify pdf_combine so that it works?

jeroen commented 1 year ago

@jberkenbilt should I somehow be closing the input QPDF after reading it here?

https://github.com/ropensci/qpdf/blob/a9aad799346c4597c6a3a41861c4c99720325cb1/src/bindings.cpp#L80-L95

FWIW, we still bundle qpdf 8.4.0 right now (to support CentOS 7 systems).

jberkenbilt commented 1 year ago

What you'll have to do is use ClosedFileInputSource and processInputSource. See https://github.com/qpdf/qpdf/blob/989819b75fba380ecdc7416a504ed4b3a2d42ccb/libqpdf/QPDFJob.cc#L2590 for an example, and let me know if you need more guidance. The idea is that ClosedFileInputSource is an input source that opens the file when it needs to use it and closes it afterwards. This adds some overhead, which is negligible on a local file system but very high over a network file system.

ClosedFileInputSource has a stayOpen method you can use as a hint to keep the file open if you're going to be doing a lot of operations. The code in QPDFJob that combines pages keeps each file open while adding its pages, but ultimately it's QPDFWriter that pulls the data out of the original files, and it will open the files multiple times, which shouldn't be an issue. Although QPDFJob is newer than 8.4.0, all the basic methods called in this example exist in 8.4.0, though you will need PointerHolder instead of std::shared_ptr. You can probably find the same block of code in qpdf/qpdf.cc in 8.4.0.
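The open-on-demand idea can be sketched in standard C++. This is a hypothetical illustration, not qpdf's `ClosedFileInputSource`: the class `LazyFileReader` and its `read_at`/`stay_open` methods are invented names that mimic the behaviour described above (open the file when needed, close it afterwards, with an optional hint to stay open). The real class is an `InputSource` that gets passed to `QPDF::processInputSource`.

```cpp
// Hypothetical, standard-library-only illustration of the idea behind
// qpdf's ClosedFileInputSource. NOT qpdf's API: names are invented here.
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <utility>

class LazyFileReader {
public:
    explicit LazyFileReader(std::string path) : path_(std::move(path)) {}
    ~LazyFileReader() { close_file(); }
    LazyFileReader(const LazyFileReader&) = delete;
    LazyFileReader& operator=(const LazyFileReader&) = delete;

    // Plays the role of the stayOpen hint: while true, the handle is kept
    // between reads; turning it off closes the file immediately.
    void stay_open(bool keep) {
        stay_open_ = keep;
        if (!keep) close_file();
    }

    // Reads up to n bytes starting at offset (fewer at EOF, none if the
    // seek fails). Opens the file on demand and, unless stay_open(true)
    // was requested, closes it again before returning.
    std::string read_at(long offset, std::size_t n) {
        open_file();
        std::string buf(n, '\0');
        std::size_t got = 0;
        if (std::fseek(file_, offset, SEEK_SET) == 0) {
            got = std::fread(&buf[0], 1, n, file_);
        }
        buf.resize(got);
        if (!stay_open_) close_file();
        return buf;
    }

    bool is_open() const { return file_ != nullptr; }

private:
    void open_file() {
        if (file_ == nullptr) {
            file_ = std::fopen(path_.c_str(), "rb");
            if (file_ == nullptr) throw std::runtime_error("cannot open " + path_);
        }
    }
    void close_file() {
        if (file_ != nullptr) {
            std::fclose(file_);
            file_ = nullptr;
        }
    }

    std::string path_;
    std::FILE* file_ = nullptr;
    bool stay_open_ = false;
};
```

Because only the path is kept between reads, hundreds of such readers can exist at once while holding no descriptors when idle. Calling `stay_open(true)` around a burst of reads trades a descriptor for fewer open/close calls, loosely analogous to QPDFJob keeping an input open while it adds that file's pages.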

jeroen commented 5 months ago

I tried to have a look at this ClosedFileInputSource API, but I can't figure it out. I think we'll have to table it anyway until we upgrade the bundled libqpdf.

I wish there was just a simple way to close the files from a QPDF object once we are done with it.

jberkenbilt commented 5 months ago

You could use ClosedFileInputSource for this. You can find several examples in QPDFJob.cc. But, yeah, 8.4.0 is really old.