ropensci / qpdf

Split, Combine and Compress PDF files
https://docs.ropensci.org/qpdf

Feature Request: Integration of Pages from Multiple PDFs #4

Open billdenney opened 5 years ago

billdenney commented 5 years ago

I have a relatively common use case where I need to build a new .pdf file by combining parts of multiple .pdf files. For example, I may want pages 1-2 from input1.pdf, then page 3 from input2.pdf, then page 3 from input1.pdf. According to section 7.8 of the qpdf manual (http://qpdf.sourceforge.net/files/qpdf-manual.pdf), qpdf itself already supports this kind of page selection.

I think this could be done by modifying the pdf_combine() function to take a pages argument that would be a list of vectors, one vector of page numbers per input file.
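To make the proposed shape of the pages argument concrete, here is a minimal base-R sketch that maps parallel input and pages arguments onto the argument list of the qpdf command-line tool's --pages option. The helper name build_pages_args is hypothetical and not part of the qpdf R package, which calls the qpdf library directly rather than the CLI; the sketch only illustrates how a list of vectors could describe the example above.

```r
# Hypothetical helper (not part of the qpdf R package): turn parallel
# `input` and `pages` arguments into the argument vector for the qpdf
# command-line tool's --pages option.
build_pages_args <- function(input, pages, output) {
  stopifnot(
    is.character(input),
    is.list(pages),
    length(input) == length(pages)
  )
  # One "file page-list" pair per segment; page numbers are comma-separated.
  segments <- unlist(
    Map(
      function(file, pg) c(file, paste(as.integer(pg), collapse = ",")),
      input, pages
    ),
    use.names = FALSE
  )
  c("--empty", "--pages", segments, "--", output)
}

# Pages 1-2 of input1.pdf, then page 3 of input2.pdf, then page 3 of input1.pdf.
# A file that contributes pages at two different positions is listed twice.
args <- build_pages_args(
  input  = c("input1.pdf", "input2.pdf", "input1.pdf"),
  pages  = list(1:2, 3, 3),
  output = "output.pdf"
)
cat("qpdf", args, "\n")
# qpdf --empty --pages input1.pdf 1,2 input2.pdf 3 input1.pdf 3 -- output.pdf
```

If the vector were actually passed to a shell, file names containing spaces would need quoting (for example with shQuote()); that is left out here to keep the mapping easy to read.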

The result is possible with a combination of pdf_split() and pdf_combine(), but the output .pdf file is less efficient because of PDF object duplication (and the approach creates intermediate files that would otherwise be unnecessary).

billdenney commented 5 years ago

A solution which includes temporary files is below. As written, it could be a replacement for pdf_combine() as it will default to all pages of all files if pages is not provided. Would you like a PR for this (with or without it as a replacement for pdf_combine())?

pdf_combine_multi <- function(input, pages, output=NULL, password="") {
  stopifnot(is.character(input))
  if (!length(output)) {
    output <- sub("\\.pdf$", "_combined.pdf", input[1])
  }
  output <- normalizePath(output, mustWork = FALSE)
  if (missing(pages)) {
    # If pages is not supplied, combine all pages for all inputs
    combine_inputs <- input
  } else {
    stopifnot(length(input) == length(pages))
    stopifnot(is.list(pages))
    combine_inputs <- character(length=length(input))
    for (input_idx in seq_along(input)) {
      combine_inputs[[input_idx]] <-
        qpdf::pdf_subset(
          input=input[[input_idx]],
          pages=pages[[input_idx]],
          output=tempfile(fileext=".pdf"),
          password=password  # needed to read password-protected inputs
        )
    }
  }
  ret <- qpdf:::cpp_pdf_combine(infiles=combine_inputs, outfile=output, password=password)
  if (!missing(pages)) {
    # clean up temporary files
    file.remove(combine_inputs)
  }
  ret
}