Open anuraag94 opened 6 years ago
pdf_pagesize() returns a data frame with page size information (one row per page). This can be used to work out whether a page is in "portrait" or "landscape" mode (but I don't think you can use it to distinguish a clockwise from a counterclockwise 90 degree rotation, or to detect a page flipped upside down by 180 degrees).

pdf_orientation <- function(input) {
  df <- pdftools::pdf_pagesize(input)
  ifelse(df$height < df$width, "landscape", "portrait")
}
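
The comparison above puts a page whose width and height are equal into the "portrait" bucket. If that matters for your files, here is a minimal sketch of a variant that works on any data frame with numeric `width` and `height` columns and reports near-square pages separately. The name `classify_orientation`, the `tol` argument and the example sizes are my own assumptions for illustration, not part of pdftools.

```r
# Hypothetical helper (not part of pdftools): classify pages from a data
# frame that has numeric `width` and `height` columns, one row per page.
classify_orientation <- function(sizes, tol = 0.5) {
  stopifnot(is.data.frame(sizes),
            is.numeric(sizes$width),
            is.numeric(sizes$height))
  diff <- sizes$width - sizes$height
  # Pages whose sides differ by no more than `tol` are treated as square
  ifelse(abs(diff) <= tol, "square",
         ifelse(diff > 0, "landscape", "portrait"))
}

# Made-up sizes for illustration: US Letter upright, US Letter on its
# side, and a square page.
sizes <- data.frame(width  = c(612, 792, 500),
                    height = c(792, 612, 500))
classify_orientation(sizes)
```

Because it only needs the two columns, the same function should accept the result of pdftools::pdf_pagesize(input) directly.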
In grid::unit(), the unit name for "big points" (1/72 inch, the same size as a PDF point) is "bigpts".

pdf_pagesize() incorrectly flips the height and width for some rotated pages in a subset of PDF files. I'm unsure whether this is a bug in the PDF files or in my system's Poppler library (which is probably a few years old), but it seems to go away if I first pre-process the PDF file by running it through Ghostscript with something like the following helper function:

pdf_gs <- function(input, output = NULL, ..., args = character(0L)) {
  input <- normalizePath(input)
  # Default output path: "<name>_output.pdf" next to the input file
  if (!length(output))
    output <- sub("\\.pdf$", "_output.pdf", input)
  output <- normalizePath(output, mustWork = FALSE)
  # Rewrite the PDF with the pdfwrite device; extra flags go in `args`
  args <- c("-dBATCH",
            "-dNOPAUSE",
            "-sDEVICE=pdfwrite",
            "-sAutoRotatePages=None",
            paste0("-sOutputFile=", shQuote(output)),
            args,
            shQuote(input))
  cmd <- tools::find_gs_cmd()
  stdout <- system2(cmd, args, stdout = TRUE)
  # Return the output path invisibly so the call can be piped
  invisible(output)
}

input |> pdf_gs() |> pdf_orientation()
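
Two small additions that might help when trying this out. tools::find_gs_cmd() returns an empty string when it cannot locate Ghostscript, so it is probably worth checking for that before calling pdf_gs(). It can also be useful to see which pages change orientation after pre-processing. The sketch below is only a suggestion; `has_gs` and `changed_pages` are hypothetical names of my own.

```r
# Hypothetical helper: TRUE if R can locate a Ghostscript executable.
# tools::find_gs_cmd() returns "" when none is found.
has_gs <- function() {
  any(nzchar(tools::find_gs_cmd()))
}

# Hypothetical helper: indices of pages whose orientation label differs
# between two per-page character vectors of equal length.
changed_pages <- function(before, after) {
  stopifnot(length(before) == length(after))
  which(before != after)
}

# Made-up labels for illustration: only page 2 differs.
changed_pages(c("portrait", "landscape", "portrait"),
              c("portrait", "portrait",  "portrait"))
```

With a real file, `before` would be pdf_orientation(input) and `after` would be pdf_orientation(pdf_gs(input)), run only if has_gs() is TRUE.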
First, I just want to say that this is a fantastic package and has been extremely helpful, thank you.
I'm writing a parser to extract data from unstructured pdfs, and sometimes the pages are rotated 90 degrees. I'm aware that the mediabox stores properties like page width and page height, and with a few exceptions, I can back out the page orientation using that.
My question is whether accessing the mediabox is possible using the pdftools package, or whether you know of any other way I can do this within my R program. Any solution will be much appreciated!