ropensci / qpdf

Split, Combine and Compress PDF files
https://docs.ropensci.org/qpdf
Other
57 stars 10 forks source link

pdf_split creates file names with excessive length #18

Closed G5W closed 7 months ago

G5W commented 2 years ago

When pdf_split is used on a file with n pages, it numbers the results file using an n digit number. So a 20 page pdf file gets a name with a 20 digit number on the end. I have to split many files with a large number of pages. Any file with more than 260 pages gets a file name with more than 260 characters. On a windows machine, this causes the program to error out and fail.

To have a unique file number for an n-page file should only require ceiling(log10(n))+1 digits not n digits.

I am using qpdf Version: 1.1 under R version 4.1.3.

jeroen commented 2 years ago

Can you send a pr?

G5W commented 2 years ago

I think that you are asking for a pull request, which I understand to mean a proposed correction to the code. However, I am not a C++ programmer and could not find the part of the code that creates the filename. I am sorry, but I cannot help with fixing the code.

BTW, Above I proposed computing the exact size needed for the index numbers in the file name. I realized later that a simpler solution might be to always use a 9 digit number. That would accommodate up to 1 billion pages and so would be very unlikely to cause problems for anyone.

enricoschumann commented 2 years ago

I am not a C++ programmer either, but I'd also appreciate if this could be fixed. The described behaviour stems from this line: https://github.com/ropensci/qpdf/blob/da5d359a464d1abbac3b1ac17345f7d39a406b7d/src/bindings.cpp#L45 int_to_string receives pages.size() as the number of digits of the resulting string, when it should for n pages receive floor(log10(n))+1 (in R-syntax).

(If I had to fix it, I'd #include <math.h> in bindings.cpp and use floor/log from that file. But I perhaps Rcpp offers something better here?)

zeileis commented 7 months ago

Yet another non-C++ programmer but one of my scripts also just choked on the same problem.

The description from Enrico @enricoschumann looks very useful for me but I also don't know enough C++ to turn this into a fix in the code (and ultimately into a PR).

Jeroen @jeroen, is this something you might be able to have a look again in the not so distant future?

jeroen commented 7 months ago

Can you please include example code? It seems to work as intended for me, the filename is postfixed with a number, but not excessive.

> pdf_file
[1] "/var/folders/gz/rwkfg8wx1wqgcr0ccpz42dyc0000gn/T//RtmpZvEBI7/output.pdf"
> pdf_length(pdf_file)
[1] 3
> pdf_split(pdf_file)
[1] "/private/var/folders/gz/rwkfg8wx1wqgcr0ccpz42dyc0000gn/T/RtmpZvEBI7/output_001.pdf"
[2] "/private/var/folders/gz/rwkfg8wx1wqgcr0ccpz42dyc0000gn/T/RtmpZvEBI7/output_002.pdf"
[3] "/private/var/folders/gz/rwkfg8wx1wqgcr0ccpz42dyc0000gn/T/RtmpZvEBI7/output_003.pdf"
jeroen commented 7 months ago

Ah I found the issue. It is fixed now, you can install the new version from https://ropensci.r-universe.dev/qpdf. I'll try to release to CRAN soon.

FWIW, if you had included a reprex code it would have been much easier for me to fix this sooner.

zeileis commented 7 months ago

Fantastic, Jeroen, thanks for resolving this issue - and apologies for not posting a reproducible example.

BTW: I think this also resolves the issue that had prompted PR https://github.com/ropensci/qpdf/pull/6 which can be closed now IMO. Your solution is nicer.

jeroen commented 7 months ago

Submitted qpdf 1.3.3 to cran.

zeileis commented 7 months ago

Fantastic, thanks, already live.

enricoschumann commented 7 months ago

great, many thanks!