ropensci / pdftools

Text Extraction, Rendering and Converting of PDF Documents
https://docs.ropensci.org/pdftools

Badly formatted number #126

Open: howardbaek opened this issue 1 year ago

howardbaek commented 1 year ago

I have downloaded these Google Slides as a PDF and am converting the pages to PNG: https://docs.google.com/presentation/d/1IJ_uFxJud7OdIAr6p8ZOzvYs-SGDqa7g4cUHtUld03I/edit?usp=sharing

Whenever I run pdf_convert(), I get this error message: PDF error (4890998): Badly formatted number. The number in parentheses varies depending on the slide. The error does not stop the function from processing all of the slides, though. Also, the PNG files for the slides that trigger this error do not look distorted or broken; they look fine to me.

Do you know what may be causing this error message?

cregouby commented 1 year ago

The warning message also appears when running poppler's pdfimages natively, and it seems to be purely informative:

$ pdfimages -f 7 -l 7 -png -list ~/Downloads/ITCR\ Reproducibility\ Advanced\ Comics.pdf 
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   7     0 image     336   287  rgb     3   8  jpeg   yes      116  0   112   112 7854B 2.7%
   7     1 smask     336   287  gray    1   8  image  no       116  0   112   112 3384B 3.5%
   7     2 image     618   618  rgb     3   8  jpeg   yes      117  0   206   206 20.0K 1.8%
   7     3 smask     618   618  gray    1   8  image  no       117  0   206   206 3721B 1.0%
Syntax Warning (758939): Badly formatted number
Syntax Warning (759079): Badly formatted number
Syntax Warning (759177): Badly formatted number
Syntax Warning (759257): Badly formatted number
Syntax Warning (759415): Badly formatted number
   7     4 image     618   618  rgb     3   8  jpeg   yes       84  0   206   206 24.6K 2.2%
   7     5 smask     618   618  gray    1   8  image  no        84  0   206   206 8170B 2.1%
Syntax Warning (759544): Badly formatted number
   7     6 image     383   383  rgb     3   8  jpeg   yes      118  0   153   153 2933B 0.7%
   7     7 smask     383   383  gray    1   8  image  no       118  0   153   153 3342B 2.3%
   7     8 image     618   618  rgb     3   8  jpeg   yes       82  0   206   206 24.2K 2.2%
   7     9 smask     618   618  gray    1   8  image  no        82  0   206   206 2145B 0.6%
   7    10 image     618   618  rgb     3   8  jpeg   yes       83  0   328   328 19.9K 1.8%
   7    11 smask     618   618  gray    1   8  image  no        83  0   328   328 4309B 1.1%
   7    12 image     336   287  rgb     3   8  jpeg   yes      116  0   172   172 7854B 2.7%
   7    13 smask     336   287  gray    1   8  image  no       116  0   172   172 3384B 3.5%
Syntax Warning (759871): Badly formatted number
   7    14 image     618   618  rgb     3   8  jpeg   yes      119  0   316   316 18.3K 1.6%
   7    15 smask     618   618  gray    1   8  image  no       119  0   316   316 3878B 1.0%

So maybe you should open an issue in the poppler repository?
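If the goal is only to keep conversion logs readable while this is sorted out upstream, one option is to filter these specific lines out of poppler's stderr and let every other diagnostic through. This is a minimal sketch, assuming a POSIX shell with grep; the function name filter_badly_formatted is hypothetical (not part of poppler or pdftools), and the pattern only covers the two message prefixes seen in this thread ("Syntax Warning" from the poppler CLI and "PDF error" from pdf_convert()).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of poppler or pdftools.
# Reads diagnostics on stdin, drops only the "Badly formatted number"
# lines in the two formats seen in this thread, and prints the rest.
filter_badly_formatted() {
  # grep -v exits with status 1 when every input line was filtered out;
  # "|| true" keeps the helper's status at 0 in that case.
  grep -Ev '^(Syntax Warning|PDF error) \([0-9]+\): Badly formatted number$' || true
}
```

For example, `pdftoppm -png -r 112 -f 7 -l 7 file.pdf slide 2>&1 | filter_badly_formatted` renders page 7 to PNG files and shows only the remaining messages (use `set -o pipefail` if pdftoppm's exit status matters). poppler's own `-q` option silences all messages instead, which would also hide genuine errors.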