yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

Add integration specs for text extraction with Boxes #420

Closed by yob 2 years ago

yob commented 2 years ago

The sample file was built with the script below, plus a local modification to the Prawn source to specify a CropBox:

    require "prawn"

    Prawn::Document.generate("hello.pdf") do
      text_box "This text is inside the CropBox", at: [100, 500]
      text_box "There is additional text outside", at: [100, 470]
      text_box "the CropBox and MediaBox", at: [100, 440]
      draw_text "Between CropBox and MediaBox", at: [5, 5]
      draw_text "Outside MediaBox", at: [700, 800]
    end

This also flushed out a bug in Page#rectangles: we weren't returning the CropBox correctly.
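
For anyone reading along, here is a rough sketch of the idea the sample file exercises: text is kept or dropped depending on whether it falls inside a given page box. This is not pdf-reader's implementation. `Box`, `runs_within`, the CropBox values, and the text positions are all hypothetical, chosen only so that one run lands inside the CropBox, one between the CropBox and the MediaBox, and one outside both. The MediaBox is assumed to be US Letter (612 x 792 points).

```ruby
# Hypothetical illustration only; not pdf-reader's implementation.
# A box is described by its lower-left and upper-right corners in PDF points.
Box = Struct.new(:llx, :lly, :urx, :ury) do
  # True when the point (x, y) lies inside or on the edge of the box.
  def contains?(x, y)
    x.between?(llx, urx) && y.between?(lly, ury)
  end
end

# Keep only the runs whose origin falls inside the given box.
def runs_within(runs, box)
  runs.select { |run| box.contains?(run[:x], run[:y]) }
end

# Assumed boxes: a US Letter MediaBox and a made-up, smaller CropBox.
media_box = Box.new(0, 0, 612, 792)
crop_box  = Box.new(50, 50, 562, 742)

# Made-up text origins standing in for the runs in the sample file.
runs = [
  { text: "This text is inside the CropBox", x: 136, y: 536 },
  { text: "Between CropBox and MediaBox",    x: 41,  y: 41 },
  { text: "Outside MediaBox",                x: 736, y: 836 },
]

puts runs_within(runs, crop_box).map { |run| run[:text] }
```

Swapping `crop_box` for `media_box` in the last line would also keep the run that sits between the two boxes, which is the distinction the integration specs are meant to pin down.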