yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

Would the author be interested in a text_at page function? #517

Open 3ynm opened 1 year ago

3ynm commented 1 year ago

I wrote this very basic function for getting the text at certain coordinates:

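# Collect the text of every run that lies entirely inside the box
# with lower-left corner (x1, y1) and upper-right corner (x2, y2).
def text_at(page, x1, y1, x2, y2)
  text = String.new
  page.runs.each do |r|
    # Skip runs whose start (x, y) or end (endx, endy) falls outside the box.
    next unless r.x >= x1 && r.y >= y1 && r.endx <= x2 && r.endy <= y2

    text << r.text
  end
  text
end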

It would probably be nice to have this in the library itself.
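For anyone who wants to try the filter without a PDF at hand, here is a minimal sketch of the same containment test. It assumes a hypothetical `Run` struct standing in for the library's text runs (only `x`, `y`, `endx`, `endy` and `text` are modelled) and a hypothetical `text_at_runs` helper that takes an array of runs instead of a page.

```ruby
# Hypothetical stand-in for a PDF::Reader text run: only the attributes
# text_at reads are modelled, so the filter can run without a PDF.
Run = Struct.new(:x, :y, :endx, :endy, :text)

# Hypothetical helper with the same containment test as text_at: keep a run
# only when its start (x, y) and its end (endx, endy) both fall inside the box.
def text_at_runs(runs, x1, y1, x2, y2)
  runs.each_with_object(String.new) do |r, text|
    next unless r.x >= x1 && r.y >= y1 && r.endx <= x2 && r.endy <= y2

    text << r.text
  end
end

runs = [
  Run.new(10, 10, 40, 20, "Hello "),
  Run.new(45, 10, 80, 20, "world"),
  Run.new(90, 10, 130, 20, "!") # ends past x2 = 100, so it is skipped
]

puts text_at_runs(runs, 0, 0, 100, 100)
```

With a box from (0, 0) to (100, 100) this prints `Hello world`; the third run is dropped because its `endx` of 130 is past `x2`.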

yob commented 1 year ago

#411 (shipped in 2.8.0) made the following possible:

require "pdf-reader"

PDF::Reader.open("somefile.pdf") do |pdf|
  # Restrict the extracted text to the rectangle from (0, 0) to (100, 100).
  puts pdf.page(1).text(rect: PDF::Reader::Rectangle.new(0, 0, 100, 100))
end

Does that meet a similar need?
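Before swapping `text_at` for the `rect:` option, it may be worth confirming how each decides that a run is "in" the box. `text_at` keeps a run only when its whole extent, from `(x, y)` to `(endx, endy)`, is inside; this thread does not say which rule `rect:` applies. The sketch below uses a hypothetical `BoxRun` struct and two hypothetical predicates, `fully_inside?` and `origin_inside?`, to show how the two rules can disagree on a run that crosses the edge of the box.

```ruby
# Hypothetical stand-in for a text run (not the library's class).
BoxRun = Struct.new(:x, :y, :endx, :endy, :text)

# Rule used by text_at: the whole run must lie inside the box.
def fully_inside?(r, x1, y1, x2, y2)
  r.x >= x1 && r.y >= y1 && r.endx <= x2 && r.endy <= y2
end

# Alternative rule: only the run's starting point must lie inside the box.
def origin_inside?(r, x1, y1, x2, y2)
  r.x.between?(x1, x2) && r.y.between?(y1, y2)
end

edge_run = BoxRun.new(90, 10, 130, 20, "crosses the right edge")
p fully_inside?(edge_run, 0, 0, 100, 100)   # → false
p origin_inside?(edge_run, 0, 0, 100, 100)  # → true
```

If the two approaches return different text for the same coordinates, a run straddling the boundary is a likely cause.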

3ynm commented 1 year ago

It does. I couldn't find the available opts in the API docs; are they documented anywhere?