yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

Is it possible to get all the contents from a PDF? #115

Open ramza1 opened 11 years ago

ramza1 commented 11 years ago

Your README has this example:

reader = PDF::Reader.new("somefile.pdf")

reader.pages.each do |page|
  puts page.fonts
  puts page.text
  puts page.raw_content
end

This involves looping through all the pages to get the text. Is it possible to get all the text at once, without going through each page?

mejibyte commented 11 years ago

I don't think it's possible, but you can just concatenate the content:

reader.pages.map { |page| page.text }.join("\n")
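If you need this in more than one place, the one-liner can be wrapped in a small helper. This is only a sketch: `all_text` is a hypothetical name, not something PDF::Reader provides, and it assumes nothing beyond the `pages` and `text` calls already shown in this thread.

```ruby
# Hypothetical helper (not part of PDF::Reader): joins the text of every page.
# Works with any object whose #pages returns items that respond to #text.
def all_text(reader, separator = "\n")
  reader.pages.map(&:text).join(separator)
end

# Usage with the gem (assumes somefile.pdf exists):
#   require 'pdf-reader'
#   puts all_text(PDF::Reader.new("somefile.pdf"))
```

It still visits every page internally; it just hides the loop behind one call.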

Also, since this is not a bug, please close this issue.