Closed harjedmor closed 5 years ago
@harjedmor This is not possible with Apache Tika directly, because that is not its goal. You need to split the PDF file first and then use this library to extract the text from the resulting file. To do the splitting, there are a few libraries available through Composer, or tools like pdftoolbox or pdftk.
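To make the two steps concrete, here is a rough sketch in Python (not this library's PHP API) that cuts a page range with pdftk and then passes the result to the Tika command-line app. It assumes `pdftk` and `java` are on the PATH and that you have a Tika app jar; the function names, the `pages.pdf` file name and the jar path are only illustrative.

```python
import subprocess
from pathlib import Path


def pdftk_cat_command(src, first_page, last_page, dst):
    """Build the pdftk call that copies pages first_page..last_page (1-based) into dst."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return ["pdftk", str(src), "cat", f"{first_page}-{last_page}", "output", str(dst)]


def tika_text_command(tika_jar, pdf):
    """Build the Tika app call that prints the plain text of pdf to stdout."""
    return ["java", "-jar", str(tika_jar), "--text", str(pdf)]


def extract_page_text(src, first_page, last_page, tika_jar, workdir="."):
    """Cut the page range out of src, then extract the text of the cut file."""
    # Hypothetical name for the intermediate file; any writable path works.
    part = Path(workdir) / "pages.pdf"
    subprocess.run(pdftk_cat_command(src, first_page, last_page, part), check=True)
    result = subprocess.run(
        tika_text_command(tika_jar, part),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `extract_page_text("report.pdf", 2, 4, "tika-app.jar")` would return the text of pages 2 to 4, provided both tools are installed. With this library, the second step would instead be a normal text extraction call on the cut file.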
There's a trick, but I think it is not accurate, and it could change in the future without warning.
Is it possible to parse only some pages of a PDF file to text?