smalot / pdfparser

PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
GNU Lesser General Public License v3.0

Want to remove page numbers (e.g. "Page 1 of 2") when using getText(). Can I achieve that? #662

Closed: SwanHtet018 closed this issue 8 months ago

SwanHtet018 commented 9 months ago


Can I remove page numbers (e.g. "Page 1 of 2") while using getText()?

k00ni commented 9 months ago

That is out of the scope of PDFParser, because it only (tries to) extract what is in the PDF. However, you could solve this by using a regex together with preg_replace (https://www.php.net/manual/en/function.preg-replace.php) to remove unwanted substrings from the getText() output.

Try something like this: /Page\s+[0-9]+\s+of\s*[0-9]+/i (untested)
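A minimal sketch of the preg_replace approach, assuming the page markers appear in the extracted text in the English form "Page X of Y". The helper name removePageNumbers() is hypothetical, and the pattern is a variant of the untested regex above: it adds word boundaries (\b) so it does not fire inside words such as "homepage", and requires whitespace after "of". Other layouts (e.g. "1/2" or "- 1 -") would need a different pattern.

```php
<?php

/**
 * Hypothetical helper: strips "Page X of Y" markers from text returned by getText().
 * Assumption: markers are English "Page <n> of <m>" (matched case-insensitively).
 */
function removePageNumbers(string $text): string
{
    // \b on both ends avoids matching inside words like "Homepage 1 of 2".
    $pattern = '/\bPage\s+[0-9]+\s+of\s+[0-9]+\b/i';
    $result = preg_replace($pattern, '', $text);

    // preg_replace() returns null on error; fall back to the unmodified text.
    return $result ?? $text;
}

// Usage with smalot/pdfparser (requires the library installed via Composer):
// $parser = new \Smalot\PdfParser\Parser();
// $pdf    = $parser->parseFile('document.pdf');
// echo removePageNumbers($pdf->getText());
```

Because the marker is replaced with an empty string, a line that contained only "Page 1 of 2" becomes an empty line; trim or collapse blank lines afterwards if that matters for your use case.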