yob / pdf-reader

The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe.
MIT License

Match whole tokens only #386

Closed sebbASF closed 2 years ago

sebbASF commented 2 years ago

Must not match e.g. a1a

Should also speed up matching, as there is no need to scan the entire token if the first char is not a digit
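As a rough illustration of the difference (a minimal sketch only: the constant names, the `integer_token?` helper, and the patterns below are hypothetical and are not taken from the pdf-reader source), an unanchored digit pattern accepts any token that merely contains a digit, while a pattern anchored with `\A` and `\z` accepts only tokens made up entirely of digits:

```ruby
# Hypothetical patterns for illustration; NOT the actual pdf-reader regexps.
UNANCHORED_INT = /\d+/      # matches if any run of digits appears in the token
ANCHORED_INT   = /\A\d+\z/  # matches only if the whole token is digits

# Hypothetical helper: true if the token matches the given pattern.
def integer_token?(token, pattern)
  pattern.match?(token)
end

%w[123 a1a 1a].each do |token|
  unanchored = integer_token?(token, UNANCHORED_INT)
  anchored   = integer_token?(token, ANCHORED_INT)
  puts "#{token}: unanchored=#{unanchored} anchored=#{anchored}"
end
```

With the anchored pattern, a token such as `a1a` can be rejected at its first character, whereas the unanchored pattern has to keep scanning for a digit further along the token.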

yob commented 2 years ago

interesting!

Is this fixing a parsing issue with a real PDF, or just a result of reviewing the codebase for regexps without anchors?

sebbASF commented 2 years ago

It was found by looking for missing anchors, but then I did a test which showed at least one of the invalid PDFs matches without the anchor but not with it.

As it happens, that test still completes OK, presumably because of some other error.